At Tuenti, we do three code pushes per week, sometimes modifying thousands of files. Before each push we run thousands of automated tests and build operations. These ensure not only that the code works, but also that localization is applied correctly, bundles are generated, and files are deployed to hundreds of servers as quickly and reliably as possible.
We use open-source tools such as Mercurial, MySQL, Jenkins, Selenium, PHPUnit and Rsync alongside our own in-house ones, and we maintain separate development, testing, staging and production environments.
We had to deal with several problems:
• static asset bundling and versioning;
• syntax errors;
• and of course the fact that more than 100 engineers work on the codebase, merging and releasing more than 15 branches on the same day.

We also switched from Subversion to Mercurial to gain flexibility and faster branching operations.
In this talk we explain how code changes in our repository end up live. We detail some practices and tips we apply, the problems we ran into, and how we solved them.
5. Release Workflow: Branch
Branch → Code → Test → Integrate → Release → Stabilize
• Avg. 15 branches per release
• Current record: 29 branches
• Repository per functional area (backend, frontend, stats, …)
• Avg. lines modified per release: 63K
6. Release Workflow: Code + Test
• Scrum (or at least Agile)
• TDD as much as possible
• Labs
• A/B Testing
• PoCs
• Dark launch
7. Release Workflow: Integrate
• Repo always available
• Specific release date assigned by DevOps
– Merge & wait for the target date
• Only merge if 100% of tests pass, or with specific approval
• QA Regression & manual tests
• Fix possible integration problems ASAP
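The "100% tests ok or specific approval" merge rule can be sketched as a tiny gate function. `CiRun` and `can_merge` are hypothetical names, not part of Jenkins or Mercurial. In practice the inputs would come from the CI results for the branch being merged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CiRun:
    """Summary of one CI run on a branch (hypothetical structure)."""
    total: int
    passed: int


def can_merge(run: CiRun, approved_by: Optional[str] = None) -> bool:
    """Merge gate: every test green, or an explicit, named approval."""
    # An empty run counts as "not green", so a broken test job cannot let a merge through.
    all_green = run.total > 0 and run.passed == run.total
    return all_green or approved_by is not None
```

Recording who approved the exception, rather than just a boolean, keeps the override auditable.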
8. Release Workflow: Release
• 3 releases per week
– DevOps goal: All weekdays
• Latest stable changeset from Integration taken on the morning of the previous working day
• Release doc, pre-release meetings
• Staging servers to test with live data
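The sketch below shows one way to pick the release candidate from Integration. It assumes each changeset carries its push time and CI result. The slide only says that the latest stable changeset is taken on the previous working day's morning, so the 9:00 cutoff and all names here are assumptions. Weekends are skipped; public holidays are not.

```python
from datetime import datetime, time, timedelta
from typing import Iterable, NamedTuple, Optional


class Changeset(NamedTuple):
    node: str            # changeset hash
    pushed_at: datetime  # when it landed in Integration
    ci_green: bool       # did the full CI run pass on it?


def previous_working_day(day: datetime) -> datetime:
    """Step back one day, skipping Saturday and Sunday."""
    prev = day - timedelta(days=1)
    while prev.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        prev -= timedelta(days=1)
    return prev


def pick_release_candidate(changesets: Iterable[Changeset], release_day: datetime,
                           cutoff: time = time(9, 0)) -> Optional[Changeset]:
    """Latest green changeset pushed before the cutoff on the previous working day."""
    limit = datetime.combine(previous_working_day(release_day).date(), cutoff)
    eligible = [c for c in changesets if c.ci_green and c.pushed_at <= limit]
    return max(eligible, key=lambda c: c.pushed_at, default=None)
```

Returning `None` rather than falling back to an older red changeset forces a human decision when nothing green is available.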
9. Release Workflow: Stabilize
• First code push: 8 AM
– DevOps goal: single push + release closed
• Release window: 1-2 h
– DevOps goal: < 30 minutes
• Error stabilization or release rollback
• Representatives from all involved teams
11. DVCS: Mercurial
• http://mercurial.selenic.com/
• Syntax similar to SVN (our old system)
• Easy API for plugging in our own extensions and hooks
• Cross-platform
• Tuenti Addons:
– Commit hooks to check syntax, push ticket #...
• Problems:
– Pushes/pulls over VPN are slow
– Handling multiple repos still slow
– Only one level of rollback!
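The commit hooks could look roughly like the in-process Mercurial hook below.

Several parts are assumptions:
• The `#1234` ticket convention is hypothetical.
• The module path and the function names are hypothetical.
• It targets a Python 3 Mercurial, where commit descriptions are bytes.

The signature follows Mercurial's Python hook convention, where a truthy return value aborts the transaction. It would be enabled in `hgrc` under `[hooks]` with a line such as `pretxncommit.ticket = python:/path/to/tuenti_hooks.py:ticket_hook`.

Syntax checking (for example running `php -l` on changed files) is left out.

```python
import re
from typing import List, Optional

# Hypothetical convention: a ticket is referenced as "#<number>".
TICKET_RE = re.compile(r"#(\d+)")


def ticket_ids(message: str) -> List[int]:
    """Return the ticket numbers referenced in a commit message."""
    return [int(n) for n in TICKET_RE.findall(message)]


def check_commit_message(message: str) -> Optional[str]:
    """Return an error text if the message references no ticket, else None."""
    if not ticket_ids(message):
        return "commit message must reference a ticket, e.g. 'Fix login #1234'"
    return None


def ticket_hook(ui, repo, node=None, **kwargs):
    """pretxncommit hook: returning True makes Mercurial roll back the commit."""
    description = repo[node].description().decode("utf-8", "replace")
    error = check_commit_message(description)
    if error:
        ui.warn((error + "\n").encode("utf-8"))
        return True
    return False
```

Keeping the check in a plain function means it can be unit-tested without a repository.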
12. Issue Tracking: Trac
• http://trac.edgewall.org/
• User story tasks + bugs
• Wiki (now also internal Google Sites)
• Extensible through plugins
• Tuenti Addons:
– Master/Slave architecture
– Tons of tweaks and source code integration hooks
• Problems:
– Slow, limited, code viewing sucks
• Migration to JIRA planned
13. Testing: PHPUnit
• http://www.phpunit.de
• Some caveats
– Mocking just “works”
– PHP process spawning PHP tests
• Tuenti Addons:
– Vastly improved mocking framework
– Shell scripts that isolate test batteries
– Better integration with Selenium
• Problems:
– Our current frontend framework (FEFW) does not cope perfectly with PHPUnit/Selenium
14. Testing: Selenium
• http://seleniumhq.org/
• Running browser tests in Firefox and IE
• Tuenti Addons:
– Custom build with some fixes
• Problems:
– JavaScript handling/detection not perfect
– AJAX support far from optimal
– IE runner is an iframe
• Planned migration to WebDriver
15. CI: Jenkins
• http://jenkins-ci.org/
• Previously Hudson
• Specialized farm (master + 22 nodes)
• Tuenti Addons:
– Parallelization (up to 6 nodes)
– Special reports
– “Smart” runs (run previously failed tests first, etc.)
• Problems:
– Browser tests slow (due to Selenium)
– Unstable (mainly due to Selenium)
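Below is a sketch of the two Jenkins add-ons above. Previously failed tests are moved to the front, and tests are spread over up to 6 nodes.

The split uses a standard greedy scheduling heuristic: take the longest test first and give it to the least-loaded node. This heuristic is swapped in for illustration. It assumes per-test durations from earlier runs, and it is not necessarily how the Tuenti plugin works.

```python
import heapq
from typing import Dict, List, Set


def order_tests(tests: List[str], last_failed: Set[str]) -> List[str]:
    """Previously failed tests first; otherwise keep the original order (sort is stable)."""
    return sorted(tests, key=lambda t: t not in last_failed)


def split_across_nodes(durations: Dict[str, float], nodes: int = 6) -> List[List[str]]:
    """Greedy split: longest test first, always onto the currently least-loaded node."""
    heap = [(0.0, i) for i in range(nodes)]  # (assigned seconds, node index); already a valid heap
    batches: List[List[str]] = [[] for _ in range(nodes)]
    for test in sorted(durations, key=lambda t: durations[t], reverse=True):
        load, i = heapq.heappop(heap)
        batches[i].append(test)
        heapq.heappush(heap, (load + durations[test], i))
    return batches
```

Running the failed tests first gives the fastest signal on whether a fix worked. Balancing by duration, rather than by test count, keeps one slow Selenium batch from dominating the wall-clock time.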
16. Storage: MySQL
• http://www.mysql.com/ | http://www.percona.com
• Live site storage
• Dev. env. storage
– 1 DB per user (to run tests)
– 1 shared DB (common faked data)
• Clusters of master/slave DBs
• Problems:
– Slow when running tests
– Shared dev DB has long-standing inconsistencies
17. Storage: Hadoop
• http://hadoop.apache.org/
• Dedicated cluster
• Pig scripts: Stats, other non-realtime data
• HBase: Async. data storage
• Hive: SQL-like querying
• Problems:
– Complex configuration for newcomers
18. Caching: Memcached
• http://memcached.org/
• Avg. DB queries/pageview: 0.3
• Dev. Behaviour == live behaviour
• Tuenti Addons (https://github.com/tuenti):
– UDP + multi-ports
• Problems:
– 32GB RAM / machine practical limit
– Remember to warm up the data, or MC (memcached) will kill the DB!
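The memcached usage above follows the cache-aside pattern, shown here as a minimal sketch. A plain dict stands in for the memcached client and a callable stands in for the MySQL query. Both are placeholders, and the class name is hypothetical.

The sketch also shows why warming up matters: with a cold cache, every first read becomes a DB query.

```python
from typing import Callable, Dict, Iterable


class CacheAside:
    """Cache-aside lookup in front of a slower store.

    `cache` stands in for a memcached client (a plain dict here) and
    `load_from_db` for the real MySQL query; both are placeholders.
    """

    def __init__(self, load_from_db: Callable[[str], str]):
        self.cache: Dict[str, str] = {}
        self.load_from_db = load_from_db
        self.db_queries = 0

    def get(self, key: str) -> str:
        if key in self.cache:        # hit: no DB work at all
            return self.cache[key]
        self.db_queries += 1         # miss: fall through to the DB...
        value = self.load_from_db(key)
        self.cache[key] = value      # ...and populate the cache for next time
        return value

    def warm_up(self, keys: Iterable[str]) -> None:
        """Pre-load hot keys before sending traffic, so a cold cache
        does not turn the first burst of requests into DB queries."""
        for key in keys:
            self.get(key)
```

The queries/pageview figure is simply DB queries divided by pageviews. In this toy model, reading 3 cold keys on each of 20 pageviews costs 3 queries, giving 3 / 20 = 0.15. Real traffic (TTLs, evictions, invalidation on writes) is not modelled.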
19. Configuration: Puppet
• http://puppetlabs.com/
• Production machines
• Jenkins nodes
• VM management / Dev web servers config
• Problems:
– Wipes user config if not puppetized
20. Search: Sphinx
• http://sphinxsearch.com/
• Non-realtime (index based)
• Very fast
• Problems:
– Index re-generation on dev & test env.
– Adding new data could be friendlier
21. Build: Our build script
• http://ant.apache.org/
• Localization
• Minification + Bundling + Versioning
• Statics deployment to CDNs
• Fast: 2-3 minutes full build
– Multithreading + parallelization
• Allows partial, component based builds
• Problems:
– Under heavy CPU load, build time goes up :(
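Partial, component-based builds can be sketched in three steps:
1. fingerprint each component's sources;
2. rebuild only the components whose fingerprint changed;
3. run those rebuilds in parallel.

Component names, `build_one` and the fingerprint scheme are assumptions. Threads are used because each build step would typically shell out to external tools (minifier, localization). Waiting on a subprocess does not hold Python's GIL.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fingerprint(files: Dict[str, bytes]) -> str:
    """Digest of a component's source files (paths and contents)."""
    h = hashlib.sha1()
    for path in sorted(files):
        data = files[path]
        # Length-prefix each entry so different file sets cannot produce the same byte stream.
        h.update(f"{path}\0{len(data)}\0".encode("utf-8"))
        h.update(data)
    return h.hexdigest()


def partial_build(components: Dict[str, Dict[str, bytes]],
                  previous: Dict[str, str],
                  build_one: Callable[[str], None],
                  workers: int = 4) -> List[str]:
    """Rebuild only the components whose fingerprint changed since `previous`.

    Builds run in a thread pool; `previous` is updated only if every build succeeded.
    Returns the names of the rebuilt components.
    """
    current = {name: fingerprint(files) for name, files in components.items()}
    changed = sorted(name for name, fp in current.items() if previous.get(name) != fp)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(build_one, changed))  # consuming the results re-raises build errors
    previous.update(current)
    return changed
```

Only persisting fingerprints after a successful run means a failed build is retried the next time, instead of being silently treated as up to date.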
22. Build: RSync
• http://rsync.samba.org/
• Deployment of code (live & dev)
• Sends deltas/diffs
• Really fast
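Rsync's speed comes from its delta algorithm, described in the rsync technical report by Tridgell and Mackerras:
1. The receiver sends a weak rolling checksum and a strong checksum for each fixed-size block.
2. The sender slides a window over its own file, updating the weak checksum in O(1) per byte.
3. Candidate hits are confirmed with the strong checksum.

The sketch below is simplified and single-process. MD5 stands in for the strong checksum (rsync's choice has changed across versions), and the literal data between matches is not emitted.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16  # checksum components are kept modulo 2^16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """(a, b) components of rsync's weak checksum for one block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide the window one byte right in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(data: bytes, blocks: List[bytes]) -> List[Tuple[int, int]]:
    """Return (offset in data, block index) for each receiver block found in data.

    All blocks must have the same length n (the receiver's block size).
    """
    n = len(blocks[0])
    # What the receiver would send: weak checksum -> [(strong digest, block index)].
    index: Dict[int, List[Tuple[bytes, int]]] = {}
    for i, blk in enumerate(blocks):
        a, b = weak_checksum(blk)
        index.setdefault(a + (b << 16), []).append((hashlib.md5(blk).digest(), i))

    matches: List[Tuple[int, int]] = []
    k = 0
    if len(data) >= n:
        a, b = weak_checksum(data[:n])
    while k + n <= len(data):
        hit = None
        candidates = index.get(a + (b << 16))
        if candidates:  # weak hit: confirm with the strong checksum
            strong = hashlib.md5(data[k:k + n]).digest()
            hit = next((i for digest, i in candidates if digest == strong), None)
        if hit is not None:
            matches.append((k, hit))
            k += n  # skip past the matched block
            if k + n <= len(data):
                a, b = weak_checksum(data[k:k + n])
        else:
            if k + n < len(data):
                a, b = roll(a, b, data[k], data[k + n], n)
            k += 1
    return matches
```

The cheap rolling checksum is what makes checking every byte offset affordable. The strong checksum is only computed when the weak one already agrees.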
23. Statics bundling: YUI
• Fewer files == faster download & deploy
• One big text file == better HTTP gzip compression
• HTML/JS/CSS Minification
• Ultra fast build: ~4 seconds in Dev.
• On demand JS loading!
• Nice typical framework features
• Wonderful & simple events system
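The bundling and versioning idea can be sketched in three steps:
1. Concatenate the sources into one file.
2. Name the file after a hash of its content, so a changed bundle gets a new URL (cache busting).
3. Compare gzip sizes of the bundle against the separate files.

Minification, which YUI handles in the real pipeline, is not shown. The `<name>.<hash>.js` naming scheme is an assumption.

```python
import gzip
import hashlib
from typing import Dict, Tuple


def bundle(files: Dict[str, str], name: str) -> Tuple[str, str]:
    """Concatenate sources (in path order) and version the result by content hash.

    Returns (versioned_filename, bundle_text); '<name>.<hash8>.js' is an assumed scheme.
    """
    text = "\n".join(files[path] for path in sorted(files))
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()[:8]
    return f"{name}.{digest}.js", text


def gzip_size(text: str) -> int:
    """Size in bytes of the text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


if __name__ == "__main__":
    modules = {
        f"mod{i}.js": f"function handler{i}(e) {{ return e.target.getAttribute('data-id'); }}"
        for i in range(5)
    }
    versioned_name, text = bundle(modules, "app")
    separate = sum(gzip_size(src) for src in modules.values())
    print(versioned_name, "separate:", separate, "bundled:", gzip_size(text))
```

Similar modules compress much better together: gzip can back-reference repeated code across module boundaries, and there is only one gzip header instead of one per file.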
24. Statics bundling: YUI (II)
• Tuenti Addons:
– Caching builds
– Line breaks every # characters (easier IE debugging)
– CDNs handling
• Problems:
– Any JS change requires a rebuild, even in dev.
– Requires small migrations/changes in existing JS
25. Chat Server: Ejabberd
• http://www.ejabberd.im/
• Erlang XMPP (Jabber) server
• 400M msgs/day, 1M concurrent users peak,…
• 20 machines, ~5 instances per machine
• Tuenti Addons:
– Ejabberd codebase tweaked (3.5x faster)
– Protocol tweaks to optimize for our architecture
• Problems:
– Same behaviour dev/live is critical
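A back-of-the-envelope reading of the slide's figures follows. It assumes load is spread evenly across instances and uses the daily average message rate; peak-hour rates will be higher.

```latex
\begin{aligned}
N_{\text{instances}} &\approx 20 \times 5 = 100 \\
\text{users per instance at peak} &\approx \frac{10^{6}}{100} = 10^{4} \\
\text{average message rate} &= \frac{4 \times 10^{8}}{86\,400\ \text{s}} \approx 4\,630\ \text{msgs/s} \\
\text{average rate per instance} &\approx \frac{4\,630}{100} \approx 46\ \text{msgs/s}
\end{aligned}
```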
26. Compilation: HipHop
• Migrating old code to fully support HipHop
– With PHP 5.3
• Obvious speed improvements
• Also nice for static code analysis
TDD: the backend is closer to it; the frontend gets hard once you get into visual (acceptance) tests.
Monday: too much traffic. Friday: the weekend starts the next day, so it's safer not to release, just in case something happens (first redesign story).
Shared Google Docs spreadsheet where QA adds bugs and engineers check and mark them.
Yes, we use Singletons. The problem is PHP running PHP, which keeps state between test batteries. We're working on adding more testability features to the FEFW.
We use Xen (http://xen.org/) for Windows virtualization on Jenkins builds. The nodes themselves are not virtualized because of the 20% performance loss.
We use Percona Server instead of vanilla MySQL. Mock everything (unit/integration tests). Reuse data where possible (browser tests).
Recommendation: build one cluster, experiment, learn, use it. Then build another one, better tuned with whatever you learned.
Explain the 0.3 WTF
Indexing takes 5 to 15 minutes in the normal scenario; worst case, 1 h maximum.
The old build was a non-multithreaded shell + PHP script.
JS & CSS
Solution: an easy chat server tutorial. Now all Comms team engineers have their own chat server set up on a VM. The service has since been resharded into 2 chat clusters with load balancers and 12 machines each (thanks to the improvements).