Growing to become one of the largest sites on the Internet comes with a unique set of problems. Learning how to adapt, and doing so without losing sight of content creators' voices, proves tricky. This talk details some of the frontend tools we've built and approaches we've taken to serve our millions of users at scale.
10. How does it work?
→ Thousands of servers
→ Deploy dozens of times per day
→ Monitor and measure everything
→ Hadoop
→ OpenTSDB (backed by HBase)
11. Our process
→ Teams are small
→ Iterate quickly
→ Release early and often, usually to a percentage of users
→ 2 code review “ok’s” required for all Pull Requests
15. Feature Flagging
Usage
→ Provides:
→ A/B testing
→ Run beta code alongside production code
→ Kill switch
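A single flag check can serve all three uses above. The sketch below is a minimal illustration, assuming a deterministic hash-based bucketing scheme; the flag names, dictionary layout, and function names are invented for this example and are not the actual production API from the talk.

```python
import hashlib

# Hypothetical flag store: one structure drives beta ramps, A/B tests,
# and the kill switch ("on": False turns a feature off everywhere).
FLAGS = {
    "beta_dashboard": {"on": True, "percent": 10},     # beta code for 10% of users
    "recommendations": {"on": False, "percent": 100},  # kill switch engaged
}

def user_bucket(flag_name, user_id):
    """Deterministic 0-99 bucket, so a user keeps the same experience."""
    digest = hashlib.sha1(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def flag_enabled(flag_name, user_id):
    flag = FLAGS.get(flag_name)
    if flag is None or not flag["on"]:  # kill switch: off for everyone
        return False
    # A/B or beta gate: only users below the rollout percentage see it.
    return user_bucket(flag_name, user_id) < flag["percent"]
```

Because the bucket is derived from the flag name and user id rather than from randomness, the same user sees the same variant on every request.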
16. Feature Flagging
A/B Testing
→ Injected recommendations
→ A/B(/*) testing of positioning
→ Which position is the best? Why?
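Multi-variant ("A/B/*") assignment for the recommendation slot could look like the sketch below. The position range 2-9 comes from the results slide; the salt string and function names are assumptions for illustration.

```python
import hashlib

# Candidate injection slots under test (positions 2 through 9, per the slide).
POSITIONS = list(range(2, 10))

def assigned_position(user_id):
    """Hash each user into one candidate position, deterministically."""
    digest = hashlib.sha1(f"rec-position:{user_id}".encode()).hexdigest()
    return POSITIONS[int(digest, 16) % len(POSITIONS)]
```

Each user then sees recommendations injected at one stable position, and engagement can be compared per position afterward to answer "which position is best, and why?"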
17. Feature Flagging
A/B Test Results
→ Injected recommendations
→ A/B(/*) testing of positioning
→ Which position is the best? Why?
[Chart: test results for injected-recommendation positions 2–9]
18. Feature Flagging
Ramping & Kill Switch
→ Ramping new features
→ Deploy to only “admin” (staff)
→ …then 1% of users… then 5%… 10%… 25%…
→ Kill switch
→ Completely turn off a feature that’s breaking the site… poof
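The staged ramp described above can be sketched as a percentage check with staff always included and 0% acting as the kill switch. This is an illustrative sketch, not the talk's actual implementation; the field names (`is_staff`, `id`) are assumed.

```python
import hashlib

# Ramp stages from the slide: staff first, then 1%, 5%, 10%, 25%, ... of users.
RAMP_STAGES = [1, 5, 10, 25, 100]

def ramp_bucket(feature, user_id):
    """Deterministic 0-99 bucket per (feature, user) pair."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def feature_on(feature, user, percent):
    if percent <= 0:              # kill switch: off for everyone... poof
        return False
    if user.get("is_staff"):      # "admin" (staff) users see new features first
        return True
    return ramp_bucket(feature, user["id"]) < percent
```

Raising `percent` through the ramp stages grows the audience without re-deploying, and dropping it to 0 turns the feature off instantly.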
19. Feature Flagging
Use Carefully
→ Feature flagging certain functionality can give a mixed experience
→ Can cause user confusion:
→ “Why does my mom see this and I don’t?”
— Confused teenager
→ Easy to build complex dependencies — don’t
27. Error Logging
Capture Errors
→ Where you send the logs doesn't matter; it's what you do with them
→ We log errors to Scribe…
→ …throw them into Hadoop
→ …and count frequency with OpenTSDB
28. Error Logging
Error Data
→ With Hive, we can query Hadoop
→ With this, I can see we log around 1.4 million errors per day
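As a quick sanity check on that figure, 1.4 million errors per day works out to a steady stream of roughly 16 errors every second:

```python
# Back-of-the-envelope from the slide's figure of ~1.4M logged errors/day.
errors_per_day = 1_400_000
per_second = errors_per_day / 86_400  # 60 * 60 * 24 seconds in a day
print(f"{per_second:.1f}")  # ≈ 16.2 errors/second
```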