Metrics-Driven Engineering at Etsy
Mike Brittain, mike@etsy.com, @mikebrittain
Logs, Graphs, Trends, and Correlations
Making Decisions
How many visitors are using this thing?
Can we deploy that to 100% of our visitors?
Did we make it faster?
Did I just break something?
Q. Who makes the graphs?
A. Well, the Ops team manages the network, racks the servers, installs the monitoring tools, wears the pagers, blah, blah, blah...
(but...) Engineers build the application.
Dev + Ops
Access
Yes   No
“Engineers are too busy meeting our product deadlines.”
Here’s the big secret...
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
Logging
Logger::log_error("User login failed. Reason: $msg for $username", "login");
web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
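Each line above packs five fields into one string: host, timestamp, level, a namespace (which appears to come from the second argument of Logger::log_error), and the message. The Python sketch below splits a line of that exact shape into fields, the first step any log-to-graph tool needs. The regex, `LogEntry`, and `parse_log_line` are illustrative names, not Etsy's code, and the pattern assumes the bracketed layout shown.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical pattern for lines shaped like:
#   web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. ...
LOG_LINE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<ts>[^\]]+)\] "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    host: str
    timestamp: datetime
    level: str
    namespace: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None if it does not match."""
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    # "Fri Mar 04 16:27:48 2011" follows the classic asctime() layout.
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return LogEntry(m.group("host"), ts, m.group("level"),
                    m.group("namespace"), m.group("message"))


line = ("web0054 [Fri Mar 04 16:27:48 2011] [info] [login] "
        "User login failed. Reason: wrong password")
entry = parse_log_line(line)
print(entry.host, entry.level, entry.namespace)  # web0054 info login
```

Once the level and namespace are separate fields, counting "login failures per minute" is a simple group-by rather than a grep.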
Logster
Forked from ganglia-logtailer...
- Daemon mode (only cron mode)
+ Support for Graphite
+ Simplified parsing scripts
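Running only from cron means each invocation must pick up where the previous one stopped. Logster delegates that bookkeeping to a logtail-style utility; the Python sketch below does the same job by hand, storing the last byte offset in a state file (a hypothetical JSON format) and restarting from zero when the log shrinks, as after truncation or rotation. It is a simplification, not Logster's implementation.

```python
import json
import os
import tempfile
from typing import List


def read_new_lines(log_path: str, state_path: str) -> List[str]:
    """Return complete lines appended to log_path since the previous call.

    The byte offset reached last time is stored in state_path (a
    hypothetical JSON state file), so each cron run resumes where the
    previous one stopped.
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f).get("offset", 0)

    # If the log is now smaller than our offset it was truncated or rotated,
    # so start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line waits for the next run.
    end = data.rfind(b"\n") + 1  # 0 when no newline was found
    new_offset = offset + end

    with open(state_path, "w") as f:
        json.dump({"offset": new_offset}, f)

    return data[:end].decode("utf-8", errors="replace").splitlines()


# Demo: two simulated cron runs against a temporary log file.
workdir = tempfile.mkdtemp()
log_file = os.path.join(workdir, "error.log")
state_file = os.path.join(workdir, "logster.state")
with open(log_file, "w") as f:
    f.write("first\nsecond\n")
first_run = read_new_lines(log_file, state_file)   # ['first', 'second']
second_run = read_new_lines(log_file, state_file)  # [] since nothing is new
```

Leaving a half-written trailing line for the next run avoids counting the same event twice or parsing a truncated message.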
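web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!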
Graph: Fatals, Errors, Warnings
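A graph like this needs only one number per level per run. The Python sketch below counts fatal, error, and warning lines from a batch such as the one above and formats each count as a Graphite plaintext line (`name value timestamp`). The metric prefix `example.errorlog` and the fixed timestamp are placeholders; a real parser might also divide by the run interval to report a rate.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches the level field in lines such as:
#   web0001 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
LEVEL = re.compile(r"\[(fatal|error|warning)\]")


def count_levels(lines: Iterable[str]) -> Dict[str, int]:
    """Count fatal/error/warning lines seen in one run (zeros included)."""
    counts = Counter({"fatal": 0, "error": 0, "warning": 0})
    for line in lines:
        m = LEVEL.search(line)
        if m:
            counts[m.group(1)] += 1
    return dict(counts)


def to_graphite(counts: Dict[str, int], prefix: str, timestamp: int) -> List[str]:
    """Render counts as Graphite plaintext lines: 'name value timestamp'."""
    return [f"{prefix}.{level} {value} {timestamp}"
            for level, value in sorted(counts.items())]


sample = [
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
    "web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!",
    "web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.",
]
counts = count_levels(sample)  # {'fatal': 1, 'error': 2, 'warning': 1}
# 1299256128 is an arbitrary placeholder Unix timestamp.
for metric_line in to_graphite(counts, "example.errorlog", 1299256128):
    print(metric_line)
```

Reporting explicit zeros keeps the graph lines continuous, so a quiet period looks different from a parser that stopped running.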
StatsD
StatsD::increment("logins.success");
StatsD::timing("gearman.time", $msec);
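Under the hood, each of these calls becomes a tiny plain-text UDP datagram of the form `bucket:value|type`, where `c` marks a counter and `ms` a timer, sent to the StatsD daemon (port 8125 by convention). Because UDP is fire-and-forget, instrumentation cannot block a page if the daemon is down. The Python helpers below sketch that wire format; the function names and the localhost address are assumptions, and the timer value is rounded to whole milliseconds for simplicity.

```python
import socket

# Assumption: a StatsD daemon listening locally on its conventional UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def counter_packet(bucket: str, count: int = 1) -> bytes:
    """'<bucket>:<count>|c' increments a counter."""
    return f"{bucket}:{count}|c".encode("ascii")


def timing_packet(bucket: str, msec: float) -> bytes:
    """'<bucket>:<ms>|ms' records one timer sample (rounded to whole ms)."""
    return f"{bucket}:{int(round(msec))}|ms".encode("ascii")


def send(packet: bytes, addr=STATSD_ADDR) -> None:
    """Fire-and-forget UDP send; no reply is expected or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, addr)


def on_login_success() -> None:
    # Rough equivalent of StatsD::increment("logins.success")
    send(counter_packet("logins.success"))


def on_gearman_job(msec: float) -> None:
    # Rough equivalent of StatsD::timing("gearman.time", $msec)
    send(timing_packet("gearman.time", msec))
```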
Graph: 90th pct, average, lower
StatsD::timing("gearman.time", $msec);
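For each timer, StatsD aggregates the samples received during a flush interval into a few summary series, including the lower bound, the mean, and a percentile such as the 90th, which is what the graph plots. The Python sketch below computes those figures for one interval using the nearest-rank percentile. StatsD's own code may round the percentile threshold and compute the mean differently, so treat the numbers as an approximation; `summarize_timer` is a hypothetical name.

```python
import math
from typing import Dict, List


def summarize_timer(samples: List[float], pct: float = 90.0) -> Dict[str, float]:
    """Summarize one flush interval of timer samples.

    Returns lower (minimum), mean, upper (maximum) and the pct-th
    percentile using the nearest-rank method.
    """
    if not samples:
        raise ValueError("no samples in this interval")
    ordered = sorted(samples)
    n = len(ordered)
    rank = max(1, math.ceil(pct * n / 100))  # 1-based nearest rank
    return {
        "lower": ordered[0],
        "mean": sum(ordered) / n,
        "upper": ordered[-1],
        f"upper_{int(pct)}": ordered[rank - 1],
    }


# Ten hypothetical gearman.time samples (milliseconds) from one interval.
stats = summarize_timer([12, 15, 11, 300, 14, 13, 16, 12, 18, 20])
```

With this sample, a single 300 ms outlier pulls the mean up to 43.1 ms while the 90th percentile stays at 20 ms, which is why plotting both lines alongside the lower bound is informative.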
Ad hoc
name value timestamp\n
echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
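The `name value timestamp` line format above is Graphite's plaintext protocol, which carbon accepts over TCP, conventionally on port 2003, so anything that can open a socket can record an event. The Python sketch below is a rough equivalent of the `echo | nc` one-liner; `graphite.example.com` is a placeholder host, and the helper names are hypothetical.

```python
import socket
import time
from typing import Optional


def graphite_line(name: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())  # same as `date +%s`
    return f"{name} {value} {timestamp}\n"


def send_to_graphite(line: str, host: str, port: int = 2003) -> None:
    """Open a TCP connection to carbon's plaintext listener and send one line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


# Marking a deploy event (placeholder host):
# send_to_graphite(graphite_line("events.deploy.site", 1), "graphite.example.com")
```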
Trends + Events
target=drawAsInfinite(events.deploy.site)
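drawAsInfinite() is what turns a sparse event series into vertical lines across the graph. Graphite's function reference describes the rule as: draw a line to infinity for values above zero, at zero for zero, and nothing for null or negative values. The Python sketch below mimics that rule on a list of data points to show why a mostly empty deploy counter renders as a few vertical markers; it is an illustration, not Graphite's code.

```python
import math
from typing import List, Optional


def draw_as_infinite(values: List[Optional[float]]) -> List[Optional[float]]:
    """Mimic Graphite's drawAsInfinite() rule for each data point.

    value > 0    -> +infinity (a vertical line spanning the graph)
    value == 0   -> 0
    None or < 0  -> None (nothing drawn)
    """
    out: List[Optional[float]] = []
    for v in values:
        if v is None or v < 0:
            out.append(None)
        elif v > 0:
            out.append(math.inf)
        else:
            out.append(0.0)
    return out


# Hypothetical one-minute buckets of events.deploy.site: one deploy in minute 3.
deploys = [None, None, 1, None, None]
print(draw_as_infinite(deploys))  # [None, None, inf, None, None]
```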
What Happened?
16,000 metrics in Graphite (plus 32,000 metrics in Ganglia)
Dashboards
Mix & Match Dashboards
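Hard
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a>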
Easy
$g = new Graphite($time);
$g->setTitle("File Not Found");
$g->addMetric("webs.errorLog.notExist", "#00cc00");
$g->showDeploys(true);
echo $g->getDashboardHTML(280, 220);
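What makes the second version easy is a small wrapper that builds the render URL, so engineers only name the metric and ask for deploy lines. The Python class below is a hypothetical analogue of the PHP `Graphite` helper on the slide, not Etsy's implementation: the method names are adapted to Python, the deploy series and colors are copied from the Hard example, and `graphite.example.com` is a placeholder host.

```python
import html
from typing import List, Tuple
from urllib.parse import urlencode

# Deploy event series and their colors, taken from the "Hard" example.
DEPLOY_TARGETS = [
    "deploys.config.production",
    "deploys.web.production",
    "deploys.search.production",
    "deploys.imagestorage.other",
]
DEPLOY_COLORS = ["#0000ff", "#ff0000", "#006633", "#cc6600"]


class GraphiteGraph:
    """Hypothetical Python analogue of the slide's PHP Graphite helper."""

    def __init__(self, base_url: str, time_range: str = "-1hours"):
        self.base_url = base_url.rstrip("/")
        self.time_range = time_range
        self.title = ""
        self.metrics: List[Tuple[str, str]] = []  # (target, color)
        self.deploys = False

    def set_title(self, title: str) -> None:
        self.title = title

    def add_metric(self, target: str, color: str) -> None:
        self.metrics.append((target, color))

    def show_deploys(self, enabled: bool) -> None:
        self.deploys = enabled

    def render_url(self, width: int, height: int, legend: bool = True) -> str:
        """Build a /render URL with one target= parameter per series."""
        targets = [t for t, _ in self.metrics]
        colors = [c for _, c in self.metrics]
        if self.deploys:
            targets += [f"drawAsInfinite({d})" for d in DEPLOY_TARGETS]
            colors += DEPLOY_COLORS
        params = [("from", self.time_range), ("width", width),
                  ("height", height), ("title", self.title)]
        if not legend:
            params.append(("hideLegend", 1))
        params.append(("yMin", 0))
        params += [("target", t) for t in targets]
        params.append(("colorList", ",".join(colors)))
        return f"{self.base_url}/render?{urlencode(params)}"

    def dashboard_html(self, width: int, height: int) -> str:
        """Legend-free thumbnail linking to an 800x600 graph, as in the Hard example."""
        big = html.escape(self.render_url(800, 600))
        small = html.escape(self.render_url(width, height, legend=False))
        return f'<a href="{big}"><img src="{small}"></a>'


g = GraphiteGraph("http://graphite.example.com")
g.set_title("File Not Found")
g.add_metric("webs.errorLog.notExist", "#00cc00")
g.show_deploys(True)
snippet = g.dashboard_html(280, 220)
```

`urlencode` percent-escapes the parentheses and `#` characters that the Hard version had to write by hand, and `html.escape` turns the `&` separators into `&amp;` so the snippet is valid HTML.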
20 dashboards by 25 engineers
Application health correlated with events
High-level visibility
Low MTTD (mean time to detect)
Validation
Confidence
codeascraft.etsy.com
github.com/etsy/statsd
github.com/etsy/logster
bitbucket.org/maplebed/ganglia-logtailer
Q&A
Does this sound like fun? Get in touch with us.
chad@etsy.com kellan@etsy.com kastner@etsy.com mike@etsy.com