Beyond DevOps: How Netflix Bridges the Gap?

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1mv6Kpr.

Josh Evans uses Netflix Operations Engineering as a case study to explore the challenges faced by centralized engineering teams and approaches to addressing those challenges. Filmed at qconsf.com.

Josh Evans is Director of Operations Engineering at Netflix, with experience in e-commerce, playback control services, infrastructure, tools, testing, and operations.

Beyond DevOps: How Netflix Bridges the Gap?

  1. 1. Josh Evans - Director of Operations Engineering November 16, 2015 Beyond DevOps: How Netflix Bridges the Gap
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-operations-devops
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. Technical Debt • Java 6 • Perforce • Single Master Jenkins • Ant • CentOS • Asgard/Mimir Fall 2013
  5. 5. How do we drive broad-based change?
  6. 6. The Paved Road • Java 7 • Stash • Jenkins Shards • Gradle • Ubuntu
  7. 7. Some said • You’re overloading us • Too many projects • Poor targeting Others said • What took you so long? • We’ve moved on • Now we need to migrate That’s great but… We’re paying a high tax
  8. 8. • Expectations gap – Division of labor – Timing of solutions – Leadership • Affects – Reputation – Relationships – Lost opportunities Organizational Debt
  9. 9. How do we bridge the gap?
  10. 10. “Remember that TIME is money…”
  11. 11. Time is a form of currency
  12. 12. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  13. 13. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  14. 14. Product Innovation winning moments of truth
  15. 15. ● Every facet of the product ● 1400 AB tests in the last year & accelerating Continuous Innovation
  16. 16. But wait, there’s more…
  17. 17. Build It • design • code • build • bake • test • deploy Run It • configure • monitor • triage • fix …at scale, globally You build it, you run it
  18. 18. Internet • 1000s of starts per second • 100,000s of requests per second • 100,000,000 hours of content / day • 3 AWS Regions, 3 AZs per region
  19. 19. Relentless product innovation Building & running micro- services at scale, globally
  20. 20. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  21. 21. DevOps is a software development method that emphasizes the roles of both software developers and other information-technology (IT) professionals with an emphasis on IT Operations. - Wikipedia The Gap
  22. 22. Why? How?
  23. 23. Quality Velocity Operational Excellence
  24. 24. Operational Excellence is the continuous improvement of the management, design, and function of operational environments to achieve greater quality, velocity, and competitive advantage.
  25. 25. • Engineering Tools • Insight & Real-time Analytics • Performance & Reliability Operations Engineering is the application of software engineering practices to achieve and sustain operational excellence.
  26. 26. Operations Engineering • Service provider • Operational excellence driver • Cross-cutting solutions • Undifferentiated heavy lifting
  27. 27. • Product Engineering • Operations Engineering • Challenges & Strategies Our time today…
  28. 28. • You’re overloading us • What took you so long? Remember that feedback? • We made assumptions – Requirements – what & when – Time for non-product work
  29. 29. • Move from assumptions to knowledge • Effect change without imposing a tax? • Achieve and sustain operational excellence? How do we…
  30. 30. Time is a form of currency
  31. 31. 5 strategies for success in time-based economies software & organizational engineering
  32. 32. 1. Reach out
  33. 33. • What are your biggest operational pain points? • How can we help? • How well are we meeting your needs today? • What would you like to see from us in the future? Listen Shower, rinse, repeat Talk to your engineering customers
  34. 34. Grease the Squeaky Wheels • low tolerance for tax • more vocal than most
  35. 35. • High impact solutions • Clarity on deliverables • Lower operational tax • Leadership, innovation, and partnership What they wanted
  36. 36. • Deliver on solutions • Better road map definition & communication • A more aggressive stance on automation • Deeper investment into leadership, innovation, planning Our commitments
  37. 37. 2. Make an impact • Apply what you’ve learned • Deliver what matters
  38. 38. • global cloud console • end to end delivery • automation platform • velocity with confidence
  39. 39. Pipelines - Automated Global Delivery
  40. 40. 3. Make it easy to do the right thing
  41. 41. • Engineering time is scarce • We must do more heavy lifting Supply & Demand
  42. 42. • Spinnaker manual step • Automated migrations – Mimir Provide on-ramps
  43. 43. Automate proven practices
  44. 44. • Alerting and Monitoring • Apache & Tomcat Hardening • Automated Canary Analysis • Autoscaling • Chaos Participation • Consistent Naming • ELB Configuration • Healthcheck Configured • Red-Black Pipeline • Squeeze Testing • Timeout & Fallback Tuning • Workload Reliability Production Ready?
  45. 45. • Alerting and Monitoring • Apache & Tomcat Hardening • Automated Canary Analysis • Autoscaling • Chaos Participation • Consistent Naming • ELB Configuration • Healthcheck Configured • Red-Black Pipeline • Squeeze Testing • Timeout & Fallback Tuning • Workload Reliability Production Ready?
  46. 46. Old Version (v1.0) New Version (v1.1) Load Balancer Customers 100 Servers 5 Servers 95% 5% Metrics Canaries
  47. 47. Old Version (v1.0) New Version (v1.1) Load Balancer Customers 0 Servers 100 Servers 100% Metrics Canaries
  48. 48. Define • Metrics • A threshold Every n minutes ● Classify metrics ● Compute score ● Make a decision Automated Canary Analysis
  49. 49. Canary Analysis Performance Integration Tests Chaos Conformity Static Unit Tests Make it easy to do the right thing Static & Functional Testing
  50. 50. 4. Reduce the cost of change
  51. 51. • Ongoing migrations • Library propagation • 100s of micro-services • Complex dependencies Continuous, Broad-based Change
  52. 52. Change Engineering • Locate • Communicate • Facilitate
  53. 53. • Automated forensics – Who last touched x? – What team? – Who was their manager? Who owns this artifact, repository, service?
  54. 54. Whitepages • Workday wrapper • App & REST API • Organization hierarchy • Metadata • Change log (###) ###-####
  55. 55. Krieger • REST-based service • Sources – Whitepages – Stash – Edda – Jenkins – Spinnaker – Etc… { "content": {}, "_links": { "employees": { "href": "/api/employees/" }, "projects": { "href": "/api/projects/" }, "teams": { "href": "/api/teams/" }, "applications": { "href": "/api/applications/" }, "jobs": { "href": "/api/build/jobs" }, "masters": { "href": "/api/build/masters" }, "projectDistribution": { "href": "/api/teams/projectDistribution" } } }
  56. 56. /api/employees?q=jevans { "employees": [ { "id": "241", "firstName": "Josh", "lastName": "Evans", "username": "jevans", "email": "jevans@netflix.com", "jobTitle": "Director of Operations Engineering", "isManager": true, "isCurrent": true, "title": "Josh Evans (jevans) - Operations Engineering", "_links": { "self": { "href": "/api/employees/241" }, "manager": { "href": "/api/employees/117890" }, "team": { "href": "/api/teams/f9134a81" }, "projects": { "href": "/api/teams/f9134a81/projects" } } } ] }
  57. 57. • Security vulnerabilities – Who owns this service? • Platform updates – Who is using this version of this library? Today – Targeted Coordination
  58. 58. Automated, efficient technical project management • Communication • Guidance • Tracking Low tax for TPMs & engineers Security Fix Guava Future – Change Campaigns
  59. 59. 5. Develop Partnerships Beyond supply & demand
  60. 60. • Nearing completion • Aggressive schedule • Unexpected delays • Commitment to June delivery Spinnaker 1.0 – 1H 2015
  61. 61. • Built their own continuous delivery solution • Not positioned for engineering-wide support • Believes in common solutions Edge Engineering
  62. 62. Partnership in Action • Strong relationship • Open discussions about concerns • Decision - leaned forward • +2 engineers on Spinnaker • Successful 1.0 launch
  63. 63. Moving Forward Together • Containers? • Achieving alignment • Collaborative exploration – Edge, Platform, Operations – A new paved road?
  64. 64. • Paved Road adopted – Adding new ones • Production Ready ongoing • Migrations easier • Reputation improving • Improved – Service uptime – Rate of change Payoffs
  65. 65. Putting it to the test in 2016 • Streaming production & test - EC2 Classic to VPC • Highly cross-functional • Complex dependencies • Zero downtime Stay tuned…
  66. 66. Five Strategies 1. Reach out 2. Make an impact 3. Make it easy to do the right thing 4. Reduce the cost of change 5. Develop partnerships
  67. 67. Open Sourced! https://netflix.github.io/
  68. 68. Josh Evans jevans@netflix.com @ops_engineering Questions?
  69. 69. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-operations-devops
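
Slide 48 describes the Automated Canary Analysis loop only in outline: define metrics and a threshold, then every n minutes classify the metrics, compute a score, and make a decision. The Python sketch below is one possible reading of that loop, not Netflix's implementation. The metric names, the tolerance-based classification rule, the "lower is better" assumption, the percentage score, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MetricSpec:
    """A metric to compare between baseline and canary (hypothetical shape)."""
    name: str
    tolerance: float  # allowed relative increase over baseline, e.g. 0.10 = 10%


def classify(spec, baseline, canary):
    """Classify one metric as 'pass' or 'fail'.

    Assumption: lower values are better, and a metric passes when the canary
    mean exceeds the baseline mean by no more than spec.tolerance.
    """
    base = mean(baseline)
    can = mean(canary)
    if base == 0:
        return "pass" if can == 0 else "fail"
    return "pass" if (can - base) / base <= spec.tolerance else "fail"


def canary_score(specs, baseline_samples, canary_samples):
    """Score = percentage of metrics classified as 'pass' (assumed scoring rule)."""
    results = [
        classify(s, baseline_samples[s.name], canary_samples[s.name])
        for s in specs
    ]
    return 100.0 * results.count("pass") / len(results)


def decide(score, threshold):
    """Continue the rollout if the score meets the threshold, else roll back."""
    return "continue" if score >= threshold else "rollback"


# One evaluation interval with made-up samples; a real system would repeat
# this every n minutes against fresh metrics from both server groups.
specs = [
    MetricSpec("error_rate", 0.10),
    MetricSpec("latency_ms", 0.10),
    MetricSpec("cpu_pct", 0.20),
]
baseline = {
    "error_rate": [0.010, 0.012, 0.011],
    "latency_ms": [120, 118, 122],
    "cpu_pct": [40, 42, 41],
}
canary = {
    "error_rate": [0.011, 0.010, 0.012],
    "latency_ms": [150, 155, 149],
    "cpu_pct": [43, 44, 42],
}

score = canary_score(specs, baseline, canary)
decision = decide(score, threshold=80.0)
print(f"score={score:.1f} decision={decision}")
```

In this made-up interval the latency metric regresses past its tolerance, so two of three metrics pass and the score falls below the threshold. Keeping classification, scoring, and the decision as separate functions mirrors the three steps on the slide and lets each rule be swapped without touching the others.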