Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]

Damon Edwards (Rundeck/PagerDuty) presentation from DevOps Enterprise Summit Las Vegas Virtual on October 15, 2020.


  1. Runbook Automation: Old News or a Key to Unlock Performance? Damon Edwards, October 13-15, 2020
  2. Damon Edwards
  3. Damon Edwards
  4. Damon Edwards
  5. Lean/Flow; Fast Feedback Loops; CI/CD; Small Batch Sizes; Shift Left; Cloud Platforms; Microservices; Continuous Testing; Infrastructure as Code; 3 Ways / 5 Ideals; Automated Governance; SLOs; Kanban; You build it, you run it
  6. Lean/Flow; Fast Feedback Loops; CI/CD; Small Batch Sizes; Shift Left; Cloud Platforms; Microservices; Continuous Testing; Infrastructure as Code; 3 Ways / 5 Ideals; Automated Governance; SLOs; Kanban; You build it, you run it. What’s Next?
  7. Thesis: The Next Great Unlocks Will Come from Ops…
  8. Thesis: The Next Great Unlocks Will Come from Ops… Incident Management
  9. Thesis: The Next Great Unlocks Will Come from Ops… Incident Management; Service Requests
  10. Incident Management
  11. Incident Management: What do we all want?
  12. Incident Management: What do we all want? Shorter Incidents. Fewer Escalations.
  13. Incident Management: What do we all want? Shorter Incidents. Fewer Escalations. What always gets in the way?
  14. Incident Management: What do we all want? Shorter Incidents. Fewer Escalations. What always gets in the way? Complexity.
  15. Our world is complex, and not deterministic. (J. Paul Reed) Our world is complex, and not deterministic! (John Allspaw)
  16. Development vs Operations
  17. Development vs Operations: (deterministic POV)
  18. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎)
  19. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎)
  20. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu)
  21. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu) Network traffic; Usage patterns; Config updates; Platform changes; API performance; Library updates; Hardware variation; Tuning; Data changes
  22. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu) Network traffic; Usage patterns; Config updates; Platform changes; API performance; Library updates; Hardware variation; Tuning; Tailoring; Data changes
  23. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu) Network traffic; Usage patterns; Config updates; Platform changes; API performance; Library updates; Hardware variation; Tuning; Tailoring; Richard Cook, M.D.; Data changes
  24. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu) Network traffic; Usage patterns; Config updates; Platform changes; API performance; Library updates; Hardware variation; Tuning; “System as imagined”; Tailoring; Richard Cook, M.D.; Data changes
  25. Development vs Operations: (deterministic POV) Code → Build → Run? (👍 / 👎); (stochastic POV, SAIL/cornell.edu) Network traffic; Usage patterns; Config updates; Platform changes; API performance; Library updates; Hardware variation; Tuning; “System as imagined”; “System as found”; Tailoring; Richard Cook, M.D.; Data changes
  26. What are people doing in “systems as found” (Richard Cook, M.D.):
  27. What are people doing in “systems as found” (Richard Cook, M.D.): 1. Monitoring
  28. What are people doing in “systems as found” (Richard Cook, M.D.): 1. Monitoring; 2. Responding (making sense of what they are seeing)
  29. What are people doing in “systems as found” (Richard Cook, M.D.): 1. Monitoring; 2. Responding (making sense of what they are seeing); 3. Adapting (more tailoring of the complex system)
  30. What are people doing in “systems as found” (Richard Cook, M.D.): 1. Monitoring; 2. Responding (making sense of what they are seeing); 3. Adapting (more tailoring of the complex system); 4. Learning (feedback loops and understanding)
  31. What are people doing in “systems as found”: 1. Monitoring; 2. Responding (making sense of what they are seeing); 3. Adapting (more tailoring of the complex system); 4. Learning (feedback loops and understanding). Automation alone can’t do this.
  32. Most important lesson from other high-consequence fields:
  33. Most important lesson from other high-consequence fields: The role of automation is to support the human operator, not to replace the human operator.
  34. “We must develop trust in our operators.” (Richard Cook, M.D.)
  35. “We must develop trust in our operators.” “Too much design goes into preventing people from doing things.” (Richard Cook, M.D.)
  36. “We must develop trust in our operators.” “Too much design goes into preventing people from doing things.” “We need to reveal the actual controls that are available.” (Richard Cook, M.D.)
  37. Abstraction
  38. Abstraction. Too high: Black Box (bad).
  39. Abstraction. Too high: Black Box (bad). Dr. Lisanne Bainbridge, “The Ironies of Automation”, 1982
  40. Abstraction. Too high: Black Box (bad). Too low: ssh, sudo, and 🙏 (bad). Dr. Lisanne Bainbridge, “The Ironies of Automation”, 1982
  41. Instead your experts become the “abstraction”
  42. Instead your experts become the “abstraction”
  43. Tickets Tickets Tickets
  44. Ticket Ticket Ticket
  45. Self-Service
  46. Self-Service: TEAM A, TEAM B, TEAM C, TEAM D
  47. An Incident Management Example…
  48. 3 Options:
  49. 3 Options: 1. Decipher the wiki
  50. 3 Options: 1. Decipher the wiki; 2. Ad-hoc tool/script usage
  51. 3 Options: 1. Decipher the wiki; 2. Ad-hoc tool/script usage; 3. ESCALATE!!
  52. Service Requests Too…
  53. Enable New Org Models…
  54. I build it. I run it. Platform Engineering
  55. What’s the magnitude of impact?…
  56. What’s possible? (#YMMV)
  57. What’s possible? (#YMMV) CFO: ?
  58. What’s possible? (#YMMV) 60% Shorter Incidents. CFO: ?
  59. What’s possible? (#YMMV) 60% Shorter Incidents; 50% Fewer Escalations. CFO: ?
  60. What’s possible? (#YMMV) 60% Shorter Incidents; 50% Fewer Escalations; 99% Faster Turnaround Time. CFO: ?
  61. What’s possible? (#YMMV) 60% Shorter Incidents; 50% Fewer Escalations; 99% Faster Turnaround Time. CFO
  62. What’s possible? (#YMMV) 60% Shorter Incidents; 50% Fewer Escalations; 99% Faster Turnaround Time. CFO: We’ve got Budget!
  63. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation
  64. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation
  65. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation
  66. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation; Do
  67. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation; View; Do
  68. Creating Self-Service: TEAM A, TEAM B, TEAM C, TEAM D; Runbook Automation; AIOps; Observability; View; Do
  69. Self-Service Operations. The next great unlocks? Runbook Automation; AIOps / Observability
  70. Let’s Talk! Damon Edwards, @damonedwards, damon@pagerduty.com
  71. “How Complex Systems Fail” (1998), Dr. Richard Cook, MD, https://how.complexsystems.fail/; “Ironies of Automation” (1982), Dr. Lisanne Bainbridge, https://www.ise.ncsu.edu/wp-content/uploads/2017/02/Bainbridge_1983_Automatica.pdf; https://youtu.be/URDBE4q_IgM, The Troubles of Automating “All the Things” (2019), J. Paul Reed; How Your Systems Keep Running (2017), John Allspaw, https://youtu.be/xA5U85LSk0M; Slides: rundeck.com/does20
  72. https://youtu.be/1zUtBLZ4Lus, Operations: The Last Mile, Damon Edwards; Incident Management Meets DevOps, Bhavik Gudka & Surya Avirneni, https://youtu.be/6W-HuG5a8L0; Slides: rundeck.com/does20
