Cloud Capacity Planning
South Bay SRE meetup - August 9th, 2016
Presenting...
● Cloud Capacity Planning..an Oxymoron?
● Santa Cloud: How Netflix Does Holiday Capacity Planning
● The Data Behind the Planning
Cloud Capacity Planning..an Oxymoron?
South Bay SRE Meetup: August 9th, 2016
● > 83M households
● 190 Countries
● 35% of Internet traffic in US at peak
● Entirely on Cloud*, three regions
● Evacuate a region monthly...for 24 hours
● Capacity planning ~ 5 people! (in the room :-)
* Content served from homegrown OpenConnect CDN
Capacity Planning Concerns
● Facility considerations (Space, Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
(Cloud) Capacity Planning Concerns
● Facility considerations (Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
Cloud-specific CP Factors
● Capacity bounds: unknown (-)
● Vendor Decisions (-/+)
○ Hardware/Offering Evolution Timeline
○ Resource Demand (CPU/Mem/Disk/Net) Matrix
● On-Demand Capability (+)
Netflix Model
● Depend on the AWS on-demand pool for elasticity
● Monitor insufficient capacity exceptions (ICEs) for boundaries
● Invest heavily in 3-year reservations
● Maintain relatively few, large reserved pools
● Cloud Capacity Analytics team develops tools for insight
● Leverage cross-account resource borrowing
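
As a rough illustration of the ICE-monitoring bullet above, the sketch below computes the fraction of launch attempts that hit an insufficient capacity exception, grouped by instance type and zone. `InsufficientInstanceCapacity` is the EC2 error code for this condition; the event record layout, the `ice_rate_by_pool` helper, and the sample data are assumptions made for illustration, not Netflix's tooling.

```python
from collections import Counter

# EC2 error code returned when a pool cannot satisfy a launch request.
ICE_CODE = "InsufficientInstanceCapacity"


def ice_rate_by_pool(launch_events):
    """Return {(instance_type, zone): fraction of launch attempts that hit an ICE}.

    Each event is a dict with hypothetical keys "instance_type", "zone",
    and "error_code" (None when the launch succeeded).
    """
    attempts = Counter()
    ices = Counter()
    for event in launch_events:
        pool = (event["instance_type"], event["zone"])
        attempts[pool] += 1
        if event.get("error_code") == ICE_CODE:
            ices[pool] += 1
    return {pool: ices[pool] / attempts[pool] for pool in attempts}


# Hypothetical launch log, for illustration only.
events = [
    {"instance_type": "m4.xlarge", "zone": "us-east-1a", "error_code": None},
    {"instance_type": "m4.xlarge", "zone": "us-east-1a", "error_code": ICE_CODE},
    {"instance_type": "r3.2xlarge", "zone": "us-east-1b", "error_code": None},
]
rates = ice_rate_by_pool(events)
```

A rising rate for a given pool is one signal that on-demand elasticity is approaching the provider's (otherwise unknown) capacity bound for that instance type and zone.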
The Triad Cloud Impact
[Figure: triad of Innovation, Reliability, and Efficiency, labeled "Default Preferred"]
Considerations of Scale
● Capacity required for critical footprint might require “guarantees”
● API-based observability has limits
● All resources have capacity limits/throttles
● Resource limits by default set for lowest common denominator
● Get creative with unused but paid-for capacity
● Billing file size!
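
Since all resources, including the APIs used for observability, have limits and throttles, clients that poll a cloud API at scale commonly retry throttled calls with exponential backoff and jitter. The sketch below is a minimal, generic version of that pattern; `ThrottledError`, `call_with_backoff`, and `flaky_describe` are hypothetical names that stand in for a real provider's rate-limit error and API call.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with capped exponential backoff.

    The delay before each retry is drawn uniformly from
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttle to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


# Usage: a fake API call that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky_describe():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError()
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_describe, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run; in real use the default `time.sleep` would apply the delays.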
Summary
Capacity Planning
Coburn Watson
● Director of Performance and Reliability at Netflix
○ Site Reliability Engineering, Performance and OS Engineering, Traffic Management, Chaos Engineering, Capacity Planning, Cloud Network Engineering
● @coburnw, cwatson@netflix.com
● Looking for some great capacity planning-minded folks
● Performance and Reliability YouTube Channel
