SlideShare une entreprise Scribd logo
1  sur  47
Netflix: Embracing the Cloud
Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
Netflix – Service Unavailable – Database Crashed

Rest assured that the right people
are losing sleep to fix this problem!

We expect to resume service in approximately 72h


12 Aug 2008 03:12am
Availability
                4 x nines




    Scale             Performance
 Unconstrained              Unlimited
horizontal scaling          compute
• Experimented with both
• Ended up with NoSQL for almost everything important
Transitional Infrastructure: “Roman Riding”
Phase          Components         Data & Prerequisites
Trial (2009)   Streaming Player   Content keys (RO)
                                  Membership status (RO)
Development Member product        Content catalog (RW)
(2010-11)   pages and APIs        Personalization data
                                  (RW) & recs algorithms
                                  AB Test data (RW)
Followthrough Account and         Membership data (RW)
(2011-12)     membership
Final (2013) Payments             PCI and SOX data
Availability
                4 x nines




    Scale             Performance
 Unconstrained              Unlimited
horizontal scaling          compute
Scalability   Performance   Availability
Scalability   Performance   Availability
1/4/2009
      2/4/2009
      3/4/2009
      4/4/2009
      5/4/2009
      6/4/2009
      7/4/2009
      8/4/2009
      9/4/2009
     10/4/2009
     11/4/2009
     12/4/2009
      1/4/2010
      2/4/2010
      3/4/2010
      4/4/2010
      5/4/2010
      6/4/2010
      7/4/2010
      8/4/2010
      9/4/2010
     10/4/2010
     11/4/2010
     12/4/2010
      1/4/2011
      2/4/2011
      3/4/2011
      4/4/2011
      5/4/2011
      6/4/2011
      7/4/2011
      8/4/2011
      9/4/2011
     10/4/2011
     11/4/2011
     12/4/2011
      1/4/2012
      2/4/2012
      3/4/2012
      4/4/2012
      5/4/2012
      6/4/2012
      7/4/2012
      8/4/2012
                 Scaling Netflix Streaming Service: Weekly Streaming Starts




23
Netflix Cross-Regional Cloud Architecture
Goal: Regional Failover
Building Global Netflix Streaming Product
Scalability   Performance   Availability
Weekly Cloud Cost Per Streaming Start (last 12 months)




                                                         28
Simian Army: Cloud Efficiency Automation
   Janitor Monkey
     Regularly scrape unused capacity
     Clean up instances, ASGs, ELBs, SGs, etc.
   Efficiency Monkey
     AI-based resource under-usage detection
      (CPU, memory, etc.)
   Automated Deletion of Old Data
     TTL for S3 (using ObjectExpiration)




                                                  29
Cyclical Streaming Usage Pattern




                                   30
Load-Based Auto Scaling




                             50%+ Cost Saving
                                          Scale up/down
                                             by 70%+




         Move to Load-Based Scaling



                                                          31
                                                          31
Scalability   Performance   Availability
A Truly Great Service…      Has To Just Work!




            Availability Goal: 99.99%
          (30 secs/week at peak traffic)
                                                33
7/17/2011
 7/24/2011
 7/31/2011
  8/7/2011
 8/14/2011
 8/21/2011
 8/28/2011
  9/4/2011
 9/11/2011
 9/18/2011
 9/25/2011
 10/2/2011
 10/9/2011
10/16/2011
10/23/2011
10/30/2011
 11/6/2011
11/13/2011
11/20/2011
11/27/2011
 12/4/2011
12/11/2011
12/18/2011
12/25/2011
  1/1/2012
  1/8/2012
 1/15/2012
 1/22/2012
 1/29/2012
  2/5/2012
 2/12/2012
 2/19/2012
 2/26/2012
  3/4/2012
 3/11/2012
 3/18/2012
 3/25/2012
  4/1/2012
  4/8/2012
 4/15/2012
 4/22/2012
                                                                                            Other AWS Outages




 4/29/2012
  5/6/2012
 5/13/2012
 5/20/2012
 5/27/2012
  6/3/2012
 6/10/2012
 6/17/2012
 6/24/2012
  7/1/2012
                                                                                                                Historical Streaming Availability (13wkMA)




  7/8/2012
                                                                          Outage




 7/15/2012
 7/22/2012
 7/29/2012
  8/5/2012
 8/12/2012
                                                                          AWS / Netflix




 8/19/2012
 8/26/2012
                                                                          June 29th, 2012




  9/2/2012
  9/9/2012
 9/16/2012
 9/23/2012
 9/30/2012
 10/7/2012
    14-Oct
10/21/2012
10/28/2012
             Using Redundancy in AWS Infrastructure to Survive Failures




 11/4/2012
11/11/2012
Cascading Failures




               API




              Instant
              Queue




              SimpleDB

                         35
Netflix Cloud Architecture




                             36
Cascading Failures




                   X                      …
99% Availability       99% Availability       99% Availability


                       300
            99%              = 4.90%                             37
Strategies to Improve Availability




        Graceful
       Degradation                   Redundancy




                                                  38
Graceful Degradation




                       39
Redundancy



                           A        B       C
    Zone   Zone   Zone          Cassandra
     A      B      C



                                S3 Backup

   Redundancy
 Across Availability           Secure Cloud
      Zones                      Backup

                         Storage Redundancy
                               Across
                                                40
                          Regions, Vendors
Testing Fault Tolerance: Simian Army




   Chaos Monkey       Latency Monkey   Chaos Gorilla




                                                       4
Open Source Portal at http://netflix.github.com
Superstorm Sandy

                   AWS Infrastructure Held Up


                   >2x Netflix Streaming Usage
                   in East Coast Markets
                      Boston
                      New York
                      Philadelphia
                      Baltimore
                      D.C.
Focus on Building a Great Streaming Product




                                              44
Netflix at 2012 re:Invent

Date/Time         Presenter             Topic
Wed 8:30-10:00    Reed Hastings         Keynote with Andy Jassy
Wed 1:00-1:45     Coburn Watson         Optimizing Costs with AWS
Wed 2:05-2:55     Kevin McEntee         Netflix’s Transcoding Transformation
Wed 3:25-4:15     Neil Hunt / Yury I.   Netflix: Embracing the Cloud
Wed 4:30-5:20     Adrian Cockcroft      High Availability Architecture at Netflix
Thu 10:30-11:20   Jeremy Edberg         Rainmakers – Operating Clouds
Thu 11:35-12:25   Kurt Brown            Data Science with Elastic Map Reduce (EMR)
Thu 11:35-12:25   Jason Chan            Security Panel: Learn from CISOs working with AWS
Thu 3:00-3:50     Adrian Cockcroft      Compute & Networking Masters Customer Panel
Thu 3:00-3:50     Ruslan M./Gregg U.    Optimizing Your Cassandra Database on AWS
Thu 4:05-4:55     Ariel Tseitlin        Intro to Chaos Monkey and the Simian Army
We are sincerely eager to
 hear your feedback on this
presentation and on re:Invent.

 Please fill out an evaluation
   form when you have a
            chance.
We are sincerely eager to
 hear your feedback on this
presentation and on re:Invent.

 Please fill out an evaluation
   form when you have a
            chance.

Contenu connexe

Tendances

NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemonsaspyker
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
Netflix oss season 1 episode 3
Netflix oss season 1 episode 3 Netflix oss season 1 episode 3
Netflix oss season 1 episode 3 Ruslan Meshenberg
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1aspyker
 
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayentaaspyker
 
CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2aspyker
 
Stranger Things: The Forces that Disrupt Netflix
Stranger Things: The Forces that Disrupt NetflixStranger Things: The Forces that Disrupt Netflix
Stranger Things: The Forces that Disrupt NetflixC4Media
 
Netflix and Containers: Not A Stranger Thing
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thingaspyker
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixCodemotion Tel Aviv
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container ChallengesRakuten Group, Inc.
 
Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014Philip Fisher-Ogden
 
Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1aspyker
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedis Labs
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientistsaspyker
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Sourceaspyker
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxiniMonal Daxini
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?GetInData
 

Tendances (20)

NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
Netflix oss season 1 episode 3
Netflix oss season 1 episode 3 Netflix oss season 1 episode 3
Netflix oss season 1 episode 3
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1
 
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
 
CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2
 
Stranger Things: The Forces that Disrupt Netflix
Stranger Things: The Forces that Disrupt NetflixStranger Things: The Forces that Disrupt Netflix
Stranger Things: The Forces that Disrupt Netflix
 
Netflix and Containers: Not A Stranger Thing
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thing
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container Challenges
 
Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014
 
Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
 

En vedette

NetflixOSS Meetup season 3 episode 2
NetflixOSS Meetup season 3 episode 2NetflixOSS Meetup season 3 episode 2
NetflixOSS Meetup season 3 episode 2Ruslan Meshenberg
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Coburn Watson
 
Engineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at NetflixEngineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at NetflixDianne Marsh
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixDianne Marsh
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Engineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryEngineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryMike McGarr
 
OTT & The Future of Connected TV
OTT & The Future of Connected TVOTT & The Future of Connected TV
OTT & The Future of Connected TVClearbridge Mobile
 
Continuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondContinuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondMike McGarr
 
Implementing DevOps
Implementing DevOpsImplementing DevOps
Implementing DevOpsMike McGarr
 
Splitting the Check on Compliance and Security
Splitting the Check on Compliance and SecuritySplitting the Check on Compliance and Security
Splitting the Check on Compliance and SecurityJason Chan
 
How Netflix thinks of DevOps. Spoiler: we don’t.
How Netflix thinks of DevOps. Spoiler: we don’t.How Netflix thinks of DevOps. Spoiler: we don’t.
How Netflix thinks of DevOps. Spoiler: we don’t.Dianne Marsh
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectMao Geng
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s DilemmaAmazon Web Services
 

En vedette (13)

NetflixOSS Meetup season 3 episode 2
NetflixOSS Meetup season 3 episode 2NetflixOSS Meetup season 3 episode 2
NetflixOSS Meetup season 3 episode 2
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
 
Engineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at NetflixEngineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at Netflix
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at Netflix
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Engineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryEngineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous Delivery
 
OTT & The Future of Connected TV
OTT & The Future of Connected TVOTT & The Future of Connected TV
OTT & The Future of Connected TV
 
Continuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondContinuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyond
 
Implementing DevOps
Implementing DevOpsImplementing DevOps
Implementing DevOps
 
Splitting the Check on Compliance and Security
Splitting the Check on Compliance and SecuritySplitting the Check on Compliance and Security
Splitting the Check on Compliance and Security
 
How Netflix thinks of DevOps. Spoiler: we don’t.
How Netflix thinks of DevOps. Spoiler: we don’t.How Netflix thinks of DevOps. Spoiler: we don’t.
How Netflix thinks of DevOps. Spoiler: we don’t.
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
 

Similaire à Netflix: Embracing the Cloud

8 mattwoodaws-intro-pdf-110411093115-phpapp01
8 mattwoodaws-intro-pdf-110411093115-phpapp018 mattwoodaws-intro-pdf-110411093115-phpapp01
8 mattwoodaws-intro-pdf-110411093115-phpapp01Carl Chesal
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...Amazon Web Services
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Adrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
AWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAmazon Web Services
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Testbed for Heterogeneous Cloud
Testbed for Heterogeneous CloudTestbed for Heterogeneous Cloud
Testbed for Heterogeneous CloudCloudLightning
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebjineshvaria
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformSudhir Tonse
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
C# Client to Cloud
C# Client to CloudC# Client to Cloud
C# Client to CloudStuart Lodge
 
Windows Azure: Lessons From The Field
Windows Azure: Lessons From The FieldWindows Azure: Lessons From The Field
Windows Azure: Lessons From The FieldRob Gillen
 

Similaire à Netflix: Embracing the Cloud (20)

8 mattwoodaws-intro-pdf-110411093115-phpapp01
8 mattwoodaws-intro-pdf-110411093115-phpapp018 mattwoodaws-intro-pdf-110411093115-phpapp01
8 mattwoodaws-intro-pdf-110411093115-phpapp01
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
 
Fermilab aws on demand
Fermilab aws on demandFermilab aws on demand
Fermilab aws on demand
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
AWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote Slides
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Testbed for Heterogeneous Cloud
Testbed for Heterogeneous CloudTestbed for Heterogeneous Cloud
Testbed for Heterogeneous Cloud
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWeb
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud Platform
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
C# Client to Cloud
C# Client to CloudC# Client to Cloud
C# Client to Cloud
 
Windows Azure: Lessons From The Field
Windows Azure: Lessons From The FieldWindows Azure: Lessons From The Field
Windows Azure: Lessons From The Field
 

Netflix: Embracing the Cloud

  • 1. Netflix: Embracing the Cloud Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
  • 2.
  • 3. Netflix – Service Unavailable – Database Crashed Rest assured that the right people are losing sleep to fix this problem! We expect to resume service in approximately 72h 12 Aug 2008 03:12am
  • 4.
  • 5. Availability 4 x nines Scale Performance Unconstrained Unlimited horizontal scaling compute
  • 6.
  • 7.
  • 8.
  • 9. • Experimented with both • Ended up with NoSQL for almost everything important
  • 10.
  • 11.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. Phase Components Data & Prerequisites Trial (2009) Streaming Player Content keys (RO) Membership status (RO) Development Member product Content catalog (RW) (2010-11) pages and APIs Personalization data (RW) & recs algorithms AB Test data (RW) Followthrough Account and Membership data (RW) (2011-12) membership Final (2013) Payments PCI and SOX data
  • 18.
  • 19.
  • 20. Availability 4 x nines Scale Performance Unconstrained Unlimited horizontal scaling compute
  • 21. Scalability Performance Availability
  • 22. Scalability Performance Availability
  • 23. 1/4/2009 2/4/2009 3/4/2009 4/4/2009 5/4/2009 6/4/2009 7/4/2009 8/4/2009 9/4/2009 10/4/2009 11/4/2009 12/4/2009 1/4/2010 2/4/2010 3/4/2010 4/4/2010 5/4/2010 6/4/2010 7/4/2010 8/4/2010 9/4/2010 10/4/2010 11/4/2010 12/4/2010 1/4/2011 2/4/2011 3/4/2011 4/4/2011 5/4/2011 6/4/2011 7/4/2011 8/4/2011 9/4/2011 10/4/2011 11/4/2011 12/4/2011 1/4/2012 2/4/2012 3/4/2012 4/4/2012 5/4/2012 6/4/2012 7/4/2012 8/4/2012 Scaling Netflix Streaming Service: Weekly Streaming Starts 23
  • 26. Building Global Netflix Streaming Product
  • 27. Scalability Performance Availability
  • 28. Weekly Cloud Cost Per Streaming Start (last 12 months) 28
  • 29. Simian Army: Cloud Efficiency Automation  Janitor Monkey  Regularly scrape unused capacity  Clean up instances, ASGs, ELBs, SGs, etc.  Efficiency Monkey  AI-based resource under-usage detection (CPU, memory, etc.)  Automated Deletion of Old Data  TTL for S3 (using ObjectExpiration) 29
  • 31. Load-Based Auto Scaling 50%+ Cost Saving Scale up/down by 70%+ Move to Load-Based Scaling 31 31
  • 32. Scalability Performance Availability
  • 33. A Truly Great Service… Has To Just Work! Availability Goal: 99.99% (30 secs/week at peak traffic) 33
  • 34. 7/17/2011 7/24/2011 7/31/2011 8/7/2011 8/14/2011 8/21/2011 8/28/2011 9/4/2011 9/11/2011 9/18/2011 9/25/2011 10/2/2011 10/9/2011 10/16/2011 10/23/2011 10/30/2011 11/6/2011 11/13/2011 11/20/2011 11/27/2011 12/4/2011 12/11/2011 12/18/2011 12/25/2011 1/1/2012 1/8/2012 1/15/2012 1/22/2012 1/29/2012 2/5/2012 2/12/2012 2/19/2012 2/26/2012 3/4/2012 3/11/2012 3/18/2012 3/25/2012 4/1/2012 4/8/2012 4/15/2012 4/22/2012 Other AWS Outages 4/29/2012 5/6/2012 5/13/2012 5/20/2012 5/27/2012 6/3/2012 6/10/2012 6/17/2012 6/24/2012 7/1/2012 Historical Streaming Availability (13wkMA) 7/8/2012 Outage 7/15/2012 7/22/2012 7/29/2012 8/5/2012 8/12/2012 AWS / Netflix 8/19/2012 8/26/2012 June 29th, 2012 9/2/2012 9/9/2012 9/16/2012 9/23/2012 9/30/2012 10/7/2012 14-Oct 10/21/2012 10/28/2012 Using Redundancy in AWS Infrastructure to Survive Failures 11/4/2012 11/11/2012
  • 35. Cascading Failures API Instant Queue SimpleDB 35
  • 37. Cascading Failures X … 99% Availability 99% Availability 99% Availability 300 99% = 4.90% 37
  • 38. Strategies to Improve Availability Graceful Degradation Redundancy 38
  • 40. Redundancy A B C Zone Zone Zone Cassandra A B C S3 Backup Redundancy Across Availability Secure Cloud Zones Backup Storage Redundancy Across 40 Regions, Vendors
  • 41. Testing Fault Tolerance: Simian Army Chaos Monkey Latency Monkey Chaos Gorilla 4
  • 42. Open Source Portal at http://netflix.github.com
  • 43. Superstorm Sandy AWS Infrastructure Held Up >2x Netflix Streaming Usage in East Coast Markets  Boston  New York  Philadelphia  Baltimore  D.C.
  • 44. Focus on Building a Great Streaming Product 44
  • 45. Netflix at 2012 re:Invent Date/Time Presenter Topic Wed 8:30-10:00 Reed Hastings Keynote with Andy Jassy Wed 1:00-1:45 Coburn Watson Optimizing Costs with AWS Wed 2:05-2:55 Kevin McEntee Netflix’s Transcoding Transformation Wed 3:25-4:15 Neil Hunt / Yury I. Netflix: Embracing the Cloud Wed 4:30-5:20 Adrian Cockcroft High Availability Architecture at Netflix Thu 10:30-11:20 Jeremy Edberg Rainmakers – Operating Clouds Thu 11:35-12:25 Kurt Brown Data Science with Elastic Map Reduce (EMR) Thu 11:35-12:25 Jason Chan Security Panel: Learn from CISOs working with AWS Thu 3:00-3:50 Adrian Cockcroft Compute & Networking Masters Customer Panel Thu 3:00-3:50 Ruslan M./Gregg U. Optimizing Your Cassandra Database on AWS Thu 4:05-4:55 Ariel Tseitlin Intro to Chaos Monkey and the Simian Army
  • 46. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
  • 47. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.

Notes de l'éditeur

  1. Make clear it’s still tentative, not a committed project – longer term…