© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Another Week, Another Million Containers on Amazon EC2
Andrew Spyker
Software Engineering Manager
Netflix
CMP376
Joe Hsieh
Principal Technical Account Manager
Amazon Web Services
Why containers?
Given our VM architecture comprised of …
Amazingly resilient
Microservice driven
Cloud native
CI/CD DevOps enabled
Elastically scalable
Do we really need containers?
What was missing from our VM environment?
Packaging
• Simple-to-customize, application-focused artifacts
• Especially for growth of polyglot environments
• Notably for platforms with OS level dependencies
Local development
• Ability to run applications locally on developer laptops
Simple way to manage compute resources
• Especially for ad hoc batch processing
Titus, Netflix’s container management platform
Scheduling
• Service & batch job lifecycle
• Resource management
Container execution
• AWS Integration
• Netflix Ecosystem Support
Job and Fleet Management
Batch
Resource Management & Optimization
Container Execution
Service
The Titus team
• Design
• Develop
• Operate
• Support
Titus and containers product strategy
• Ordered priority focus on
• Developer velocity
• Reliability
• Cost efficiency
Easy migration from VMs to containers
Easy container integration with VMs and Amazon Services
Focus on just what Netflix needs
“Our focus is to leverage EC2 deeply in Titus,
not abstract it away or implement similar
features. We see this as a differentiator of
Titus versus other container management
solutions.”
High-level architecture
Mesos
Titus Control Plane
• API
• Scheduling
• Job Lifecycle Control
Fenzo
Titus Agents
User Containers
Docker
Mesos Agent
Netflix System Services
AWS Virtual Machines
Docker Registry
Cassandra
AWS Auto Scaling
EC2 virtual machine portability
Early on we decided a container MUST …
• Natively integrate with VPC for networking
• Natively integrate with security groups for firewalling
• Work with IAM based Amazon Web Services
Key leverage points
Amazon EC2
GPUs - 10s of p2.8xlarges
Memory optimized - 100s of r4.16xlarges
General purpose - 1000s of m4.16xlarges
VPC and security groups
EC2 VM
• ENI0 (to control plane)
• ENI1: SG = w
• ENI2: SG = x
• ENIn: SG = z
Containers
• Container 1: SG = w, ENI1, IP1
• Container 2: SG = w, ENI1, IP2
• Container 3: SG = y, ENI3, IP1
Titus Container Mgmt
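One reading of this slide is a simple placement rule: containers that share a security group share an ENI, and each container gets its own IP address on that ENI. The Python sketch below illustrates that rule only. The `EniAllocator` class, its limits, and the ENI/IP naming are hypothetical and are not Titus code; the ENI numbering it produces need not match the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Eni:
    """One elastic network interface, bound to a single security group."""
    name: str
    security_group: str
    ips: list = field(default_factory=list)


class EniAllocator:
    """Hypothetical per-VM allocator: one ENI per security group, one IP per container."""

    def __init__(self, max_enis, max_ips_per_eni):
        self.max_enis = max_enis
        self.max_ips_per_eni = max_ips_per_eni
        self.enis = {}  # security group -> Eni

    def assign(self, container, security_group):
        eni = self.enis.get(security_group)
        if eni is None:
            if len(self.enis) >= self.max_enis:
                raise RuntimeError("no free ENI slot on this VM")
            # ENI0 is reserved for the control plane, so container ENIs start at 1.
            eni = Eni(f"ENI{len(self.enis) + 1}", security_group)
            self.enis[security_group] = eni
        if len(eni.ips) >= self.max_ips_per_eni:
            raise RuntimeError(f"{eni.name} has no free IP addresses")
        ip = f"IP{len(eni.ips) + 1}"
        eni.ips.append(ip)
        return container, eni.name, ip


allocator = EniAllocator(max_enis=3, max_ips_per_eni=2)
placements = [
    allocator.assign("Container 1", "w"),
    allocator.assign("Container 2", "w"),
    allocator.assign("Container 3", "y"),
]
for placement in placements:
    print(placement)
```

Containers 1 and 2 land on the same ENI with different addresses, while Container 3, with a different security group, gets an ENI of its own.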
IAM based services
[Diagram: EC2 VM with ENI0 and ENI1; Container 1 with interfaces eth0 and ethMD; Titus Metadata Proxy; labels "Normal networking" and "169.254.169.254"; Amazon Metadata Service and Security Token Service (STS)]
Instance cryptographic identity
[Diagram: Titus Host containing the Metatron Service and the User Container]
All I really needed to know about
containers, I learned from Titus …
Choices for Auto Scaling Titus applications
Use the two existing Netflix autoscaling engines we already had
• Pro: Code existed
• Con: Lacking features, we’d have to operate
Write a new one
• Pro: Would be specific to our needs
• Con: Would be lacking features, we’d have to operate
Look for one from Amazon Web Services
• Pro: Already well understood for VMs, feature-rich
• Con: Only works for VMs
A true story
A product manager introduction,
development team interchanges, and
multiple iterations later …
Application Auto Scaling with custom resources
Configuring Auto Scaling in Spinnaker
Titus and Application Auto Scaling integration
User Containers
Control Plane
Titus API call pattern
[Chart: total vs. throttled calls for CreateNetworkInterface, AttachNetworkInterface, ModifyNetworkInterfaceAttribute, and AssignPrivateIpAddresses]
An infrastructure view of applications
[Diagram: applications shown as Auto Scaling groups inside a VPC]
API calls
RunInstances
CreateNetworkInterface
AttachNetworkInterface
AssignPrivateIpAddresses
ModifyNetworkInterfaceAttribute
Netflix regional failover
Kong evacuation of us-east-1
Traffic diverted to other regions
Fail back to us-east-1
Traffic moved back to us-east-1
[Chart: traffic in us-east-1 and eu-west-1]
Infrastructure challenge
• Increase capacity during scale up of savior region
• Launch 1000s of containers in seven minutes
Easy right?
“we reduced time to schedule 30,000
pods onto 1,000 nodes from
8,780 seconds to 587 seconds”
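For a sense of scale, the quoted figures can be converted to minutes and set against the seven-minute window from the infrastructure challenge. The calculation below is only arithmetic on the numbers in the slides, with illustrative variable names. The quote measures scheduling time while the target covers launching containers, so this is a rough comparison rather than a benchmark.

```python
# Figures taken from the quote above and the seven-minute target stated earlier.
pods = 30_000
before_s = 8_780
after_s = 587
target_min = 7

before_min = before_s / 60
after_min = after_s / 60
speedup = before_s / after_s
pods_per_second = pods / after_s

print(f"before: {before_min:.1f} min, after: {after_min:.1f} min")
print(f"speedup: {speedup:.1f}x, rate: {pods_per_second:.0f} pods/s")
print("within target" if after_min <= target_min else "still over target")
```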
Titus can do this by …
• Dynamically changeable scheduling behavior
• Fleet wide networking optimizations
Normal scheduling
[Diagram: App 1 and App 2 containers spread across VM1, VM2, ... VMn, each VM with ENI 1 and IP1]
Trade-off for reliability
Failover scheduling
[Diagram: the same VMs (VM1, VM2, ... VMn) with additional App 1 and App 2 containers placed on each, and extra addresses (IP2, IP3, IP4) on ENI 1]
Trade-off for speed
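One way to picture the two modes is a placement function whose scoring rule can be switched at runtime: spread an application's containers across VMs in normal operation, and pack them onto already-used VMs during failover. The sketch below is a simplified illustration under that assumption. The `place` function, the `mode` values, and the capacity model are hypothetical and are not Titus or Fenzo APIs.

```python
def place(app, vms, mode, capacity=4):
    """Pick a VM for one container of `app`.

    vms maps VM name -> list of app names already running there.
    mode "spread" prefers the VM with the fewest containers of this app;
    mode "pack" prefers the VM with the most, as long as it has room.
    """
    candidates = [name for name, running in vms.items() if len(running) < capacity]
    if not candidates:
        raise RuntimeError("no VM has free capacity")

    def same_app_count(name):
        return vms[name].count(app)

    if mode == "spread":
        chosen = min(candidates, key=same_app_count)
    elif mode == "pack":
        chosen = max(candidates, key=same_app_count)
    else:
        raise ValueError(f"unknown mode: {mode}")
    vms[chosen].append(app)
    return chosen


normal = {"VM1": [], "VM2": [], "VM3": []}
spread_result = [place("App 1", normal, "spread") for _ in range(3)]

failover = {"VM1": ["App 1"], "VM2": [], "VM3": []}
pack_result = [place("App 1", failover, "pack") for _ in range(3)]

print(spread_result)
print(pack_result)
```

In spread mode the three containers land on three different VMs. In pack mode they all join the VM that already runs the application, which in the slide's terms trades some reliability for launch speed.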
On each host
Change when ENIs are created and attached
• Moved this to instance start time
• No longer needed on demand
Need to burst allocate IP addresses
• Opportunistically batch allocate at container launch time
• If one container was launched, more are likely coming
• Garbage collect unused addresses later
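The batch-allocate-then-garbage-collect idea can be sketched as a small address pool. This is a minimal illustration, not Titus code: `IpPool`, `fake_allocate`, the batch size, and the addresses are all hypothetical, and the allocator callback stands in for a cloud API call such as AssignPrivateIpAddresses.

```python
class IpPool:
    """Hypothetical per-ENI pool that allocates IPs in batches and frees idle ones later."""

    def __init__(self, allocate_batch, release, batch_size=4):
        self.allocate_batch = allocate_batch  # callable(count) -> list of new IPs
        self.release = release                # callable(list of IPs) -> None
        self.batch_size = batch_size
        self.free = []
        self.in_use = {}  # container id -> IP

    def acquire(self, container_id):
        if not self.free:
            # One container launching suggests more are coming, so request a
            # whole batch in a single call instead of one address at a time.
            self.free.extend(self.allocate_batch(self.batch_size))
        ip = self.free.pop(0)
        self.in_use[container_id] = ip
        return ip

    def return_ip(self, container_id):
        """Put a finished container's address back in the free list."""
        self.free.append(self.in_use.pop(container_id))

    def garbage_collect(self, keep=0):
        """Release unused addresses, keeping at most `keep` spare."""
        surplus = self.free[keep:]
        self.free = self.free[:keep]
        if surplus:
            self.release(surplus)
        return surplus


calls = []
released = []


def fake_allocate(count):
    # Stand-in for a cloud API call; records each call so they can be counted.
    start = len(calls) * count
    calls.append(count)
    return [f"10.0.0.{start + i + 10}" for i in range(count)]


pool = IpPool(fake_allocate, released.extend, batch_size=4)
ips = [pool.acquire(f"c{i}") for i in range(3)]
collected = pool.garbage_collect()
print(ips, len(calls), collected)
```

Three container launches cost a single allocation call here, and the one address left over is released by the later garbage-collection pass.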
Titus API pattern
[Chart: total vs. throttled calls for ModifyNetworkInterfaceAttribute and AssignPrivateIpAddresses]
Results
[Chart: containers started per minute, us-east-1 / prod]
7,500 containers launched in 5 minutes
Netflix load balancing
IP based Application Load Balancing
Configuring EC2 load balancers in Spinnaker
Titus and Load Balancing integration
User Containers
Control Plane
IP Target Group
Use cases on Titus
• Netflix API, Node.js Backend UI Scripts
• Machine Learning (GPUs) for personalization
• Encoding and Content use cases
• Netflix Studio use cases
• CDN tracking and planning
• Massively parallel CI system
• Data Pipeline routing and SPaaS
• Big Data platform use cases
Timeline
• Q4 2015: Batch
• Q1 2016: Basic services
• Q4 2016: Production services
• Q2 2017: Customer-facing services
Q4 2018 container usage
Common
• Jobs launched: 255K jobs / day
• Different applications: 1K+ different images
• Isolated Titus deployments: 7 stacks
Services
• Single app cluster size: 5K (real), 12K containers (benchmark)
• Hosts managed: 7K VMs (435,000 CPUs)
Batch
• Containers launched: 450K / day (750K / day peak)
• Hosts managed (autoscaled): 55K VMs / month
Open Source
Open sourced April 2018
Help other communities by sharing our approach
Current and future work
• Advanced CPU isolation
• Opportunistic workloads
• Nitro and bare metal instances
• Next Amazon and Netflix partnership
Thank you!
Andrew Spyker
@aspyker
Joe Hsieh

Contenu connexe

Tendances

Velocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ NetflixVelocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ Netflixaspyker
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talkaspyker
 
Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2aspyker
 
Dev309 from asgard to zuul - netflix oss-final
Dev309  from asgard to zuul - netflix oss-finalDev309  from asgard to zuul - netflix oss-final
Dev309 from asgard to zuul - netflix oss-finalRuslan Meshenberg
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Sourceaspyker
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015aspyker
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientistsaspyker
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talksRuslan Meshenberg
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using DatadogMukta Aphale
 
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)Netflix: From Zero to Production-Ready in Minutes (QCon 2017)
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)Tim Bozarth
 
CDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCCDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCsmalltown
 
CS80A Foothill College Open Source Talk
CS80A Foothill College Open Source TalkCS80A Foothill College Open Source Talk
CS80A Foothill College Open Source Talkaspyker
 
The service mesh management plane
The service mesh management planeThe service mesh management plane
The service mesh management planeLibbySchulze
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixCodemotion Tel Aviv
 
DevOps at Tradeshift - AWS community day nordics
DevOps at Tradeshift - AWS community day nordicsDevOps at Tradeshift - AWS community day nordics
DevOps at Tradeshift - AWS community day nordicsJesperTerkelsen1
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1aspyker
 

Tendances (20)

Velocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ NetflixVelocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ Netflix
 
The new Netflix API
The new Netflix APIThe new Netflix API
The new Netflix API
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talk
 
Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2
 
Dev309 from asgard to zuul - netflix oss-final
Dev309  from asgard to zuul - netflix oss-finalDev309  from asgard to zuul - netflix oss-final
Dev309 from asgard to zuul - netflix oss-final
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)Netflix: From Zero to Production-Ready in Minutes (QCon 2017)
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)
 
CDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaCCDK Meetup: Rule the World through IaC
CDK Meetup: Rule the World through IaC
 
CS80A Foothill College Open Source Talk
CS80A Foothill College Open Source TalkCS80A Foothill College Open Source Talk
CS80A Foothill College Open Source Talk
 
The service mesh management plane
The service mesh management planeThe service mesh management plane
The service mesh management plane
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
DevOps at Tradeshift - AWS community day nordics
DevOps at Tradeshift - AWS community day nordicsDevOps at Tradeshift - AWS community day nordics
DevOps at Tradeshift - AWS community day nordics
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1
 

Similaire à CMP376 - Another Week, Another Million Containers on Amazon EC2

Getting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSGetting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSAmazon Web Services
 
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Amazon Web Services
 
Expert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSExpert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSAmazon Web Services
 
[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWSAmazon Web Services Korea
 
Expert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSExpert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSAmazon Web Services
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018AWS Germany
 
SRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSSRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSAmazon Web Services
 
More Containers Less Operations
More Containers Less OperationsMore Containers Less Operations
More Containers Less OperationsDonnie Prakoso
 
Exciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKSExciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKSAmazon Web Services
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Amazon Web Services
 
Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28Boaz Ziniman
 
Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28Amazon Web Services
 
Getting Started with Containers on AWS
Getting Started with Containers on AWSGetting Started with Containers on AWS
Getting Started with Containers on AWSAmazon Web Services
 
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018Amazon Web Services
 
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...Amazon Web Services
 
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Amazon Web Services
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Amazon Web Services
 
Wildrydes Serverless Workshop Tel Aviv
Wildrydes Serverless Workshop Tel AvivWildrydes Serverless Workshop Tel Aviv
Wildrydes Serverless Workshop Tel AvivBoaz Ziniman
 

Similaire à CMP376 - Another Week, Another Million Containers on Amazon EC2 (20)

Getting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSGetting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWS
 
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
 
Expert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSExpert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWS
 
Deep Dive into Amazon Fargate
Deep Dive into Amazon FargateDeep Dive into Amazon Fargate
Deep Dive into Amazon Fargate
 
[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS
 
Expert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSExpert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWS
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
 
SRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSSRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKS
 
Introducing AWS Fargate
Introducing AWS FargateIntroducing AWS Fargate
Introducing AWS Fargate
 
More Containers Less Operations
More Containers Less OperationsMore Containers Less Operations
More Containers Less Operations
 
Exciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKSExciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKS
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
 
Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28
 
Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28
 
Getting Started with Containers on AWS
Getting Started with Containers on AWSGetting Started with Containers on AWS
Getting Started with Containers on AWS
 
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018
Day Two Operations of Kubernetes on AWS (GPSTEC309) - AWS re:Invent 2018
 
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
 
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
 
Wildrydes Serverless Workshop Tel Aviv
Wildrydes Serverless Workshop Tel AvivWildrydes Serverless Workshop Tel Aviv
Wildrydes Serverless Workshop Tel Aviv
 

Plus de aspyker

SRECon Lightning Talk
SRECon Lightning TalkSRECon Lightning Talk
SRECon Lightning Talkaspyker
 
Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17aspyker
 
Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4aspyker
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
 
Netflix Open Source: Building a Distributed and Automated Open Source Program
Netflix Open Source:  Building a Distributed and Automated Open Source ProgramNetflix Open Source:  Building a Distributed and Automated Open Source Program
Netflix Open Source: Building a Distributed and Automated Open Source Programaspyker
 
Netflix Open Source Meetup Season 4 Episode 3
Netflix Open Source Meetup Season 4 Episode 3Netflix Open Source Meetup Season 4 Episode 3
Netflix Open Source Meetup Season 4 Episode 3aspyker
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016aspyker
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Sourceaspyker
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalaspyker
 
Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014aspyker
 
Netflix s2e1lightningtalk
Netflix s2e1lightningtalkNetflix s2e1lightningtalk
Netflix s2e1lightningtalkaspyker
 
Going Cloud Native with IBM Cloud and NetflixOSS for Dev@Pulse
Going Cloud Native with IBM Cloud and NetflixOSS for Dev@PulseGoing Cloud Native with IBM Cloud and NetflixOSS for Dev@Pulse
Going Cloud Native with IBM Cloud and NetflixOSS for Dev@Pulseaspyker
 

CMP376 - Another Week, Another Million Containers on Amazon EC2

  • 2. Another Week, Another Million Containers on Amazon EC2 (session CMP376)
      Andrew Spyker, Software Engineering Manager, Netflix
      Joe Hsieh, Principal Technical Account Manager, Amazon Web Services
      © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Why containers?
      Given our VM architecture comprised of …
      Amazingly resilient
      Microservice driven
      Cloud native
      CI/CD DevOps enabled
      Elastically scalable
      Do we really need containers?
  • 4. What was missing from our VM environment?
      Packaging
      • Simple to customize application focused artifacts
      • Especially for growth of polyglot environments
      • Notably for platforms with OS level dependencies
      Local development
      • Ability to run applications locally on developer laptops
      Simple way to manage compute resources
      • Especially for ad hoc batch processing
  • 5. Titus, Netflix’s container management platform
      Scheduling
      • Service & batch job lifecycle
      • Resource management
      Container execution
      • AWS Integration
      • Netflix Ecosystem Support
      [Diagram labels: Service, Batch, Job and Fleet Management, Resource Management & Optimization, Container Execution]
  • 6. The Titus team
      • Design
      • Develop
      • Operate
      • Support
  • 7. Titus and containers product strategy
      • Ordered priority focus on
        • Developer velocity
        • Reliability
        • Cost efficiency
      Easy migration from VMs to containers
      Easy container integration with VMs and Amazon Services
      Focus on just what Netflix needs
  • 8. “Our focus is to leverage EC2 deeply in Titus, not abstract it away or implement similar features. We see this as a differentiator of Titus versus other container management solutions.”
  • 9. High level architecture
      [Diagram labels: Titus Control Plane (API, Scheduling, Job Lifecycle Control), Fenzo, Mesos, Titus Agents, User Containers, Docker, Mesos Agent, Netflix System Services, AWS Virtual Machines, Docker Registry, Cassandra, AWS Auto Scaling]
  • 11. EC2 virtual machine portability
      Early on we decided a container MUST …
      • Natively integrate with VPC for networking
      • Natively integrate with security groups for firewalling
      • Work with IAM based Amazon Web Services
  • 12. Key leverage points
  • 13. Amazon EC2
      • GPUs: 10s of p2.8xlarges
      • Memory optimized: 100s of r4.16xlarges
      • General purpose: 1000s of m4.16xlarges
  • 14. VPC and security groups
      [Diagram: an EC2 VM running Titus Container Mgmt, with the following interfaces and containers]
      • ENI0 (to control plane)
      • ENI1, SG = w
      • ENI2, SG = x
      • ENIn, SG = z
      • Container 1: SG = w, ENI1, IP1
      • Container 2: SG = w, ENI1, IP2
      • Container 3: SG = y, ENI3, IP1
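The mapping on slide 14 (containers with the same security group share an ENI and each get their own IP, while a different security group gets a different ENI) can be sketched roughly as follows. This is a minimal illustration, not Titus code: the `Host` and `Eni` classes, the `max_enis` and `ips_per_eni` limits, and the choice to number new ENIs sequentially are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Eni:
    index: int            # ENI0 is reserved for the control plane, so indexes start at 1
    security_group: str
    ips_in_use: int = 0


@dataclass
class Host:
    max_enis: int         # hypothetical per-instance ENI limit, including ENI0
    ips_per_eni: int      # hypothetical per-ENI IP limit
    enis: list = field(default_factory=list)

    def place(self, security_group):
        """Return (eni_index, ip_slot) for a new container, or None if the host is full."""
        # Reuse an ENI that already carries this security group and has a free IP.
        for eni in self.enis:
            if eni.security_group == security_group and eni.ips_in_use < self.ips_per_eni:
                eni.ips_in_use += 1
                return eni.index, eni.ips_in_use
        # Otherwise a new ENI is needed; the +1 accounts for ENI0.
        if len(self.enis) + 1 >= self.max_enis:
            return None
        eni = Eni(index=len(self.enis) + 1, security_group=security_group, ips_in_use=1)
        self.enis.append(eni)
        return eni.index, 1
```

With `Host(max_enis=4, ips_per_eni=2)`, two containers in security group `w` land on ENI1 with IP slots 1 and 2, and a container in security group `y` gets a separate ENI, which mirrors the shape of the diagram even though the ENI numbering differs.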
  • 15. IAM based services
      [Diagram labels: EC2 VM, ENI0, ENI1, Container 1 (eth0: normal networking; ethMD), Titus Metadata Proxy, 169.254.169.254, Amazon Metadata Service and Security Token Service (STS)]
  • 16. Instance cryptographic identity
      [Diagram labels: Titus Host, Metatron Service, User Container]
  • 18. All I really needed to know about containers, I learned from Titus …
  • 20. Choices for Auto Scaling Titus applications
      Use the two existing Netflix autoscaling engines we already had
      • Pro: Code existed
      • Con: Lacking features, we’d have to operate
      Write a new one
      Look for one from Amazon Web Services
  • 21. Choices for Auto Scaling Titus applications
      Use the two existing Netflix autoscaling engines we already had
      Write a new one
      • Pro: Would be specific to our needs
      • Con: Would be lacking features, we’d have to operate
      Look for one from Amazon Web Services
  • 22. Choices for Auto Scaling Titus applications
      Use the two existing Netflix autoscaling engines we already had
      Write a new one
      Look for one from Amazon Web Services
      • Pro: Already well understood for VMs, feature-rich
      • Con: Only works for VMs
  • 23. A true story
  • 24. A product manager introduction, development team interchanges, and multiple iterations later …
  • 25. Application Auto Scaling with custom resources
  • 26. Configuring Auto Scaling in Spinnaker
  • 27. Titus and Application Auto Scaling integration
      [Diagram labels: User Containers, Control Plane]
  • 29. Titus API call pattern
      [Chart series: CreateNetworkInterface, AttachNetworkInterface, ModifyNetworkInterfaceAttribute, AssignPrivateIpAddresses; each shown as Total and Throttled]
  • 30. An infrastructure view of applications
      [Diagram: three Auto Scaling groups]
  • 31. An infrastructure view of applications
      [Diagram labels: Auto Scaling group, VPC]
  • 32. API calls
      • RunInstances
      • CreateNetworkInterface
      • AttachNetworkInterface
      • AssignPrivateIpAddress
      • ModifyNetworkInterface
  • 33. Netflix regional failover
      [Chart annotations: Kong evacuation of us-east-1; Traffic diverted to other regions; Fail back to us-east-1; Traffic moved back to us-east-1. Regions shown: us-east-1, eu-west-1]
  • 34. Infrastructure challenge
      • Increase capacity during scale up of savior region
      • Launch 1000s of containers in seven minutes
  • 35. Easy right?
      “we reduced time to schedule 30,000 pods onto 1,000 nodes from 8,780 seconds to 587 seconds”
  • 36. Easy right?
      “we reduced time to schedule 30,000 pods onto 1,000 nodes from 8,780 seconds to 587 seconds”
  • 37. Titus can do this by …
      • Dynamically changeable scheduling behavior
      • Fleet wide networking optimizations
  • 38. Normal scheduling
      [Diagram: App 1 and App 2 containers spread across VM1, VM2, … VMn; labels: ENI 1, IP1]
      Trade-off for reliability
  • 39. Failover scheduling
      [Diagram: the same VMs with additional App 1 and App 2 containers added to them; labels: ENI 1, IP1, plus "IP2, IP3", "IP2, IP3, IP4", "IP2, IP3"]
      Trade-off for speed
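The contrast between slides 38 and 39 (spread containers for reliability in normal operation, stack them onto hosts during failover for speed) can be illustrated with a host-selection function whose behavior is switched by a mode flag. This is only a sketch of the general idea of dynamically changeable scheduling: the `pick_host` function, the `"spread"` and `"pack"` mode names, and the uniform `capacity` limit are hypothetical and do not describe how Fenzo or Titus actually score hosts.

```python
from collections import Counter


def pick_host(placements, capacity, app, mode):
    """Choose a host for one new container of `app`.

    placements: dict mapping host name -> Counter of app name -> running containers
    capacity:   hypothetical uniform limit on containers per host
    mode:       "spread" (normal scheduling) or "pack" (failover scheduling)
    """
    candidates = [h for h, apps in placements.items() if sum(apps.values()) < capacity]
    if not candidates:
        return None
    if mode == "spread":
        # Prefer the host running the fewest copies of this app.
        return min(candidates, key=lambda h: (placements[h][app], h))
    if mode == "pack":
        # Prefer the host already running the most copies of this app.
        return min(candidates, key=lambda h: (-placements[h][app], h))
    raise ValueError(f"unknown mode: {mode}")


placements = {
    "vm1": Counter(app1=2, app2=1),
    "vm2": Counter(app1=1),
    "vm3": Counter(),
}
normal_choice = pick_host(placements, capacity=4, app="app1", mode="spread")
failover_choice = pick_host(placements, capacity=4, app="app1", mode="pack")
```

In spread mode the empty `vm3` is chosen, which limits how much of one application a single host failure can take out. In pack mode `vm1` is chosen, where a host that already runs the application can take further copies.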
  • 40. On each host
      Change when ENIs are created and attached
      • Moved this to instance start time
      • No longer needed on-demand
      Need to burst allocate IP addresses
      • Opportunistically batch allocate at container launch time
      • If one container was launched, more are likely coming
      • Garbage collect unused addresses later
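The burst allocation idea on slide 40 can be sketched as a small per-host pool: when a container needs an IP and none is spare, request a whole batch in one call, hand out one address, and keep the rest for the containers expected to follow; unused addresses are returned later. The `IpPool` class, the `batch_size` default, and the `fake_allocator` helper are assumptions for illustration; in a real system the allocator callable would wrap a cloud API call rather than a counter.

```python
import itertools


class IpPool:
    """Per-host IP pool that allocates in batches and garbage collects spares."""

    def __init__(self, allocate_batch, batch_size=4):
        self._allocate_batch = allocate_batch  # callable(n) -> list of n IP strings
        self._batch_size = batch_size
        self._free = []
        self._in_use = set()
        self.api_calls = 0                     # counts batch requests made

    def acquire(self):
        """Return an IP for a new container, batch allocating if none is spare."""
        if not self._free:
            self._free.extend(self._allocate_batch(self._batch_size))
            self.api_calls += 1
        ip = self._free.pop(0)
        self._in_use.add(ip)
        return ip

    def release(self, ip):
        """Mark an IP as unused when its container exits."""
        self._in_use.discard(ip)
        self._free.append(ip)

    def garbage_collect(self, keep=0):
        """Drop spare IPs beyond `keep` and return them so the caller can unassign them."""
        surplus = self._free[keep:]
        self._free = self._free[:keep]
        return surplus


def fake_allocator():
    """Stand-in for the cloud API: hands out sequential addresses."""
    counter = itertools.count(10)
    return lambda n: [f"10.0.0.{next(counter)}" for _ in range(n)]


pool = IpPool(fake_allocator(), batch_size=4)
ips = [pool.acquire() for _ in range(5)]
```

Five container launches here cost two allocation calls instead of five, which is the point of batching when the API is rate limited.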
  • 41. Titus API pattern
      [Chart series: ModifyNetworkInterfaceAttribute, AssignPrivateIpAddresses; each shown as Total and Throttled]
  • 42. Results
      [Chart: us-east-1 / prod containers started per minute]
      7500 launched in 5 minutes
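To put the figures from slides 35 and 42 on a common scale, the arithmetic below converts each to a per-minute rate. The helper name `per_minute` is made up for this example, and the two sets of numbers measure different things (pods scheduled in the quoted benchmark versus containers started in production), so this is a unit conversion rather than a like-for-like comparison.

```python
def per_minute(count, seconds):
    """Convert a count achieved over `seconds` into a per-minute rate."""
    return count / seconds * 60


# Slide 42: 7500 containers launched in 5 minutes.
titus_rate = per_minute(7500, 5 * 60)

# Slide 35 quote: 30,000 pods scheduled in 8,780 s before and 587 s after.
quoted_before = per_minute(30_000, 8_780)
quoted_after = per_minute(30_000, 587)

print(f"slide 42: {titus_rate:.0f} per minute")
print(f"slide 35 quote: {quoted_before:.0f} -> {quoted_after:.0f} per minute")
```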
  • 44. Netflix load balancing
  • 45. IP based Application Load Balancing
  • 46. Configuring EC2 load balancers in Spinnaker
  • 47. Titus and Load Balancing integration
      [Diagram labels: User Containers, Control Plane, IP Target Group]
  • 49. Use cases on Titus
      • Netflix API, Node.js Backend UI Scripts
      • Machine Learning (GPUs) for personalization
      • Encoding and Content use cases
      • Netflix Studio use cases
      • CDN tracking and planning
      • Massively parallel CI system
      • Data Pipeline routing and SPaaS
      • Big Data platform use cases
      [Timeline: Batch (4Q 15), Basic Services (1Q 16), Production Services (4Q 16), Customer Facing Services (2Q 17)]
  • 51. Q4 2018 container usage
      Common
      • Jobs launched: 255K jobs / day
      • Different applications: 1K+ different images
      • Isolated Titus deployments: 7 stacks
      Services
      • Single app cluster size: 5K (real), 12K containers (benchmark)
      • Hosts managed: 7K VMs (435,000 CPUs)
      Batch
      • Containers launched: 450K / day (750K / day peak)
      • Hosts managed (autoscaled): 55K VMs / month
  • 52. Open Source
      • Open sourced April 2018
      • Help other communities by sharing our approach
  • 53. Current and future work
      • Advanced CPU Isolation
      • Opportunistic Workloads
      • Nitro and Bare Metal Instances
      • Next Amazon and Netflix Partnership
  • 54. Thank you!
      Andrew Spyker (@aspyker)
      Joe Hsieh