Advanced AWS Patterns from the
trenches of the enterprise
John Painter
Principal Consultant
john.painter@sourcedgroup.com
Brent Harrison
Consultant
brent.harrison@sourcedgroup.com
OUR TEAM TODAY
SYDNEY | TORONTO | VANCOUVER | KELOWNA
CONSULTING
Banks | Aviation | Telecom | FinTech | Media | Healthcare | Smartphone Manufacturer | Utilities
ENGINEERED SERVICES
• Behind the firewall
• BYO AWS Account
• Guaranteed single tenancy
• Multi-cloud options
• Customer-controlled encryption
• Customer retains custody of all data
Over half a petabyte per year of Splunk throughput under management
PATTERN 1 – AUTO HEALING GEN2
AUTOMATED HEALING OF SINGLE INSTANCES WITH DEEP
HEALTH CHECKING
THE BASIC AUTO-HEALING PATTERN
1. Create ASG with min:max of 1:1
2. Elastic Load Balancer (ELB) provides deep health checking
Note: this is not the same as EC2 Auto Recovery
Top Tip: Deep Health Check
1. Script that checks multiple variables (e.g. process + disk space + memory) and opens/closes a port via Netcat
2. Set the ELB to a “port” type (TCP) health check
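The deep health check above can be sketched in Python instead of a shell script with Netcat. This is a minimal sketch under assumptions, not the presenters' script: the function and class names, the 10% free-disk threshold and the polling interval are all hypothetical, and the memory check from the slide is left out for brevity. The port accepts TCP connections only while every check passes, so an ELB TCP health check sees the instance as unhealthy as soon as one check fails.

```python
import shutil
import socketserver
import subprocess
import threading
import time


def process_running(name):
    """True if pgrep finds an exact process name (assumes a Linux host with pgrep)."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True)
    return result.returncode == 0


def disk_ok(path="/", min_free_fraction=0.10):
    """True if at least min_free_fraction of the filesystem is free (threshold is an assumption)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def healthy(checks):
    """Healthy only if every check passes; a check that raises counts as a failure."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            return False
    return True


class _ReusableTCPServer(socketserver.TCPServer):
    allow_reuse_address = True  # lets the port be reopened right after it was closed


class HealthPort:
    """TCP port that accepts connections only while the instance is healthy."""

    def __init__(self, port, host="0.0.0.0"):
        self.address = (host, port)
        self.server = None

    def update(self, is_healthy):
        if is_healthy and self.server is None:
            # BaseRequestHandler does nothing: each connection is accepted, then closed.
            self.server = _ReusableTCPServer(self.address, socketserver.BaseRequestHandler)
            threading.Thread(target=self.server.serve_forever, daemon=True).start()
        elif not is_healthy and self.server is not None:
            self.server.shutdown()
            self.server.server_close()
            self.server = None


def run_forever(port, checks, interval_seconds=10):
    """Re-evaluate the checks periodically and open or close the port to match."""
    gate = HealthPort(port)
    while True:
        gate.update(healthy(checks))
        time.sleep(interval_seconds)
```

A call such as `run_forever(8081, [lambda: process_running("myapp"), disk_ok])` would then be started at boot; the port number and process name are placeholders.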
[Diagram: Auto Scaling Group, min: 1, max: 1]
There are strong fiscal motivators to reduce tier-1 operational costs
via the use of automated healing actions
ASGS, ETH0, AND STATIC IPS
• LOTS of “cloud” applications want static
networks
• ASG instances receive dynamic IPs
• Add a secondary interface (Elastic
Network Interface – ENI) which maintains
a fixed network address
[Diagram: Auto Scaling Group (min: 1, max: 1), other cluster members/users, “re-mappable” ENI]
THE PREVIOUS APPROACH
1. Virtual Private Cloud (VPC) with large subnets (e.g. > /24)
2. ASG with min:max of 1:1
3. Scripts call EC2 API on boot to “bring” a re-mappable interface to the instance
✓ Runs in the operating system, so it is simple to pause apps for the interface
✕ Lots of upstream/deployment co-ordination required
✕ Must maintain support for multiple operating systems
✕ Incompatible with the increasing number of AWS Marketplace offerings
✕ Prone to failure if the AWS API is under duress (which is also probably when you really want to be healing!)
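As an illustration of step 3 and of why API duress matters, a boot-time script might attach the pre-created ENI with retries. This is a hedged sketch: `attach_eni_with_retry`, its parameters and the backoff values are hypothetical. The `ec2` argument is assumed to be a boto3 EC2 client (or any object exposing `attach_network_interface` with the same keyword arguments), injected so the logic can be exercised without AWS.

```python
import time


def attach_eni_with_retry(ec2, eni_id, instance_id, device_index=1,
                          attempts=5, base_delay=1.0, sleep=time.sleep):
    """Attach a pre-created ENI to this instance, retrying with exponential backoff.

    Returns the attachment id on success and re-raises the last error
    once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            response = ec2.attach_network_interface(
                NetworkInterfaceId=eni_id,
                InstanceId=instance_id,
                DeviceIndex=device_index,
            )
            return response["AttachmentId"]
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

Even with retries, the script still depends on the EC2 API being reachable at the moment of healing, which is the weakness the slide calls out.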
ONE ALTERNATIVE
[Diagram: Auto Scaling Group (min: 1, max: 1), other cluster members/users, ASG Notifications → SQS/SNS → Lambda → AWS CLI]
✕ Slow
✕ Prone to AWS API/backplane duress
✕ Does not understand the state of the operating system
GOING BACK TO FIRST PRINCIPLES
• What hands out IP addresses in AWS?
– DHCP (VPC DHCP Options Group)
• Where does the range of IPs come from?
– Subnet size
• Can we reduce the number of IPs available from DHCP to 1?
– Provision an ENI in the subnet -> 1 fewer IP (for FREE!)
– Provision lots of ENIs in a subnet and there will only be 1 IP left
AUTO-HEAL GEN2
1. Create a Subnet for the Auto-Heal node (at the moment /28 is the smallest)
2. Create enough ENIs to remove all but 1 IP from DHCP
3. Create the normal ASG with min:max 1:1
4. Create the ELB with deep health checking as per normal
✓ No scripts, co-ordination, or complexity inside the OS or the deployment framework
✓ Fully compatible with the wide range of black-box AMIs from AWS Marketplace
✕ Wastes address space (which may not be an issue depending on your network design and integration points)
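The arithmetic behind step 2 can be sketched as follows. AWS reserves five addresses in every VPC subnet (the first four and the last), so a /28 has 11 assignable addresses and needs 10 placeholder ENIs to leave exactly one for the auto-heal node. The function name is hypothetical and the sketch only does the counting; creating the ENIs themselves is left to whatever tooling you use.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def placeholder_enis_needed(cidr, keep_free=1):
    """Number of ENIs to create so that only keep_free addresses remain assignable."""
    subnet = ipaddress.ip_network(cidr)
    if subnet.prefixlen > 28:
        raise ValueError("AWS does not allow subnets smaller than /28")
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    return usable - keep_free
```

For example, `placeholder_enis_needed("10.0.0.0/28")` gives 10, while a /24 would need 250 placeholders, which is why the smallest subnet is the natural choice here.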
Same technique can be used for “fixed clusters”, sets of quorum
servers, container systems
PATTERN 2 - ADVANCED PROXY
A SCALABLE, HIGHLY AVAILABLE PROXY WITH ACTIVE DATA
CONTROLS AND STATIC IP RANGES
EC2 with Outbound Internet Access
[Diagram: Private Subnet (EC2 ×3), Public Subnet (MNAT, EIP)]
✕ Uncontrolled access to the internet
✕ Reactive techniques such as VPC flow logs + Lambda are not capable of running in real time
* Diagram simplified for clarity, excludes multi availability zone elements
~HA Transparent Proxy Design
• Limitation of VPC: routes can only reference a single interface
✓ Active control of traffic
✓ HTTP/S inspection
? Non-trivial engineering required
✕ Not truly HA
✕ Relatively low and finite throughput
✕ Prone to EC2 backplane saturation
✕ 100s of Mb/s per EIP
[Diagram: Private Subnet (EC2 ×3), ENI, Public Subnet (PROXY ×2, EIP), Whitelist, Blacklist, IP List]
* Diagram simplified for clarity, excludes multi availability zone elements
Auto Scaling Proxy
✓ Active control of traffic
✓ Actively load balanced
✓ Truly HA
✓ ≈ Infinite bandwidth
✕ Variable public IPs
[Diagram: Availability Zone A, Availability Zone B, Public Subnet ×2, ASG (PROXY ×2), “Auto Scaled” EIPs, Private Subnet ×2, EC2 ×6]
Variable edge IPs are undesirable in the enterprise
Auto Scaling Proxies with Static IPs
✓ Active control of traffic
✓ Actively load balanced
✓ Static external IP addresses
? ≈ Infinite bandwidth requires co-ordination
[Diagram: Availability Zone A, Availability Zone B, Private Subnets, ASG, EC2 ×6, PROXY ×6, Public Subnets, MNAT ×2, EIP ×2]
Why? ...and hang on, I still see EIPs?
Scaling Increments:
10GB @ $42/month/10GB
100s of Mb/s @ ~$210/month/100s of Mb/s
• Provision 50/100/200Gb/s upfront
• If you move in increments of +/-100Gb/s, see Pattern 3.
• Simple, HA, static IP proxies for a relatively low uplift in cost
[Diagram: Availability Zone A, Availability Zone B, Private Subnets, ASG, EC2 ×6, PROXY ×6, Public Subnets, MNAT ×2, EIP ×2]
Complex Inspection Sandwich
• Lots of vendor solutions can now support healing
• Some even support scaling
• Few support ENI/EIP handling
[Diagram: Availability Zone A, Availability Zone B, Private Subnets, EC2 ×6, INSPECTION SANDWICH, Public Subnets, MNAT ×2, EIP ×2]
PATTERN 3 - AUTO SCALING ANYTHING
A TECHNIQUE LEVERAGING EXISTING SERVICES TO AUTOSCALE ALMOST
ANYTHING
The fiscal and operational benefits of Auto Scaling are well understood.
Auto Scaling is currently limited to scaling EC2 instances.
We want to apply scaling to entire solutions, not just EC2.
SCALING CLUSTERS? SCALING CELLS?
• Enterprises have many applications that cannot scale on compute alone
– Sharded databases
– Life Sciences Clusters
– Simulation Clusters
• Organisations are starting to adopt “Cell Architecture” to account for scale
• Auto Scaling  Auto Healing
Client Example
~8000 instances connected in “rings” of 20 nodes via a cluster protocol + ~1500 Cassandra nodes. 50% variance in daily traffic volume. An ideal use case for Auto Scaling.
THE GENERAL CASE - CELL / SHARD / CLUSTER
[Diagram: one CloudFormation Stack containing EC2 Node1, EC2 Node2, … EC2 Node-n, with a Health Check]
THE GENERAL CASE - CELL / SHARD / CLUSTER
[Diagram: two CloudFormation Stacks, each containing EC2 Node1, EC2 Node2, … EC2 Node-n, with a Health Check]
STEP 1 – INSTRUMENT THE SCALING METRIC
[Diagram: CloudWatch Custom Metric (Number of Users), CloudWatch Alarm ScaleUp, CloudWatch Alarm ScaleDown]
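Publishing the custom metric could look like the sketch below. It only builds the request body; the namespace `Custom/Cells` and metric name `NumberOfUsers` are assumptions chosen for illustration, not names from the deck. The resulting dictionary has the shape accepted by the CloudWatch `PutMetricData` API.

```python
def users_metric(active_users, namespace="Custom/Cells", metric_name="NumberOfUsers"):
    """Build the keyword arguments for a CloudWatch put_metric_data call."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Value": float(active_users),
                "Unit": "Count",
            }
        ],
    }
```

With boto3 installed and credentials configured, the payload would be sent with `boto3.client("cloudwatch").put_metric_data(**users_metric(1234))`, and the ScaleUp and ScaleDown alarms would then be defined on that metric.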
OPTION 1 – USE LAMBDA
[Diagram: Number of Users, ScaleUp, ScaleDown, SNS ×2, Build Lambda, Teardown Lambda, CloudFormation]
WHY NOT LAMBDA?
• Duplication of AWS engineering investment
• Ongoing cost to maintain cadence with the growing features of Auto Scaling
– Scheduled Scaling
– Percentile Scaling
– Machine Learning Scaling / Predictive Scaling
• Lambda still needs a state machine
• We don’t have healing
There are strong fiscal and complexity
motivators to use native ASGs
STEP 2 – “SHADOW” ASG
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3)]
STEP 3 – ADD THE CFN LAMBDAS
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3), Auto Scaling SNS, EC2_INSTANCE_LAUNCH → Create Stack, EC2_INSTANCE_TERMINATE → Delete Stack]
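A minimal sketch of the CloudFormation Lambdas follows, assuming the shadow ASG publishes its launch and terminate notifications to SNS. The slide shows separate create and delete functions; for brevity this sketch handles both in one function. The stack naming scheme (`cell-<instance id>`) and the `template_url` parameter are hypothetical, and `cfn` is assumed to be a boto3 CloudFormation client or a stand-in with the same `create_stack` and `delete_stack` methods.

```python
import json


def stack_name_for(instance_id):
    """Hypothetical naming scheme tying one stack to one shadow instance."""
    return f"cell-{instance_id}"


def handle_scaling_event(event, cfn, template_url):
    """Create or delete one stack per shadow instance launch or termination.

    event is an SNS-triggered Lambda event whose message body is an
    Auto Scaling notification. Returns the list of actions taken.
    """
    actions = []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        kind = message.get("Event", "")
        instance_id = message.get("EC2InstanceId")
        if not instance_id:
            continue  # e.g. test notifications carry no instance id
        name = stack_name_for(instance_id)
        if kind == "autoscaling:EC2_INSTANCE_LAUNCH":
            cfn.create_stack(StackName=name, TemplateURL=template_url)
            actions.append(("create", name))
        elif kind == "autoscaling:EC2_INSTANCE_TERMINATE":
            cfn.delete_stack(StackName=name)
            actions.append(("delete", name))
    return actions
```

Deriving the stack name from the shadow instance id means the terminate handler needs no lookup table to find the stack it has to delete.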
$5.76 per month per stack
(Unoptimized)
Auto Scale  Auto Heal
STEP 4 – HEALTH CHECK THE CLUSTERS
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3), Auto Scaling SNS, EC2_INSTANCE_LAUNCH → Create Stack, EC2_INSTANCE_TERMINATE → Delete Stack]
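The deck does not spell out how the cluster health check is wired to the shadow instance, so the following is one possible wiring and an assumption throughout: a periodic job probes each cluster's health endpoint and, on failure, marks the matching shadow instance unhealthy so the ASG replaces it. The `cells` mapping and both function names are hypothetical; `autoscaling` is assumed to be a boto3 Auto Scaling client or a stand-in with the same `set_instance_health` method.

```python
import urllib.request


def http_probe(url, timeout=3.0):
    """True if the health endpoint answers HTTP 200 within the timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200


def heal_unhealthy_cells(cells, probe, autoscaling):
    """Mark the shadow instance of every failing cluster as unhealthy.

    cells maps a shadow instance id to that cluster's health check URL.
    Returns the ids that were marked.
    """
    marked = []
    for instance_id, url in cells.items():
        try:
            ok = probe(url)
        except Exception:
            ok = False  # unreachable endpoints count as unhealthy
        if not ok:
            autoscaling.set_instance_health(
                InstanceId=instance_id,
                HealthStatus="Unhealthy",
                ShouldRespectGracePeriod=False,
            )
            marked.append(instance_id)
    return marked
```

Once a shadow is marked unhealthy, the ASG terminates and replaces it, and the resulting terminate and launch notifications delete and recreate the stack.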
HEALING SCENARIO 1 – CLUSTER FAILS
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3), Auto Scaling SNS, EC2_INSTANCE_LAUNCH → Create Stack, EC2_INSTANCE_TERMINATE → Delete Stack]
HEALING SCENARIO 1 – SHADOW TERMINATED
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3), Auto Scaling SNS, EC2_INSTANCE_LAUNCH → Create Stack, EC2_INSTANCE_TERMINATE → Delete Stack]
HEALING SCENARIO 1 – ASG IS IMPACTED
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×2), Desired: 3, Actual: 2]
HEALING SCENARIO 1 – CLUSTER RESTORED
[Diagram: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow ×3), Auto Scaling SNS, EC2_INSTANCE_LAUNCH → Create Stack, EC2_INSTANCE_TERMINATE → Delete Stack]
Continuous Delivery for Clusters
Blue/Green Updates for Clusters at Huge Scale
CONTINUOUS DELIVERY FOR CLUSTERS
• Using nothing but the ASG capacity, blue/green roll clusters of almost any size
• Increment ASG in V2.0, wait for health check, decrement ASG in V1.0
[Diagram: V1.0 clusters, V2.0 clusters]
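The increment, wait, decrement loop can be sketched as below. This is a simplified illustration, not the presenters' tooling: `set_capacity` and `wait_healthy` are hypothetical callables that you would back with your own ASG calls and health checks, and the rollback of a failed increment is an added assumption.

```python
def blue_green_roll(blue_capacity, set_capacity, wait_healthy, step=1):
    """Shift capacity from the V1.0 (blue) ASG to the V2.0 (green) ASG.

    set_capacity(group, n) sets a group's desired capacity.
    wait_healthy(group, n) returns True once n clusters report healthy.
    Returns the final (blue, green) capacities; the roll stops early and
    undoes the last increment if the new clusters never become healthy.
    """
    blue, green = blue_capacity, 0
    while blue > 0:
        increment = min(step, blue)
        green += increment
        set_capacity("green", green)
        if not wait_healthy("green", green):
            green -= increment
            set_capacity("green", green)
            return blue, green
        blue -= increment
        set_capacity("blue", blue)
    return blue, green
```

With boto3, `set_capacity` could wrap the Auto Scaling `set_desired_capacity` call for the two shadow ASGs; total capacity never drops below the starting level during the roll.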
AUTO SCALE ANYTHING
• Solution works with many non-scaling AWS services
• CloudFormation can use Custom Resources to create almost anything
• The “Shadow” system only needs the scaling alarms from any CloudWatch metric and a health check endpoint. It is decoupled and does not interact with the system in any way.
CUSTOM SCALING EXAMPLES
[Diagrams: each example shows the metric, ScaleUp and ScaleDown alarms, SNS, Lambda, CloudFormation, a Shadow ASG, and three copies of the scaled resource]
• Database Throughput -> RDS Read Slave
• Number of user sign-ups/logins -> Application Shard
• CPU/Memory -> VMware Node
• CPU/Memory -> Other infrastructure platforms
• Number of items in the queue -> Life Sciences Application
• Number of planes currently in the air -> Flight Analysis Stack
• Number of door entries -> Trading Stack
• Order Volume -> Number of robots on station
Find Out MORE:
Visit Us: At our booth or online – www.sourcedgroup.com
Careers: www.sourcedgroup.com/careers
In the news:
• Computerworld (2016):
• Foreign Exchange Service OFX Embarks on Cloud Migration
• Connecting the Australian Channel (2015):
• Meet the Partner who took Qantas to the AWS Cloud
• The Australian Business Review (2015):
• Greater Buying Power lets Aussie bank on Adobe Experience Manager
Our Awards:
• AWS – Sydney Partners Summit - Invent & Simplify (2015)
• AWS – Global - Customer Obsessed Partner (2014)
Thank you!

Advanced AWS techniques from the trenches of the Enterprise – Sourced Group

  • 1. Advanced AWS Patterns from the trenches of the enterprise
  • 2. John Painter Principal Consultant john.painter@sourcedgroup.com Brent Harrison Consultant brent.harrison@sourcedgroup.com OUR TEAM TODAY SYDNEY | TORONTO | VANCOUVER | KELOWNA
  • 4. • Behind the firewall • BYO AWS Account • Guaranteed single tenancy • Multi-cloud options • Customer-controlled encryption • Customer retains custody of all data Over half a petabyte per year of Splunk throughput under management ENGINEERED SERVICES
  • 5. PATTERN 1 – AUTO HEALING GEN2 AUTOMATED HEALING OF SINGLE INSTANCES WITH DEEP HEALTH CHECKING
  • 6. THE BASIC AUTO-HEALING PATTERN 1. Create ASG with min:max of 1:1 2. Elastic Load Balancer (ELB) provides deep health checking != EC2 Auto Recovery Top Tip: Deep Health Check 1. Script that checks multiple variables (eg: process + disk space + memory) and opens/closes a port via Netcat 2. Set the ELB to “port” type check Auto Scaling Group min: 1, max: 1
  • 7. There are strong fiscal motivators to reduce tier-1 operational costs via the use of automated healing actions
  • 8. ASGS, ETH0, AND STATIC IPS • LOTS of “cloud” applications want static networks • ASG instances receive dynamic IPs • Add a secondary interface (Elastic Network Interface – ENI) which maintains a fixed network address Auto Scaling Group min: 1, max: 1 Other cluster members/users “Re-mappable” ENI
  • 9. THE PREVIOUS APPROACH 1. Virtual Private Cloud (VPC) with large subnets (eg:>/24) 2. ASG with min:max of 1:1 3. Scripts call EC2 API on boot to “bring” a re-mappable interface to the instance  Runs in the operating system, simple to pause apps for interface Lots of upstream/deployment co-ordination required Maintain support for multiple operating systems Incompatible with the increasing number of AWS Marketplace offerings Prone to failure if AWS API is under duress (which is also probably when you really want to be healing!)
  • 10. ONE ALTERNATIVE Auto Scaling Group min: 1, max: 1 Other cluster members/users ASG Notifications SQS/SNS Lambda AWS CLI  Slow  Prone to AWS API/backplane duress  Does not understand the state of the operating system
  • 11. GOING BACK TO FIRST PRINCIPLES • What hands out IP addresses in AWS? –DHCP (VPC DHCP Options Group) • Where does the range of IPs come from? –Subnet size • Can we reduce the number of IPs available from DHCP to 1? –Provision an ENI in the subnet -> 1 less IP (for FREE!) –Provision lots of ENIs in a subnet and there will only be 1 IP left
  • 12. AUTO-HEAL GEN2 1. Create a Subnet for the Auto-Heal node (at the moment /28 is the smallest) 2. Create enough ENIs to remove all but 1 IP from DHCP 3. Create the normal ASG with min:max 1:1 4. Create the ELB with deep health checking as per normal No scripts, co-ordination, or complexity inside the OS or the deployment framework Fully compatible with the wide range of black-box AMIs from AWS Marketplace ✕Wastes address space (which may not be an issue depending on your network design and integration points)
  • 13. Same technique can be used for “fixed clusters”, sets of quorum servers, container systems
  • 14. PATTERN 2 - ADVANCED PROXY A SCALABLE , HIGHLY AVAILABLE PROXY WITH ACTIVE DATA CONTROLS AND STATIC IP RANGES
  • 15. EC2 with Outbound Internet Access MNAT EIP Public Subnet Private Subnet EC2EC2EC2  Uncontrolled access to the internet  Reactive techniques such as VPC flow logs + Lambda are not capable of running in real-time * Diagram simplified for clarity, excludes multi availability zone elements
  • 16. • Limitation of VPC: Routes can only reference single interface  Active control of traffic  HTTP/S inspection ? Non-trivial engineering required  Not truly HA  Relatively low and finite throughput  Prone to EC2 backplane saturation  100s Mb/s per EIP ~HA Transparent Proxy Design Whitelist Blacklist IP List EIP Public Subnet Private Subnet PROXY PROXY EC2EC2EC2 * Diagram simplified for clarity, excludes multi availability zone elements ENI
  • 17. Public SubnetPublic Subnet Availability Zone A Auto Scaling Proxy Availability Zone B ASG PROXYPROXY  Active control of traffic  Actively load balanced  Truly HA  ≈ Infinite bandwidth  Variable public IPs Private Subnet Private Subnet EC2EC2EC2 EC2EC2EC2 “Auto Scaled” EIPs
  • 18. Variable edge IPs are undesirable in the enterprise
  • 19. Auto Scaling Proxies with Static IPs  Active control of traffic  Actively load balanced  Static external IP addresses ? ≈ Infinite bandwidth requires co- ordination Private SubnetsPrivate Subnets Availability Zone A Availability Zone B ASG EC2EC2EC2 EC2EC2EC2 MNAT PROXY PROXY PROXY PROXY PROXY PROXY Public Subnets MNAT Public Subnets EIP EIP
  • 20. Why? .....and hang on, I still see EIPs? Scaling Increments: 10GB @ $42/month/10GB 100’s of Mb/s @ ~$210/month/100’s Mb/s • Provision 50/100/200Gb/s upfront • If you move in increments of +/-100Gb/s, see pattern 3. • Simple, HA, static IP proxies for a relatively low uplift in cost Private SubnetsPrivate Subnets Availability Zone A Availability Zone B ASG EC2EC2EC2 EC2EC2EC2 MNAT PROXY PROXY PROXY PROXY PROXY PROXY Public Subnets MNAT Public Subnets EIP EIP
  • 21. Complex Inspection Sandwich • Lots of vendor solutions can now support healing • Some even support scaling • Few support ENI/EIP handling Private SubnetsPrivate Subnets Availability Zone A Availability Zone B EC2EC2EC2 EC2EC2EC2 MNAT INSPECTION SANDWICH Public Subnets MNAT Public Subnets EIP EIP
  • 22. PATTERN 3 - AUTO SCALING ANYTHING A TECHNIQUE LEVERAGING EXISTING SERVICES TO AUTOSCALE ALMOST ANYTHING
  • 23. The fiscal and operational benefits of Auto Scaling are well understood. Auto Scaling is currently limited to scaling EC2 instances. We want to apply scaling to entire solutions, not just EC2.
  • 24. SCALING CLUSTERS? SCALING CELLS? • Enterprises have many applications that cannot scale on compute alone – Sharded databases – Life Sciences Clusters – Simulation Clusters • Organisations are starting to adopt “Cell Architecture” to account for scale • Auto Scaling  Auto Healing Client Example ~8000 instances connected in “rings” of 20 nodes via a cluster protocol + ~1500 Cassandra nodes. 50% variance in daily traffic volume. Ideal use-case for Auto Scale
  • 25. THE GENERAL CASE - CELL / SHARD / CLUSTER EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check
  • 26. THE GENERAL CASE - CELL / SHARD / CLUSTER EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check
  • 27. STEP 1 – INSTRUMENT THE SCALING METRIC … … CloudWatch Custom Metric Number of Users CloudWatch Alarm ScaleUp CloudWatch Alarm ScaleDown
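A minimal sketch of the two alarms in Step 1, assuming the custom metric is called `NumberOfUsers` in a namespace `Custom/Shadow` (both names, the thresholds, and the policy ARNs are hypothetical). The functions only build keyword arguments in the shape accepted by the CloudWatch `put_metric_alarm` call; nothing is sent to AWS.

```python
def build_alarm(name, metric, namespace, threshold, comparison, action_arn,
                period=60, evaluation_periods=3):
    """Return kwargs shaped for CloudWatch put_metric_alarm (not sent anywhere)."""
    return {
        "AlarmName": name,
        "MetricName": metric,
        "Namespace": namespace,
        "Statistic": "Average",
        "Period": period,                      # seconds per datapoint
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
        "AlarmActions": [action_arn],          # e.g. a scaling policy ARN
    }


def build_scaling_alarms(metric, namespace, high, low, up_arn, down_arn):
    """Build the ScaleUp / ScaleDown pair; thresholds must not overlap."""
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    scale_up = build_alarm("ScaleUp", metric, namespace, high,
                           "GreaterThanThreshold", up_arn)
    scale_down = build_alarm("ScaleDown", metric, namespace, low,
                             "LessThanThreshold", down_arn)
    return scale_up, scale_down


if __name__ == "__main__":
    # Hypothetical names and ARNs, for illustration only.
    up, down = build_scaling_alarms("NumberOfUsers", "Custom/Shadow",
                                    high=1000, low=400,
                                    up_arn="arn:example:up",
                                    down_arn="arn:example:down")
    print(up["AlarmName"], up["Threshold"], down["AlarmName"], down["Threshold"])
```

Keeping a gap between the low and high thresholds is a design choice to avoid the two alarms firing against each other.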
  • 28. OPTION 1 – USE LAMBDA ScaleUp ScaleDown … … Number of Users SNS … SNS Build Lambda Teardown Lambda CloudFormation
  • 29. WHY NOT LAMBDA? • Duplication of AWS engineering investment • Ongoing cost to maintain cadence with the growing features of Auto Scaling – Scheduled Scaling – Percentile Scaling – Machine Learning Scaling / Predictive Scaling • Lambda still needs a state machine • We don’t have healing
  • 30. There are strong fiscal and complexity motivators to use native ASGs
  • 31. STEP 2 – “SHADOW” ASG ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Shadow ASG
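One way to picture the "Shadow" ASG is as a counter: each shadow instance stands for one cluster, and the alarms move the count up or down by one. The sketch below builds keyword arguments in the shape of the Auto Scaling `put_scaling_policy` call and models how the desired capacity would move between the group's bounds. The group name, cooldown, and step size are assumptions, not values from the slides.

```python
def shadow_policies(asg_name, step=1, cooldown=600):
    """Return kwargs for a ScaleUp and a ScaleDown simple scaling policy."""
    def policy(name, adjustment):
        return {
            "AutoScalingGroupName": asg_name,
            "PolicyName": name,
            "PolicyType": "SimpleScaling",
            "AdjustmentType": "ChangeInCapacity",
            "ScalingAdjustment": adjustment,   # +1 shadow == +1 cluster
            "Cooldown": cooldown,              # seconds; clusters build slowly
        }
    return policy("ScaleUp", step), policy("ScaleDown", -step)


def apply_adjustment(desired, adjustment, min_size, max_size):
    """Model the new desired capacity, clamped to the group's bounds."""
    return max(min_size, min(max_size, desired + adjustment))


if __name__ == "__main__":
    up, down = shadow_policies("shadow-asg")   # hypothetical group name
    desired = 3
    desired = apply_adjustment(desired, up["ScalingAdjustment"], 1, 4)
    desired = apply_adjustment(desired, up["ScalingAdjustment"], 1, 4)
    print(desired)  # stays at the maximum of 4
```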
  • 32. STEP 3 – ADD THE CFN LAMBDAS ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
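A hedged sketch of the Lambda in Step 3: it reads Auto Scaling notifications delivered through SNS and maps `EC2_INSTANCE_LAUNCH` to a stack creation and `EC2_INSTANCE_TERMINATE` to a stack deletion. Naming the stack after the shadow instance ID (`cell-<instance id>`) and the template URL are assumptions made for illustration; the slides do not say how stacks are tied to shadows. The CloudFormation client is injectable so the logic can be exercised without AWS access.

```python
import json

LAUNCH = "autoscaling:EC2_INSTANCE_LAUNCH"
TERMINATE = "autoscaling:EC2_INSTANCE_TERMINATE"


def plan_actions(event):
    """Turn an SNS-wrapped Auto Scaling notification into (action, stack) pairs."""
    actions = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        kind = message.get("Event")
        if kind not in (LAUNCH, TERMINATE):
            continue  # ignore other notifications, e.g. test messages
        # Assumption: one stack per shadow instance, named after its ID.
        stack = "cell-" + message["EC2InstanceId"]
        actions.append(("create" if kind == LAUNCH else "delete", stack))
    return actions


def handler(event, context, cfn=None,
            template_url="https://example.invalid/cell.yaml"):  # hypothetical URL
    """Lambda entry point; `cfn` defaults to a boto3 CloudFormation client."""
    if cfn is None:
        import boto3  # imported lazily so the planning logic runs without it
        cfn = boto3.client("cloudformation")
    actions = plan_actions(event)
    for action, stack in actions:
        if action == "create":
            cfn.create_stack(StackName=stack, TemplateURL=template_url)
        else:
            cfn.delete_stack(StackName=stack)
    return actions
```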
  • 33. $5.76 per month per stack (Unoptimized)
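The slide gives only the monthly figure. As a rough check, and assuming a 30-day month of continuous running (an assumption; the slides do not state the instance type or billing basis), the arithmetic below backs out the implied hourly rate and the total shadow overhead for a given number of stacks.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance always on


def implied_hourly_rate(monthly_cost):
    """Hourly rate that would produce the given monthly cost."""
    return monthly_cost / HOURS_PER_MONTH


def shadow_overhead(stacks, monthly_cost_per_stack=5.76):
    """Total monthly cost of the shadow instances for `stacks` clusters."""
    return round(stacks * monthly_cost_per_stack, 2)


if __name__ == "__main__":
    print(round(implied_hourly_rate(5.76), 4))
    print(shadow_overhead(100))
```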
  • 34. Auto Scale  Auto Heal
  • 35. STEP 4 – HEALTH CHECK THE CLUSTERS ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  • 36. HEALING SCENARIO 1 – CLUSTER FAILS ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  • 37. HEALING SCENARIO 1 – SHADOW TERMINATED ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  • 38. HEALING SCENARIO 1 – ASG IS IMPACTED ScaleUp ScaleDown Number of Users … … Shadow Shadow Shadow ASG Desired: 3 Actual: 2
  • 39. HEALING SCENARIO 1 – CLUSTER RESTORED ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
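The healing scenario above can be sketched as a small checker: probe each cluster's health endpoint and, when a cluster fails, mark its shadow instance unhealthy so the ASG terminates and replaces it, which in turn deletes and rebuilds the stack. The function below only plans the updates, returning keyword arguments in the shape of the Auto Scaling `set_instance_health` call. The mapping from shadow instance to endpoint and the `probe` callable are hypothetical.

```python
def plan_health_updates(shadow_to_endpoint, probe):
    """Return set_instance_health kwargs for every shadow whose cluster fails.

    shadow_to_endpoint: dict of shadow instance ID -> cluster health endpoint
    probe: callable(endpoint) -> bool, True when the cluster is healthy
    """
    updates = []
    for instance_id, endpoint in sorted(shadow_to_endpoint.items()):
        if not probe(endpoint):
            updates.append({
                "InstanceId": instance_id,
                "HealthStatus": "Unhealthy",
                "ShouldRespectGracePeriod": True,  # let new clusters finish building
            })
    return updates


if __name__ == "__main__":
    # Hypothetical inventory: three shadows, one failed cluster.
    inventory = {
        "i-0aaa": "http://cell-a.example.invalid/health",
        "i-0bbb": "http://cell-b.example.invalid/health",
        "i-0ccc": "http://cell-c.example.invalid/health",
    }
    failed = {"http://cell-b.example.invalid/health"}
    for update in plan_health_updates(inventory, lambda url: url not in failed):
        print(update["InstanceId"], update["HealthStatus"])
```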
  • 40. Continuous Delivery for Clusters Blue/Green Updates for Clusters at Huge Scale
  • 41. CONTINUOUS DELIVERY FOR CLUSTERS … … … … … … • Using nothing but the ASG capacity, blue/green roll clusters of almost any size • Increment ASG in V2.0, wait for health check, decrement ASG in V1.0 V1.0 V2.0
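The roll described above (increment V2.0, wait for the health check, decrement V1.0) can be modelled as a sequence of desired capacities for the two shadow ASGs. This is a simulation only, assuming one cluster per step and a caller-supplied `is_healthy` check; stopping on a failed check so that the remaining V1.0 capacity is left untouched is a design choice of this sketch, not something the slides specify.

```python
def blue_green_roll(size, is_healthy=lambda green_count: True):
    """Yield (v1_desired, v2_desired) after each capacity change.

    Raises RuntimeError if the newest V2.0 cluster fails its health check,
    leaving the remaining V1.0 capacity in place.
    """
    blue, green = size, 0
    yield blue, green
    while blue > 0:
        green += 1                 # increment the V2.0 ASG
        yield blue, green
        if not is_healthy(green):  # wait for the health check
            raise RuntimeError("V2.0 cluster %d failed its health check" % green)
        blue -= 1                  # decrement the V1.0 ASG
        yield blue, green


if __name__ == "__main__":
    for v1, v2 in blue_green_roll(3):
        print("V1.0 desired=%d  V2.0 desired=%d" % (v1, v2))
```

Total capacity never drops below the starting size during the roll, at the cost of running one extra cluster while each step completes.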
  • 42. AUTO SCALE ANYTHING • Solution works with many non-scaling AWS services • CloudFormation can use Custom Resources to create almost anything • The “Shadow” system only needs the scaling alarms from any CloudWatch metric and a health check endpoint. It is decoupled and does not interact with the system in any way.
  • 43. Database Throughput ScaleUp Alarm SNS Lambda RDS Read Slave CloudFormation Shadow ASG RDS Read Slave RDS Read Slave ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 44. Number of user sign-ups/logins ScaleUp Alarm SNS Lambda Application Shard CloudFormation Shadow ASG Application Shard Application Shard ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 45. CPU/Memory ScaleUp Alarm SNS Lambda VMWare Node CloudFormation Shadow ASG VMWare Node VMWare Node ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 46. CPU/Memory ScaleUp Alarm SNS Lambda Other infrastructure platforms CloudFormation Shadow ASG Other infrastructure platforms Other infrastructure platforms ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 47. Number of items in the queue ScaleUp Alarm SNS Lambda Life Sciences Application CloudFormation Shadow ASG Life Sciences Application Life Sciences Application ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 48. Number of planes currently in the air ScaleUp Alarm SNS Lambda Flight Analysis Stack CloudFormation Shadow ASG Flight Analysis Stack Flight Analysis Stack ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 49. Number of door entries ScaleUp Alarm SNS Lambda Trading Stack CloudFormation Shadow ASG Trading Stack Trading Stack ScaleDown Alarm CUSTOM SCALING EXAMPLES
  • 50. Order Volume ScaleUp Alarm SNS Lambda Number of robots on station CloudFormation Shadow ASG Number of robots on station Number of robots on station ScaleDown Alarm CUSTOM SCALING EXAMPLES
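Every example above starts from a business metric that CloudWatch does not collect on its own, so something has to publish it. The sketch below builds a request in the shape of the CloudWatch `put_metric_data` call for such a metric. The namespace, metric names, and dimension are hypothetical; the function only builds the payload and does not call AWS.

```python
def metric_datum(name, value, unit="Count", dimensions=None):
    """Build one MetricData entry for put_metric_data."""
    datum = {"MetricName": name, "Value": float(value), "Unit": unit}
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


def put_metric_request(namespace, samples, dimensions=None):
    """Build put_metric_data kwargs from a dict of metric name -> value."""
    return {
        "Namespace": namespace,
        "MetricData": [
            metric_datum(name, value, dimensions=dimensions)
            for name, value in sorted(samples.items())
        ],
    }


if __name__ == "__main__":
    # Hypothetical namespace and metric names, for illustration only.
    request = put_metric_request(
        "Custom/Shadow",
        {"QueueDepth": 1250, "PlanesInAir": 87},
        dimensions={"Environment": "prod"},
    )
    for datum in request["MetricData"]:
        print(datum["MetricName"], datum["Value"])
```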
  • 51. Find Out MORE: Visit Us: At our booth or online – www.sourcedgroup.com Careers: www.sourcedgroup.com/careers In the news: • Computerworld (2016): • Foreign Exchange Service OFX Embarks on Cloud Migration • Connecting the Australian Channel (2015): • Meet the Partner who took Qantas to the AWS Cloud • The Australian Business Review (2015): • Greater Buying Power lets Aussie bank on Adobe Experience Manager Our Awards: • AWS – Sydney Partners Summit - Invent & Simplify (2015) • AWS – Global - Customer Obsessed Partner (2014)