Autoscaling Spark for Fun and
Profit
Rafal Kwasny
11th Spark London Meetup
2015-11-26
1
Who am I
•DevOps
•Built a few platforms in my life
•mostly adtech, in-game analytics for Sony
PlayStation
•Currently advising Investment Banks
•CTO Entropy Investments
2
How do you run Spark?
•Who runs on AWS?
•Who uses EMR?
3
So how to use autoscaling on AWS?
4
Overview
•typical architecture for AWS
•How autoscaling works
•Scripts to make your life easier
5
Typical architecture for AWS
•Generate some data
•Store it in S3
•or store it in a message queue
•Use your favourite tool for ETL
•Ship it back to S3
•Or send it somewhere
13
- EMR
- spark-ec2
- build cluster from scratch
Map-reduce is about quickly writing very inefficient
code and then running it at massive scale
(C) Someone
14
Problem
•EC2 is a pay-for-what-you-use model
•You just have to decide how many resources
you want to use before starting a cluster
15
Problem
Most common problems while running on EC2
Scaling up
•My team needs a new cluster; how big should
it be?
Scaling down
•Did I shut down the DEV cluster before leaving
the office on Friday evening?
16
How to automate scaling?
17
Types of scaling
Vertical scaling -
"Let’s get a bigger box"
•Change instance type
•Change EBS parameters
18
Horizontal scaling -
"Just add more nodes"
Autoscaling
•Automatic resizing based on demand
•Define minimum/maximum instance count
•Define when scaling should occur
•Use metrics
•Run your jobs and don’t worry about
infrastructure
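The bullets above (minimum/maximum instance count, a rule for when to scale, a metric to drive it) can be sketched as a small decision function. This is only an illustration of the idea, not how AWS implements it; the `ScalingPolicy` class, its field names and the thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; every name and threshold here is illustrative."""
    min_instances: int
    max_instances: int
    scale_up_above: float    # metric value above which one node is added
    scale_down_below: float  # metric value below which one node is removed


def desired_capacity(current: int, metric: float, policy: ScalingPolicy) -> int:
    """Return the instance count the group should move to for this metric reading."""
    target = current
    if metric > policy.scale_up_above:
        target = current + 1
    elif metric < policy.scale_down_below:
        target = current - 1
    # Never leave the configured bounds, whatever the metric says.
    return max(policy.min_instances, min(policy.max_instances, target))


policy = ScalingPolicy(min_instances=2, max_instances=10,
                       scale_up_above=80.0, scale_down_below=20.0)
print(desired_capacity(4, 95.0, policy))  # busy cluster: grow by one
print(desired_capacity(4, 5.0, policy))   # idle cluster: shrink by one
print(desired_capacity(2, 5.0, policy))   # already at the minimum: stay put
```

Clamping to the minimum and maximum last means a noisy metric can never push the cluster outside the bounds you chose.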
19
Architecture with autoscaling
20
Using RAM/local SSDs for caching
Only saving output into S3
Fault recovery
Autoscaling components
•AMI - machine image with Spark installed
•Launch configuration - defines:
•AMI
•instance type
•instance storage
•public IP
•security groups
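As a rough sketch, the launch configuration fields listed above map onto a request like the one below. The keys follow the parameter names of the Auto Scaling `CreateLaunchConfiguration` API (which boto3 exposes as `create_launch_configuration`); every value is a placeholder chosen for illustration and none of them comes from the talk.

```python
import json

# All values are placeholders, not settings from the talk.
launch_configuration = {
    "LaunchConfigurationName": "spark-worker-lc",  # hypothetical name
    "ImageId": "ami-00000000",                     # AMI with Spark installed (placeholder ID)
    "InstanceType": "r3.xlarge",                   # assumed instance type
    "BlockDeviceMappings": [                       # instance storage: expose a local disk
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    ],
    "AssociatePublicIpAddress": True,              # public IP
    "SecurityGroups": ["sg-00000000"],             # placeholder security group ID
}

print(json.dumps(launch_configuration, indent=2))
```

With boto3 installed and credentials configured, a dict like this could be passed as keyword arguments to the Auto Scaling client; the sketch stops at building the request so it runs without an AWS account.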
23
Autoscaling components
•Autoscaling group
•launch configuration
•availability zones
•VPC details
•min/max servers
•when to scale
•metrics/health checks
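The autoscaling group settings above can be written down the same way. The keys are parameter names from the Auto Scaling `CreateAutoScalingGroup` and `PutScalingPolicy` APIs; the group name, zone, subnet ID and sizes are assumptions for illustration only, and `check_bounds` is a hypothetical helper.

```python
# Placeholder values throughout; only the key names come from the AWS API.
auto_scaling_group = {
    "AutoScalingGroupName": "spark-workers",       # hypothetical name
    "LaunchConfigurationName": "spark-worker-lc",  # launch configuration to use
    "AvailabilityZones": ["eu-west-1a"],           # availability zones
    "VPCZoneIdentifier": "subnet-00000000",        # VPC details (placeholder subnet)
    "MinSize": 2,                                  # min servers
    "MaxSize": 10,                                 # max servers
    "DesiredCapacity": 2,
    "HealthCheckType": "EC2",                      # health checks
    "HealthCheckGracePeriod": 300,                 # seconds before checks start
}

# "When to scale": add one instance each time the policy is triggered.
scale_up_policy = {
    "AutoScalingGroupName": auto_scaling_group["AutoScalingGroupName"],
    "PolicyName": "spark-workers-scale-up",        # hypothetical name
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 1,
    "Cooldown": 300,                               # seconds between scaling actions
}


def check_bounds(group: dict) -> bool:
    """True when the desired capacity lies between the minimum and maximum."""
    return group["MinSize"] <= group["DesiredCapacity"] <= group["MaxSize"]


print(check_bounds(auto_scaling_group))
```

A matching scale-down policy would use a negative `ScalingAdjustment`; a metric alarm then decides when each policy fires.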
24
Putting it all together
Then you can run your job
25
Complicated?
•AWS provides a lot of services
26
spark-cloud
• Better scripts to start Spark clusters on EC2
• Alpha version
• https://github.com/entropyltd/spark-cloud
27
What’s inside spark-cloud
Building AMIs with Packer
Packer is a tool for creating machine and
container images for multiple platforms from a
single source configuration.
Supports AWS, DigitalOcean, Docker,
OpenStack, Parallels, QEMU, VirtualBox,
VMware
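To give a feel for the "single source configuration", here is a minimal sketch of a Packer JSON template for an AWS image, generated from Python. It is not the template used by spark-cloud; the region, AMI ID, instance type, SSH user and provisioning command are all placeholders.

```python
import json

# Minimal Packer template: one amazon-ebs builder and one shell provisioner.
# Every value is a placeholder for illustration.
packer_template = {
    "builders": [{
        "type": "amazon-ebs",
        "region": "eu-west-1",
        "source_ami": "ami-00000000",        # base image to start from (placeholder)
        "instance_type": "m3.medium",
        "ssh_username": "ubuntu",
        "ami_name": "spark-{{timestamp}}",   # Packer fills in {{timestamp}} at build time
    }],
    "provisioners": [{
        "type": "shell",
        "inline": ["echo 'install Spark here'"],  # stand-in for the real install steps
    }],
}

template_json = json.dumps(packer_template, indent=2)
print(template_json)
```

Saving the output to a file and running `packer build` on it would produce an AMI; swapping the builder type is how the same provisioning steps target another platform.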
38
Current functionality
•Start cluster
•Shutdown cluster
•But more to come :)
39
Spot instances
•Spot instances
40
Spot instances
–On-Demand: $1.400
–Spot: $0.15
–89% cheaper
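The 89% figure follows directly from the two prices on the slide. A quick check (the function name is just for illustration):

```python
def savings_fraction(on_demand_price: float, spot_price: float) -> float:
    """Fraction saved by paying the spot price instead of the on-demand price."""
    return 1 - spot_price / on_demand_price


savings = savings_fraction(on_demand_price=1.400, spot_price=0.15)
print(f"{savings:.1%}")  # prints 89.3%
```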
41
Summary
•Spark and EC2 are a very common combination
•Because it makes your life easier
•And cheaper
•spark-cloud script will help you
•You can just worry about writing good Spark
code!
42
Thank You
rafal@entropy.be
43
44
Amazon S3 Tips
•Don’t use s3n://
•Use s3a:// with Hadoop 2.6
–Parallel rename, especially important for committing output
–Supports IAM authentication
–No "xyz_$folder$" files
–Input seek
–Multipart upload (no 5 GB limit)
–Error recovery and retry
More info https://issues.apache.org/jira/browse/HADOOP-10400
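A minimal sketch of what the switch looks like in practice: rewrite existing `s3n://` paths to `s3a://` and tell Spark which filesystem class backs the scheme. The helper names and the bucket are hypothetical; `fs.s3a.impl` and `org.apache.hadoop.fs.s3a.S3AFileSystem` are the Hadoop property and class for S3A, passed here through Spark's `spark.hadoop.*` prefix.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3n:// URI to s3a://, leaving any other URI untouched."""
    prefix = "s3n://"
    if uri.startswith(prefix):
        return "s3a://" + uri[len(prefix):]
    return uri


# Hadoop properties handed to Spark via the spark.hadoop.* prefix.
s3a_conf = {
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def as_submit_args(conf: dict) -> list:
    """Turn a config dict into --conf arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(to_s3a("s3n://example-bucket/logs/2015/11/26/"))  # hypothetical bucket
print(as_submit_args(s3a_conf))
```

When the instances carry an IAM role, no access keys need to appear in the configuration at all.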
45
Why not EMR?
•Why pay for EMR? It costs more than a spot
instance
•vendor lock-in and proprietary libraries
•netlib-java
46

More Related Content

What's hot

(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best PracticesAmazon Web Services
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Web Services
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSAmazon Web Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...Amazon Web Services
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...Amazon Web Services
 
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Amazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Spark Integration Architecture for restaurant data
Spark Integration Architecture for restaurant data�Spark Integration Architecture for restaurant data�
Spark Integration Architecture for restaurant dataDavid Tung
 
AWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explainedAWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explainedHarsha KM
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetupstevemcpherson
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Amazon Web Services
 
SF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher BernerSF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher BernerChester Chen
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAmazon Web Services
 

What's hot (20)

(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Spark Integration Architecture for restaurant data
Spark Integration Architecture for restaurant data�Spark Integration Architecture for restaurant data�
Spark Integration Architecture for restaurant data
 
AWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explainedAWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explained
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
SF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher BernerSF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher Berner
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
 

Viewers also liked

OpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformOpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformChinaNetCloud
 
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)Amazon Web Services
 
Aws summit devops 云端多环境自动化运维和部署
Aws summit devops   云端多环境自动化运维和部署Aws summit devops   云端多环境自动化运维和部署
Aws summit devops 云端多环境自动化运维和部署Leon Li
 
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸Amazon Web Services
 
AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得Cliff Chao-kuan Lu
 
基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻Mason Mei
 
AwSome day 分享
AwSome day 分享AwSome day 分享
AwSome day 分享得翔 徐
 
使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎Amazon Web Services
 
零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture Overview零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture OverviewLeon Li
 
Internet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event BeijingInternet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event BeijingChinaNetCloud
 
Aws容器服务详解
Aws容器服务详解Aws容器服务详解
Aws容器服务详解Leon Li
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界Amazon Web Services
 
淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用Rick Hwang
 
AWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloudAWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloudChinaNetCloud
 
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)Amazon Web Services
 
管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付Amazon Web Services
 
Amazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon Web Services
 
Automate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeployAutomate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeployAmazon Web Services
 
如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享Amazon Web Services
 

Viewers also liked (20)

OpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformOpsStack--Integrated Operation Platform
OpsStack--Integrated Operation Platform
 
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
 
Aws summit devops 云端多环境自动化运维和部署
Aws summit devops   云端多环境自动化运维和部署Aws summit devops   云端多环境自动化运维和部署
Aws summit devops 云端多环境自动化运维和部署
 
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
 
AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得
 
基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻
 
AwSome day 分享
AwSome day 分享AwSome day 分享
AwSome day 分享
 
使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎
 
零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture Overview零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture Overview
 
Internet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event BeijingInternet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event Beijing
 
AWS EC2 and ELB troubleshooting
AWS EC2 and ELB troubleshootingAWS EC2 and ELB troubleshooting
AWS EC2 and ELB troubleshooting
 
Aws容器服务详解
Aws容器服务详解Aws容器服务详解
Aws容器服务详解
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界
 
淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用
 
AWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloudAWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloud
 
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
 
管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付
 
Amazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk Introduction
 
Automate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeployAutomate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeploy
 
如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享
 

Similar to Autoscaling Spark on AWS EC2 - 11th Spark London meetup

(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation Studios(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation StudiosAmazon Web Services
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
Leveraging elastic web scale computing with AWS
 Leveraging elastic web scale computing with AWS Leveraging elastic web scale computing with AWS
Leveraging elastic web scale computing with AWSShiva Narayanaswamy
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Amazon Web Services
 
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOCBusiness Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOCAmazon Web Services
 
Application Lifecycle Management on AWS
Application Lifecycle Management on AWSApplication Lifecycle Management on AWS
Application Lifecycle Management on AWSDavid Mat
 
Part 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application DevelopmentPart 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application DevelopmentLucas Jellema
 
Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)Andrejs Prokopjevs
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanAmazon Web Services
 
AWS Lambda at JUST EAT
AWS Lambda at JUST EATAWS Lambda at JUST EAT
AWS Lambda at JUST EATAndrew Brown
 
NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013aspyker
 
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornWorkshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornAmazon Web Services
 
Sitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web AppsSitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web AppsRob Habraken
 
Wild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New UnicornWild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New UnicornAmazon Web Services
 
AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2Amazon Web Services
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Outlyer
 
Containerize all the things!
Containerize all the things!Containerize all the things!
Containerize all the things!Mike Melusky
 

Similar to Autoscaling Spark on AWS EC2 - 11th Spark London meetup (20)

(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation Studios(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation Studios
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Leveraging elastic web scale computing with AWS
 Leveraging elastic web scale computing with AWS Leveraging elastic web scale computing with AWS
Leveraging elastic web scale computing with AWS
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOCBusiness Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
 
Application Lifecycle Management on AWS
Application Lifecycle Management on AWSApplication Lifecycle Management on AWS
Application Lifecycle Management on AWS
 
EMR Training
EMR TrainingEMR Training
EMR Training
 
Part 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application DevelopmentPart 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application Development
 
Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
AWS Lambda at JUST EAT
AWS Lambda at JUST EATAWS Lambda at JUST EAT
AWS Lambda at JUST EAT
 
NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013
 
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornWorkshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
 
Aws ec2
Aws ec2Aws ec2
Aws ec2
 
Sitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web AppsSitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web Apps
 
Wild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New UnicornWild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New Unicorn
 
AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2
 
Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda
 
Containerize all the things!
Containerize all the things!Containerize all the things!
Containerize all the things!
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Autoscaling Spark on AWS EC2 - 11th Spark London meetup

Editor's Notes

  1. How many of you use Spark in production?
  2. S3 as a single source of data: very good durability and availability; offloads storage complexity to AWS.
  3. Parquet: columnar store; standard supported by Spark, Hive, Presto and Impala. Optimised for: column-based aggregations. Not optimised for: `select *` type queries, INSERTs/UPDATEs.
  4. When on EC2 you have two main options: the spark-ec2 scripts, or EMR (Elastic MapReduce).
  5. No HDFS; no state on workers.
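
Note 3 says a columnar store suits column-based aggregations but not `select *` type queries. A toy sketch of why, in plain Python: lists stand in for the storage layout, and the sample rows and helper names (`to_columns`, `sum_column`, `select_all`) are made up for illustration. This is not Parquet's file format or API.

```python
# Toy illustration only: plain Python lists stand in for storage layouts.
# Sample data and helper names are hypothetical, not part of Parquet.

rows = [
    {"user": "a", "country": "UK", "spend": 10.0},
    {"user": "b", "country": "PL", "spend": 4.5},
    {"user": "c", "country": "UK", "spend": 7.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columns, name):
    """Aggregate a single column; returns (total, number of values read)."""
    values = columns[name]
    return sum(values), len(values)


def select_all(columns):
    """Rebuild full rows; returns (rows, number of values read)."""
    names = list(columns)
    n = len(columns[names[0]]) if names else 0
    rebuilt = [{name: columns[name][i] for name in names} for i in range(n)]
    return rebuilt, n * len(names)


columns = to_columns(rows)
total, read_for_sum = sum_column(columns, "spend")
rebuilt, read_for_all = select_all(columns)
print(total, read_for_sum, read_for_all)
```

The aggregation reads only the values of the one column it needs, while rebuilding whole rows has to read every column and stitch the values back together, which is the access pattern the note flags as a poor fit.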