© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dickson Yue, Solutions Architect
June 2nd, 2017
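How to Build a Cost-Effective Big Data Environment
with Amazon EMR and Athena
Agenda
• Deconstructing current big data environments
• Amazon Athena and Amazon EMR use cases
• Migrating components to Amazon Athena: tips and tricks
• Migrating components to Amazon EMR: tips and tricks
• Customer examples
Deconstructing the existing big data environment
Evolution of data analytics platform technology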
1985: Data warehouse appliances
2006: Hadoop clusters
2009: Decoupled EMR clusters
2012: Cloud DWH (Redshift)
Today: Clusterless (Athena, Glue)
Reference architecture (diagram; labels only)
COLLECT
• Applications: mobile apps, web apps, devices
• Messaging: message
• IoT: sensors & IoT platforms, AWS IoT
• Transport: data centers, AWS Direct Connect, AWS Import/Export Snowball
• Logging: Amazon CloudWatch, AWS CloudTrail
• Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS
STORE (Search, SQL, NoSQL, Cache, File, Message, Stream)
• Amazon Elasticsearch Service, Apache Kafka, Amazon SQS, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB, Amazon S3, Amazon ElastiCache, Amazon RDS, Amazon DynamoDB Streams
PROCESS / ANALYZE (Batch, Message, Interactive, Stream, ML)
• Amazon SQS apps, Streaming, KCL apps, Amazon Redshift, Amazon Machine Learning, Presto, Amazon EMR, AWS Lambda, Amazon Kinesis Analytics, Amazon Athena, Amazon EC2
• ETL
CONSUME
• Amazon QuickSight, Apps & Services
• Analysis & visualization, Notebooks, IDE, API
Speed labels on the diagram: Fast, Slow, Fast
Use cases and tools (slide columns: Redshift, Amazon Athena, Amazon EMR, Other)
• Interactive: takes seconds. Example: self-service dashboards.
  Tools: Redshift; Athena + S3; Presto, Spark + S3; Amazon Elasticsearch Service; RDS
• Batch: takes minutes to hours. Example: daily/weekly/monthly reports.
  Tools: MapReduce, Hive, Pig, Spark; Glue
• Stream: takes milliseconds to seconds. Examples: fraud alerts, 1-minute metrics.
  Tools: Spark Streaming; Kinesis Analytics; KCL; Storm; Lambda
• Machine learning: takes milliseconds to minutes. Examples: fraud detection, demand forecasting.
  Tools: Spark ML; Amazon Machine Learning; Deep learning AMI
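Migrating workloads to Amazon Athena
Query data directly from Amazon S3
• No need to load data
• Query data in its original format
• Athena supports multiple data formats
• Text, CSV, TSV, JSON, weblogs, AWS service logs
• Or convert to an optimized format such as ORC or Parquet for the best performance and lowest cost
• No ETL required
• Stream data directly into Amazon S3
• Take advantage of the durability and availability of Amazon S3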
Example
Ad-hoc access to raw data using SQL
Example
Ad-hoc access to data using Athena
Athena can query
aggregated datasets as well
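Tips and tricks
Pay per query: $5 per TB scanned
• You pay for the amount of data each query scans
• Ways to reduce cost
• Compress your data
• Convert to a columnar format
• Use partitioning
• Free of charge: DDL queries and failed queries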
Dataset | Size on Amazon S3 | Query run time | Data scanned | Cost
Logs stored as text files | 1 TB | 237 seconds | 1.15 TB | $5.75
Logs stored in Apache Parquet format* | 130 GB | 5.13 seconds | 2.69 GB | $0.013
Savings | 87% less with Parquet | 34x faster | 99% less data scanned | 99.7% cheaper
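The per-query pricing above comes down to simple arithmetic. The following is a minimal sketch that recomputes the Cost column from the Data scanned column, assuming the $5 per TB price quoted on the slide and 1 TB = 1024 GB. The function name `athena_scan_cost` is hypothetical, and rounding or minimum-charge rules are not modelled.

```python
PRICE_PER_TB_USD = 5.0  # per-TB scan price quoted on the slide


def athena_scan_cost(data_scanned_gb: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query from the amount of data it scans (in GB)."""
    return data_scanned_gb / 1024 * price_per_tb


# Values taken from the "Data scanned" column of the table above.
text_cost = athena_scan_cost(1.15 * 1024)  # 1.15 TB scanned from text files
parquet_cost = athena_scan_cost(2.69)      # 2.69 GB scanned from Parquet files
saving = 1 - parquet_cost / text_cost      # fraction of the cost saved

print(f"text: ${text_cost:.2f}  parquet: ${parquet_cost:.3f}  saving: {saving:.1%}")
```

The same function can be used to estimate how much compression or partitioning would save once you know how many bytes a query would scan afterwards.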
Converting to ORC and Parquet
• You can convert data with a Hive CTAS statement
    CREATE TABLE new_key_value_store
    STORED AS PARQUET
    AS
    SELECT col_1, col2, col3 FROM noncolumartable
    SORT BY new_key, key_value_pair;
• You can also use Spark to convert files to Parquet / ORC
• 20 lines of PySpark code running on EMR convert 1 TB of text data into 130 GB of Parquet
• Total cost of the quick conversion: $5
https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
How to define your partitions
CREATE EXTERNAL TABLE Employee (
Id INT,
Name STRING,
Address STRING
) PARTITIONED BY (year INT)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ','
LOCATION
's3://mybucket/athena/inputdata/';
CREATE EXTERNAL TABLE Employee (
Id INT,
Name STRING,
Address STRING,
INT Year
) PARTITIONED BY (year INT)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ','
LOCATION
's3://mybucket/athena/inputdata/';
How to define your partitions
s3://elasticmapreduce/impressions/
PRE dt=2009-04-12-13-00/
PRE dt=2009-04-12-13-05/
PRE dt=2009-04-12-13-10/
PRE dt=2009-04-12-13-15/
PRE dt=2009-04-12-13-20/
CREATE EXTERNAL TABLE impressions ( requestBeginTime string, ......)
PARTITIONED BY (dt string) LOCATION
's3://elasticmapreduce/samples/hive-ads/tables/impressions/' ;
PRE dt=2009-04-12-14-10/
MSCK REPAIR TABLE impressions
How to define your partitions
s3://athena-examples/elb/plaintext/
elb/plaintext/2015/01/01/part-r-00000-ce65fca5-d6c6-40e6-b1f9-190cc4f93814.txt
elb/plaintext/2015/01/01/part-r-00001-ce65fca5-d6c6-40e6-b1f9-190cc4f93814.txt
elb/plaintext/2015/01/01_$folder$
elb/plaintext/2015/01/02/part-r-00006-ce65fca5-d6c6-40e6-b1f9-190cc4f93814.txt
elb/plaintext/2015/01/02/part-r-00007-ce65fca5-d6c6-40e6-b1f9-190cc4f93814.txt
elb/plaintext/2015/01/02/_$folder$
ALTER TABLE elb_logs_raw_native_part
ADD PARTITION (year='2015',month='01',day='01')
location 's3://athena-examples/elb/plaintext/2015/01/01/'
ALTER TABLE elb_logs_raw_native_part
ADD PARTITION (year='2015',month='01',day='02')
location 's3://athena-examples/elb/plaintext/2015/01/02/'
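When the S3 layout is date-based rather than Hive-style (no `key=value` folder names), each partition has to be added explicitly, as in the two statements above. A small script can generate them for a date range. This is only a sketch: the helper name `add_partition_statements` is hypothetical, and the table name and bucket path are the ones shown on the slide.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_location, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end]."""
    day = start
    while day <= end:
        y, m, d = f"{day:%Y}", f"{day:%m}", f"{day:%d}"
        yield (
            f"ALTER TABLE {table} "
            f"ADD PARTITION (year='{y}',month='{m}',day='{d}') "
            f"location '{base_location}{y}/{m}/{d}/'"
        )
        day += timedelta(days=1)


statements = list(
    add_partition_statements(
        "elb_logs_raw_native_part",
        "s3://athena-examples/elb/plaintext/",
        date(2015, 1, 1),
        date(2015, 1, 2),
    )
)
for statement in statements:
    print(statement)
```

With Hive-style prefixes such as `dt=2009-04-12-13-00/`, this step is unnecessary because `MSCK REPAIR TABLE` can discover the partitions.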
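Migrating workloads to Amazon EMR
Challenges
On-premises Hadoop clusters
• 1U servers
• Typically 12 cores, 32/64 GB RAM, and 6-8 TB of disk ($3-4K)
• Different node roles
• HDFS uses local disks, so capacity must be sized for 3x data replication
• Network switches and racks
• Open-source distributions, or fixed licensing terms for commercial distributions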
Server rack 1
(20 nodes)
Server rack 2
(20 nodes)
Server rack N
(20 nodes)
Core
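The sizing constraint above (local disks, 3x replication) can be expressed as a quick calculation. This is a rough sketch only: the 100 TB dataset is a hypothetical figure, the per-node disk sizes are the 6-8 TB range from the slide, and headroom for intermediate data, the operating system, or growth is ignored.

```python
import math


def hdfs_nodes_needed(dataset_tb, disk_per_node_tb, replication=3):
    """Number of data nodes needed to hold a dataset at a given HDFS replication factor."""
    raw_tb = dataset_tb * replication  # every block is stored `replication` times
    return math.ceil(raw_tb / disk_per_node_tb)


# Hypothetical 100 TB dataset on nodes with 8 TB or 6 TB of local disk.
nodes_with_8tb = hdfs_nodes_needed(100, 8)
nodes_with_6tb = hdfs_nodes_needed(100, 6)
print(nodes_with_8tb, nodes_with_6tb)
```

Because compute and storage are tied together on these nodes, adding storage also means buying cores and RAM, whether or not the workload needs them.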
Workload types running on the same cluster
• Large Scale ETL: Apache Spark, Apache Hive with Apache Tez or
Apache Hadoop MapReduce
• Interactive Queries: Apache Impala, Spark SQL, Presto, Apache
Phoenix
• Machine Learning and Data Science: Spark ML, Apache Mahout
• NoSQL: Apache HBase
• Stream Processing: Apache Kafka, Spark Streaming, Apache Flink,
Apache NiFi, Apache Storm
• Search: Elasticsearch, Apache Solr
• Job Submission: Client Edge Node, Apache Oozie
• Data warehouses like Pivotal Greenplum or Teradata
Production line
Over utilized Under utilized
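Tips and tricks
Key migration and TCO considerations
• DO NOT LIFT AND SHIFT
• Separate storage and compute by using S3
• Deconstruct workloads and map them to open-source tools
• Transient clusters and auto scaling
• Choose instance types and EC2 Spot Instances
Decouple compute and storage, and use S3 as your data layer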
HDFS
S3 is designed for 11
9’s of durability and is
massively scalable
EC2 Instance
Memory
Amazon S3
Amazon EMR
Amazon EMR
Intermediates
stored on local
disk or HDFS
Local
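Running HBase on S3 as a scalable NoSQL store
S3 tips: partitions, compression, and file formats
• Avoid key names that sort in lexicographical order
• This improves throughput and S3 list performance
• Use hashed/random prefixes or reversed date-time values
• Compress datasets to reduce the bandwidth used from S3 to EC2
• Make sure you use splittable compression, or make each file the optimal size for parallelization on your cluster
• Columnar file formats like Parquet can improve read performance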
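The key-naming tip above can be illustrated with two small helpers. This is a sketch under assumptions: the function names, the 4-character prefix length, and the sample key are hypothetical, and SHA-256 is used only as a convenient stable hash.

```python
import hashlib
from datetime import datetime


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash so that similar keys no longer sort next to each other."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


def reversed_timestamp_key(ts: datetime, name: str) -> str:
    """Reverse the timestamp digits so the fastest-changing digit comes first."""
    stamp = ts.strftime("%Y%m%d%H%M%S")
    return f"{stamp[::-1]}/{name}"


print(hashed_key("logs/2017/06/02/part-0000.gz"))
print(reversed_timestamp_key(datetime(2017, 6, 2, 13, 5, 0), "part-0000.gz"))
```

The trade-off is that such keys can no longer be listed by date prefix, so the mapping from dates to keys has to be derived or stored elsewhere.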
Multiple storage layers to choose from
Amazon DynamoDB
Amazon RDS Amazon Kinesis
Amazon Redshift
Amazon S3
Amazon EMR
TCO: transient or long-running clusters
Options for submitting jobs
Amazon EMR
Step API
Submit a Spark
application
Amazon EMR
AWS Data Pipeline
Airflow, Luigi, or other
schedulers on EC2
Create a pipeline
to schedule job
submission or create
complex workflows
AWS Lambda
Use AWS Lambda to
submit applications to
EMR Step API or directly
to Spark on your cluster
Use Oozie on your
cluster to build
DAGs of jobs
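As an illustration of the EMR Step API option, the sketch below builds the step definition for a Spark application as a plain dictionary. The helper name `spark_step`, the bucket, the script path, and the arguments are all hypothetical. The commented lines show how such a step could be submitted with boto3, assuming boto3 is installed, credentials are configured, and a real cluster ID is supplied.

```python
def spark_step(name, script_s3_uri, *script_args):
    """Build one step definition that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_s3_uri,
                *script_args,
            ],
        },
    }


# Hypothetical script location and arguments.
step = spark_step(
    "convert-to-parquet",
    "s3://mybucket/jobs/convert.py",
    "--date", "2017-06-02",
)
print(step)

# Submitting the step (not executed here; the cluster ID is a placeholder):
#   import boto3
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])
```

The same dictionary could be produced inside an AWS Lambda function or a scheduler task, which is how the other submission options on this slide typically reach the Step API.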
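Cluster interfaces for quickly tuning workloads
Manage applications
SQL editor, Workflow designer,
Metastore browser
Notebooks
Design and run queries and
workloads
Performance and hardware
• Transient or long-running
• Instance types
• Cluster size
• Application settings
• File formats and S3 tuning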
Master Node
r3.2xlarge
Slave Group - Core
c4.2xlarge
Slave Group – Task
m4.2xlarge (EC2 Spot)
Considerations
Spot for
task nodes
Up to 80%
off EC2
On-Demand
pricing
On-demand for
core nodes
Standard
Amazon EC2
pricing for
on-demand
capacity
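The effect of running task nodes on Spot can be estimated with a few lines of arithmetic. This is a sketch under stated assumptions: the $0.40 hourly On-Demand price and the 4 core / 16 task node split are hypothetical, all nodes are assumed to use the same instance type, the 80% discount is the "up to" figure from the slide, and the EMR service fee is ignored.

```python
def hourly_cluster_cost(core_nodes, task_nodes, on_demand_price, spot_discount=0.0):
    """Core nodes pay the On-Demand price; task nodes pay On-Demand minus the Spot discount."""
    core = core_nodes * on_demand_price
    task = task_nodes * on_demand_price * (1 - spot_discount)
    return core + task


all_on_demand = hourly_cluster_cost(4, 16, 0.40)                  # every node On-Demand
with_spot = hourly_cluster_cost(4, 16, 0.40, spot_discount=0.80)  # task nodes on Spot
saving = 1 - with_spot / all_on_demand

print(f"on-demand: ${all_on_demand:.2f}/h  with spot: ${with_spot:.2f}/h  saving: {saving:.0%}")
```

Core nodes stay On-Demand in this sketch because they hold HDFS data, so losing them to a Spot interruption is more disruptive than losing task nodes.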
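Reduce cost with Spot and Reserved Instances
Meet SLA at predictable cost
Exceed SLA at lower cost
Using advanced Spot
Master Node, Core Instance Fleet, Task Instance Fleet
• Choose instance types offered as both Spot and On-Demand
• Launch in the optimal Availability Zone based on capacity and price
• Spot Block support
Reduce cost with Auto Scaling
Customer examples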
DataXu – 180TB of Log Data per Day
CDN
Real Time
Bidding
Retargeting
Platform
Kinesis Attribution & ML
S3
Reporting
Data Visualization
Data
Pipeline
ETL(Spark SQL)
Ecosystem of tools and services
Amazon Athena
Petabytes of data generated
on-premises, brought to AWS,
and stored in S3
Thousands of analytical
queries performed on EMR
and Amazon Redshift.
Stringent security requirements
met by leveraging VPC, VPN,
encryption at-rest and in-
transit, CloudTrail, and
database auditing
Flexible
Interactive
Queries
Predefined
Queries
Surveillance
Analytics
Data Management
Data Movement
Data Registration
Version Management
Amazon S3
Web Applications
Analysts; Regulators
FINRA: Migrating from on-prem to AWS
FINRA saved 60% by moving to HBase on EMR
Lower Cost and Higher Scale than On-Premises
Summary: choose the right tool for the use case
Storage
S3 (EMRFS), HDFS
YARN
Cluster Resource Management
Batch
MapReduce
Interactive
Tez
In Memory
Spark
Applications
Hive, Pig, Spark SQL/Streaming/ML, Flink, Mahout, Sqoop
HBase/Phoenix
Presto
Athena
Streaming
Flink
- Low-latency SQL -> Athena, Presto, or Amazon Redshift
- Data warehouse / reporting -> Spark, Hive, Glue, or Amazon Redshift
- Management and monitoring -> EMR console or Ganglia metrics
- HDFS -> S3
- Notebooks -> Zeppelin notebooks or Jupyter (via bootstrap action)
- Query console -> Athena, Redshift Spectrum, or Hue
- Security -> Ranger (CF template), HiveServer2, or IAM roles
Glue
Amazon Redshift
Athena
Compress
Convert to a columnar format
Use partitioning
Summary
Amazon EMR
DO NOT LIFT AND SHIFT
Separate storage and compute with S3
Transient clusters
Spot fleet instances
Autoscaling
Thank you
dyue@amazon.com
aws.amazon.com/emr
blogs.aws.amazon.com/bigdata
