Modern Data Platform on AWS
Amazon Web Services
1.
SUMMIT Amsterdam
2.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Modern Data Platform on AWS
Damon Cortesi, Big Data Architect, AWS (@dacort)
David Morel, Takeaway.com
ANT001
3.
A brief history of significant Big Data releases
• 2004: Google publishes MapReduce paper
• 2006: Hadoop is created; HBase development starts
• 2008: Facebook launches Hive
• 2009: AWS EMR announced
• 2012: Facebook launches Presto; Apache Spark released
• 2015: MXNet paper published
• 2016: Amazon Athena & AWS Glue announced
4.
There is more data than people think
• Data grows >10x every 5 years
• Data platforms need to live for 15 years
• Data platforms need to scale 1,000x
5.
There are more people accessing data
• Data Scientists
• Analysts
• Business Users
• Applications
And more requirements for making data available
• Secure
• Real time
• Flexible
• Scalable
6.
AWS databases and analytics
Broad and deep portfolio, built for builders
• Analytics: Amazon Redshift (data warehousing), Amazon EMR (Hadoop + Spark), Athena (interactive analytics), Kinesis Analytics (real-time), Amazon Elasticsearch Service (operational analytics)
• Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware
• Business Intelligence & Machine Learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Rekognition, Amazon Lex, Amazon Transcribe, AWS DeepLens
• Data Lake: S3/Amazon Glacier, AWS Glue (ETL & Data Catalog), Lake Formation (data lakes)
• Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect
• Blockchain: Managed Blockchain, Blockchain Templates
• AWS Marketplace: 250+ solutions, 730+ Database solutions, 600+ Analytics solutions, 25+ Blockchain solutions, 20+ Data lake solutions, 30+ solutions
7.
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
8.
Data lake with AWS Glue
Diagram: Amazon S3 (Raw data), Amazon S3 (Staging data), Amazon S3 (Processed data), each with Crawlers; AWS Glue Data Catalog
9.
10.
Amazon S3: Object Storage
• Security and Compliance: Three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail; use ML to discover and protect sensitive data with Macie
• Flexible Management: Classify, report, and visualize data usage trends; objects can be tagged to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention
• Durability, Availability & Scalability: Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; can be replicated automatically to any other AWS region
• Query in Place: Run analytics & ML on the data lake without data movement; S3 Select can retrieve a subset of data, improving analytics performance by up to 400%
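The lifecycle policies mentioned on this slide can be illustrated with a small sketch. It only builds the rule document in the shape that boto3's `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument; the helper name, the `raw/` prefix, and the 30/90/365-day thresholds are assumptions chosen for illustration, not values from the talk.

```python
from typing import Any, Dict


def build_lifecycle_configuration(
    prefix: str,
    ia_after_days: int = 30,
    glacier_after_days: int = 90,
    expire_after_days: int = 365,
) -> Dict[str, Any]:
    """Build a lifecycle rule that tiers objects under `prefix`, then expires them.

    Hypothetical helper: the thresholds are illustrative defaults only.
    """
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("expected ia_after_days < glacier_after_days < expire_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move to cheaper storage classes as the data ages.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Retention: delete the objects after this many days.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("raw/")
```

With boto3 installed and credentials configured, the dictionary could be passed as `LifecycleConfiguration=config` to `put_bucket_lifecycle_configuration` together with a `Bucket` name.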
11.
Data Movement From Real-time Sources
• Amazon Kinesis Video Streams: Securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing
• Amazon Kinesis Data Firehose: Capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools
• Amazon Kinesis Data Streams: Build custom, real-time applications that process data streams using popular stream processing frameworks
• AWS IoT Core: Supports billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely
• Managed Streaming for Kafka: Fully managed open-source platform for building real-time streaming data pipelines and applications
12.
Amazon Kinesis Data Streams
13.
Amazon Kinesis Data Firehose
14.
Kinesis events to S3
Diagram: Clients, Kinesis Data Streams, Kinesis Data Firehose, Lambda Transformation, Save as Parquet, Source backup, Aggregated JSON Data, Aggregated Parquet Data, Crawlers, Amazon Athena
Prefix: raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/
Buffer: Up to 128 MB or 15 minutes
New! as of 12th Feb
• Support for custom S3 prefix
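To make the custom prefix on this slide concrete, here is a minimal sketch of how the `!{timestamp:...}` expressions turn into a Hive-style partition path. It is a local imitation, not Firehose code: the function name is hypothetical, only the `yyyy`, `MM`, `dd`, and `HH` patterns are handled, and the timestamp is assumed to be evaluated in UTC.

```python
import re
from datetime import datetime, timezone

# Assumption: a small subset of date patterns, mapped to strftime directives.
_PATTERNS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}
_TOKEN = re.compile(r"!\{timestamp:([^}]+)\}")


def expand_prefix(prefix: str, arrival: datetime) -> str:
    """Replace each !{timestamp:<pattern>} token with the formatted arrival time."""

    def _substitute(match: "re.Match[str]") -> str:
        pattern = match.group(1)
        if pattern not in _PATTERNS:
            raise ValueError(f"unsupported timestamp pattern: {pattern}")
        return arrival.astimezone(timezone.utc).strftime(_PATTERNS[pattern])

    return _TOKEN.sub(_substitute, prefix)


PREFIX = "raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
example = expand_prefix(PREFIX, datetime(2019, 2, 12, 9, 30, tzinfo=timezone.utc))
print(example)  # raw/life/year=2019/month=02/day=12/
```

Because the resulting keys follow the `key=value` layout, a crawler can register `year`, `month`, and `day` as partition columns for Athena.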
15.
Data Movement From On-premises Datacenters
• AWS Snowball, Snowball Edge and Snowmobile: Petabyte- and exabyte-scale data transport solutions that use secure appliances to transfer large amounts of data into and out of the AWS cloud
• AWS Direct Connect: Establish a dedicated network connection from your premises to AWS; reduces your network costs, increases bandwidth throughput, and provides a more consistent network experience than Internet-based connections
• AWS Storage Gateway: Lets your on-premises applications use AWS for storage; includes a highly optimized data transfer mechanism and bandwidth management, along with a local cache
• AWS Database Migration Service: Migrate databases from the most widely used commercial and open-source offerings to AWS quickly and securely with minimal downtime to applications
16.
AWS Database Migration Service
17.
DMS to S3
Diagram: Source database, AWS Database Migration Service, Snapshot Data, Crawlers, Data catalog, AWS Glue, Amazon Athena, Amazon EMR, Amazon Redshift
New! as of 25th March
• Support for Parquet
• Support for S3 encryption with KMS
18.
DMS to S3
Change Data Capture (CDC)
• Challenging to do easily
• Need to maintain a staging table and reconstitute dataset

from pyspark.sql import Window, functions as F

# df2 is assumed to hold the DMS change records:
# cdc is the operation flag (I/U/D) and idx orders the changes
newDf = df2.filter("cdc = 'I'")
updDf = df2.filter("cdc = 'U'")
delDf = df2.filter("cdc = 'D'")

# Keep only the most recent update per id
w = Window.partitionBy("id").orderBy(F.col("idx").desc())
latestUpdateDf = updDf.withColumn("rn", F.row_number().over(w)).where(F.col("rn") == 1).drop("rn")

# Create the update table, join to the original table,
# keep only the original rows that have no update, then union
tempDf = latestUpdateDf.select("id").withColumnRenamed("id", "id_1")
filteredBaseDf = newDf.join(tempDf, newDf.id == tempDf.id_1, "left")
filteredBaseDf = filteredBaseDf.filter("id_1 is null").drop("id_1")
insertAndUpdateDf = filteredBaseDf.union(latestUpdateDf)

# Ok, now remove any deleted rows!
tempDf = delDf.select("id").withColumnRenamed("id", "id_del")
finalDf = insertAndUpdateDf.join(tempDf, insertAndUpdateDf.id == tempDf.id_del, "left")
finalDf = finalDf.filter("id_del is null").drop("id_del")
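The intent of the reconciliation on this slide can also be sketched without Spark. The version below is not the join-based approach from the slide: it replays the changes in `idx` order against an in-memory snapshot, which is easier to follow on small data. The record layout (`idx`, `cdc`, `id`, `row`) and the function name are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable


def apply_cdc(
    snapshot: Dict[int, Dict[str, Any]],
    changes: Iterable[Dict[str, Any]],
) -> Dict[int, Dict[str, Any]]:
    """Return the current state after replaying CDC records in idx order."""
    current = dict(snapshot)  # leave the caller's snapshot untouched
    for change in sorted(changes, key=lambda c: c["idx"]):
        op = change["cdc"]
        if op in ("I", "U"):
            # Inserts and updates both set the latest version of the row.
            current[change["id"]] = change["row"]
        elif op == "D":
            # Deletes remove the row if it is present.
            current.pop(change["id"], None)
        else:
            raise ValueError(f"unknown CDC operation: {op}")
    return current


snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"idx": 1, "cdc": "U", "id": 1, "row": {"name": "a2"}},
    {"idx": 2, "cdc": "I", "id": 3, "row": {"name": "c"}},
    {"idx": 3, "cdc": "D", "id": 2, "row": None},
    {"idx": 4, "cdc": "U", "id": 1, "row": {"name": "a3"}},
]
current = apply_cdc(snapshot, changes)
print(current)  # {1: {'name': 'a3'}, 3: {'name': 'c'}}
```

Replaying in order handles an update that follows a delete naturally, at the cost of being sequential; the window-and-join formulation trades that for parallelism on large datasets.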
19.
AWS Glue ETL (New!)
20.
Third-party API to S3
Diagram: 3rd Party API, AWS Glue Python Shell, Incremental Exports, Crawlers, Data catalog, Glue ETL, Transformed Data, Amazon Athena, Amazon Redshift
21.
Parquet File Format
• Columnar format is optimized for analytics.
• Row group metadata allows a Parquet reader to skip portions of files, or entire files.
• Column metadata allows for pre-aggregation.
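A toy model may help show why row group metadata lets a reader skip data. This is not the Parquet format or any Parquet library: the `RowGroup` class and its fields are hypothetical stand-ins for the per-row-group min/max statistics that a real reader consults before reading a group.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    min_value: int      # smallest value stored in this group (from metadata)
    max_value: int      # largest value stored in this group (from metadata)
    rows: List[int]     # the actual values; "reading" them is the expensive part


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        if group.max_value <= threshold:
            # The statistics prove no row can match, so skip without reading.
            continue
        groups_read += 1
        matches.extend(value for value in group.rows if value > threshold)
    return matches, groups_read


groups = [
    RowGroup(1, 10, [1, 5, 10]),
    RowGroup(11, 20, [11, 15, 20]),
    RowGroup(21, 30, [21, 25, 30]),
]
result, groups_read = scan_greater_than(groups, 18)
print(result, groups_read)  # [20, 21, 25, 30] 2
```

The first group is never read because its maximum (10) cannot satisfy the predicate; sorting data on commonly filtered columns makes this kind of pruning more effective.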
22.
Parquet
• Previously it was common to deliver in JSON/CSV/text and then run another process to convert to Parquet. It's becoming more common to deliver straight to Parquet.
• Kinesis Firehose – Added support May 2018
  • Custom prefix support: Feb 2019
  • Requires schema in Glue Data Catalog
• Athena – CREATE TABLE AS SELECT: Oct 2018
• EMR – S3-optimized Parquet committer: Nov 2018
• Database Migration Service – Added Parquet support: Mar 2019
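As an illustration of the Athena CREATE TABLE AS SELECT route listed above, the sketch below assembles a CTAS statement that rewrites an existing table as Parquet. The helper name, the table names, and the S3 location are placeholders invented for this example; the `format` and `external_location` table properties are the ones Athena documents for CTAS.

```python
def build_parquet_ctas(new_table: str, source_table: str, external_location: str) -> str:
    """Build an Athena CTAS statement that writes `source_table` out as Parquet."""
    properties = [
        "format = 'PARQUET'",
        f"external_location = '{external_location}'",
    ]
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH ({', '.join(properties)})\n"
        f"AS SELECT * FROM {source_table}"
    )


# Placeholder names: replace with a real database, table, and bucket.
statement = build_parquet_ctas(
    "processed.events_parquet",
    "raw.events_json",
    "s3://example-bucket/processed/events/",
)
print(statement)
```

The generated text could be pasted into the Athena console or submitted through an API client; the target location is expected to be empty before the query runs.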
23.
24.
AWS Glue ETL (New!)
25.
Amazon EMR
26.
Amazon Redshift
27.
Amazon Athena
Diagram: Permissions, Data Lake, AWS Glue Data Catalog, Reporting & Analytics, Machine Learning, Custom Applications, AWS Cloud
28.
Amazon EMR Notebooks in the Console
A managed analytics environment based on Jupyter Notebooks
Diagram: notebook users, AWS Management Console for EMR, EMR-managed notebook based on Jupyter, Amazon EMR clusters, EMR VPC, Customer VPC
• Auto saves notebook file to your S3 bucket
• Run queries on your remote EMR cluster
29.
Amazon QuickSight
30.
31.
Data lake with AWS Glue
Diagram: Amazon S3 (Raw data), Amazon S3 (Staging data), Amazon S3 (Processed data), each with Crawlers; AWS Glue Data Catalog
32.
AWS Lake Formation
Build a secure data lake in days
• Identify, ingest, clean, and transform data
• Enforce security policies across multiple services
• Gain and manage new insights
33.
How it works
34.
Easily load data to your data lake
Diagram: logs, DBs, Blueprints, one-shot, incremental, Data import, Crawlers, ML-based data prep, Lake Formation, Data Lake Storage, Data Catalog, Access Control
35.
Blueprints build on AWS Glue
36.
Easily de-duplicate your data with ML transforms
37.
Secure once, access in multiple ways
Diagram: Admin, Lake Formation, Data Lake Storage, Data Catalog, Access Control
38.
Security permissions in Lake Formation
• Control data access with simple grant and revoke permissions
• Specify permissions on tables and columns rather than on buckets and objects
• Easily view policies granted to a particular user
• Audit all data access in one place
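A table- or column-level grant of the kind described above might be expressed as follows. The sketch only builds the keyword arguments in the shape accepted by boto3's Lake Formation `grant_permissions` call and does not contact AWS; the helper name, the role ARN, and the database, table, and column names are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def build_grant_request(
    principal_arn: str,
    database: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
    permissions: Sequence[str] = ("SELECT",),
) -> Dict[str, Any]:
    """Build grant_permissions arguments for a whole table or selected columns."""
    if columns:
        # Column-level grant: only the listed columns become visible.
        resource: Dict[str, Any] = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant.
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": list(permissions),
    }


# Hypothetical analyst role that may read two columns of one table.
request = build_grant_request(
    "arn:aws:iam::123456789012:role/example-analyst",
    "sales",
    "orders",
    columns=["order_id", "order_date"],
)
```

With boto3 and credentials available, the dictionary could be unpacked into `client.grant_permissions(**request)`; the same shape is used by `revoke_permissions` to withdraw access.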
39.
AWS Lake Formation Pricing
No additional charges – only pay for the underlying services used.
40.
41.
A tale of AWS at Takeaway.com
Data Engineering in the Business Intelligence team
42.
1. Once upon a time
43.
2. Learning
44.
3. The kingdom
45.
46.
4. Lessons
47.
5. Complexity
48.
6. Flexibility
49.
7. Simplicity
50.
8. Expansion
51.
9. Happily ever after
52.
Thank you!
53.