SlideShare une entreprise Scribd logo
1  sur  26
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Serverless Data Lake Workshop
Raj Chilakapati
Solutions Architect
Amazon Web Services
A R C 3 0 2
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Development Environment Setup
Review Data Lake Architecture
Why Serverless?
Glue Extract Transform Load (ETL)
Data Governance
Bonus Content
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scenario
You support a successful online ecommerce website with millions of users. The
website is tracking your end user activity and their buying habits online.
Requirements
• Build a cost effective solution to have a unified analytics environment.
• Support the ability to query data in ad-hoc queries
• Use Business Intelligence tools with a end goal of helping business teams derive
efficiencies in their marketing campaigns.
Constraints to keep in mind
• Should not loose the focus on data quality and governance controls.
Data Sources include weblogs, NoSQL databases and other datasources
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Re:Invent Workshop summary
• Ingest data from various data sources and join them together
• Enrich raw data
• Convert data to parquet for efficient querying
• Grant access to roles based on the data classification
• SQL Access for Data Scientists
• Data Visualization with charts and graphs
• Data Lineage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
1. Your own device for console access
2. An AWS account that you are able to use for testing.
(Should not be used for production or other purposes.)
3. Download the Lab Guide at
http://bit.do/nov28nyloftworkshop
Requirements
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Development Environment
Your Cloud Engineering team has deployed a development environment for you
Ingestion / Data Generation
Kinesis / Log Data
Data Generation Lambda Functions
Amazon S3 Buckets
Amazon DynamoDB
AWS Glue Management Console / Development Endpoint
Amazon Athena
Amazon QuickSight
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
1. Deploy the Lab CloudFormation template from here
http://bit.do/nov28datalaketemplate
2. Examine the environment in CloudFormation Designer
3. Deploy your stack
Deploy the lab environment
Template
Stack
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
High Level Architecture
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Kinesis Data Firehose
• Serverless, easy to use
• Seamless integration with AWS data stores
• Support for serverless transformation
• Near real-time ingestion
• Pay only for what you use
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Simple Storage Service (S3)
• Object Store
• Highly durable
• Limitless scalability
• Pay for what you use
• Comprehensive Security & Compliance capabilities
• Support for Query in place
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Glue
• Serverless ETL
• Universal Data Catalog
• Open source Apache Spark environment
• DynamicFrame – Built in functions
• Seamless integration with AWS services
• Support for on-premises data stores
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Athena
• Serverless interactive query service
• Integrated with AWS Glue Data Catalog
• Open source, built on Presto, query with standard SQL
• Pay per Query
• Support for standard formats like CSV, JSON, ORC, Avro and Parquet
• Fast parallel query execution
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon QuickSight
• Serverless, end to end BI solution
• Built-in SPICE engine
• Smart visualizations
• Seamless integration with AWS services
• On-premises database support
• Pay only for what you use
• Multiple device support
• Share and collaborate
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data Classification and Security
• Grant S3 access by role to bucket / prefix
• Approaches to segment data
• Multiple copies of the data in different buckets
• Grant access to roles to buckets
• Tokenization, join to tokenized tables, and views to
resolve them
Bucket with
objects
Role Permissions
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
UserProfile
Duplication
ID First Last
1 Sam Smith
2 Jane Jones
UserProfileSecure
ID First Last SSN
1 Sam Smith 111-11-1111
2 Jane Jones 222-22-2222
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Duplication
UserProfile
ID First Last
1 Sam Smith
2 Jane Jones
UserProfileSecure
ID First Last SSN
1 Sam Smith 111-11-1111
2 Jane Jones 222-22-2222
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Tokenization
UserProfile
ID First Last SSN_Token
1 Sam Smith 8c9d409dcc43
2 Jane Jones 06a38ea94e69
SSN_Tokens
Token SSN
8c9d409dcc43 111-11-1111
06a38ea94e69 222-22-2222
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Tokenization
ProfileView
ID First Last
1 Sam Smith
2 Jane Jones
ProfileSecureView
ID First Last SSN
1 Sam Smith 111-11-1111
2 Jane Jones 222-22-2222
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Redshift Spectrum
UserProfileSecure
ID First Last SSN
1 Sam Smith 111-11-1111
2 Jane Jones 222-22-2222
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bonus Content
• AWS Glue Development Endpoints – Apache Zeppelin notebook
• Amazon Redshift/Spectrum Integration
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Raj Chilakapati
chilakap@amazon.com

Contenu connexe

Tendances

How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018Amazon Web Services
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Amazon Web Services
 
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...Amazon Web Services
 
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...Amazon Web Services
 
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018Amazon Web Services
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Amazon Web Services
 
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Amazon Web Services
 
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...Amazon Web Services
 
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018Amazon Web Services
 
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018Amazon Web Services
 
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...Amazon Web Services
 
SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job
 SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job
SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right JobAmazon Web Services
 
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Amazon Web Services
 
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...Amazon Web Services
 
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018Amazon Web Services
 
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Amazon Web Services
 
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Amazon Web Services
 
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Amazon Web Services
 
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...Amazon Web Services
 
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Amazon Web Services
 

Tendances (20)

How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
 
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...
Computing at the Edge with AWS Greengrass and Amazon FreeRTOS, ft. General El...
 
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...
AWS re:Invent 2018: Deep Dive: Hybrid Cloud Storage Arch. w/Storage Gateway, ...
 
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018
A Deep Dive into What's New with Amazon EMR (ANT340-R1) - AWS re:Invent 2018
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
 
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
 
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...
Engage Users in Real-Time through Event-Based Messaging (MOB322-R1) - AWS re:...
 
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018
DevOps Concepts for Data Science (DEV347-R2) - AWS re:Invent 2018
 
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018
Post-Production Media Delivery at Scale with AWS (STG391) - AWS re:Invent 2018
 
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
 
SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job
 SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job
SRV309 AWS Purpose-Built Database Strategy: The Right Tool for the Right Job
 
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
 
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...
Build a Searchable Media Library & Moderate Content at Scale Using Machine Le...
 
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018
Introduction to GraphQL (MOB316-R1) - AWS re:Invent 2018
 
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
 
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
 
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
 
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...
Save up to 90% on Big Data and Machine Learning Workloads with Spot Instances...
 
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
 

Similaire à Workshop: Architecting a Serverless Data Lake

Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdf
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdfCome scalare da zero ai tuoi primi 10 milioni di utenti.pdf
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdfAmazon Web Services
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczAmazon Web Services
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Amazon Web Services
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural PatternsAmazon Web Services
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Amazon Web Services
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAmazon Web Services
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SFAmazon Web Services
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Amazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Amazon Web Services
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Amazon Web Services
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Amazon Web Services
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftAmazon Web Services
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Amazon Web Services
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 

Similaire à Workshop: Architecting a Serverless Data Lake (20)

Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdf
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdfCome scalare da zero ai tuoi primi 10 milioni di utenti.pdf
Come scalare da zero ai tuoi primi 10 milioni di utenti.pdf
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural Patterns
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San Francisco
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SF
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Workshop: Architecting a Serverless Data Lake

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless Data Lake Workshop Raj Chilakapati Solutions Architect Amazon Web Services A R C 3 0 2
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Development Environment Setup Review Data Lake Architecture Why Serverless? Glue Extract Transform Load (ETL) Data Governance Bonus Content
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Scenario You support a successful online ecommerce website with millions of users. The website is tracking your end user activity and their buying habits online. Requirements • Build a cost effective solution to have a unified analytics environment. • Support the ability to query data in ad-hoc queries • Use Business Intelligence tools with a end goal of helping business teams derive efficiencies in their marketing campaigns. Constraints to keep in mind • Should not loose the focus on data quality and governance controls. Data Sources include weblogs, NoSQL databases and other datasources
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Re:Invent Workshop summary • Ingest data from various data sources and join them together • Enrich raw data • Convert data to parquet for efficient querying • Grant access to roles based on the data classification • SQL Access for Data Scientists • Data Visualization with charts and graphs • Data Lineage
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 1. Your own device for console access 2. An AWS account that you are able to use for testing. (Should not be used for production or other purposes.) 3. Download the Lab Guide at http://bit.do/nov28nyloftworkshop Requirements
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Development Environment Your Cloud Engineering team has deployed a development environment for you Ingestion / Data Generation Kinesis / Log Data Data Generation Lambda Functions Amazon S3 Buckets Amazon DynamoDB AWS Glue Management Console / Development Endpoint Amazon Athena Amazon QuickSight
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 1. Deploy the Lab CloudFormation template from here http://bit.do/nov28datalaketemplate 2. Examine the environment in CloudFormation Designer 3. Deploy your stack Deploy the lab environment Template Stack
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. High Level Architecture
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Kinesis Data Firehose • Serverless, easy to use • Seamless integration with AWS data stores • Support for serverless transformation • Near real-time ingestion • Pay only for what you use
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Simple Storage Service (S3) • Object Store • Highly durable • Limitless scalability • Pay for what you use • Comprehensive Security & Compliance capabilities • Support for Query in place
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Glue • Serverless ETL • Universal Data Catalog • Open source Apache Spark environment • DynamicFrame – Built in functions • Seamless integration with AWS services • Support for on-premises data stores
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Athena • Serverless interactive query service • Integrated with AWS Glue Data Catalog • Open source, built on Presto, query with standard SQL • Pay per Query • Support for standard formats like CSV, JSON, ORC, Avro and Parquet • Fast parallel query execution
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon QuickSight • Serverless, end to end BI solution • Built-in SPICE engine • Smart visualizations • Seamless integration with AWS services • On-premises database support • Pay only for what you use • Multiple device support • Share and collaborate
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data Classification and Security • Grant S3 access by role to bucket / prefix • Approaches to segment data • Multiple copies of the data in different buckets • Grant access to roles to buckets • Tokenization, join to tokenized tables, and views to resolve them Bucket with objects Role Permissions
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. UserProfile Duplication ID First Last 1 Sam Smith 2 Jane Jones UserProfileSecure ID First Last SSN 1 Sam Smith 111-11-1111 2 Jane Jones 222-22-2222
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Duplication UserProfile ID First Last 1 Sam Smith 2 Jane Jones UserProfileSecure ID First Last SSN 1 Sam Smith 111-11-1111 2 Jane Jones 222-22-2222
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Tokenization UserProfile ID First Last SSN_Token 1 Sam Smith 8c9d409dcc43 2 Jane Jones 06a38ea94e69 SSN_Tokens Token SSN 8c9d409dcc43 111-11-1111 06a38ea94e69 222-22-2222
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Tokenization ProfileView ID First Last 1 Sam Smith 2 Jane Jones ProfileSecureView ID First Last SSN 1 Sam Smith 111-11-1111 2 Jane Jones 222-22-2222
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Redshift Spectrum UserProfileSecure ID First Last SSN 1 Sam Smith 111-11-1111 2 Jane Jones 222-22-2222
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Bonus Content • AWS Glue Development Endpoints – Apache Zeppelin notebook • Amazon Redshift/Spectrum Integration
  • 26. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Raj Chilakapati chilakap@amazon.com