2. AWS offers a modern data platform
[Diagram] Old guard data patterns vs. modern data architecture:
Data silos: OLTP, ERP, and CRM systems feed DW silo 1; devices, web logs, mobile apps, and LOB apps feed DW silo 2, each silo serving its own business intelligence tools.
Data lakes: data stored in open formats (CSV, ORC, Parquet, Avro) with a central catalog, serving BI + analytics, machine learning, and data warehousing.
8. [Architecture diagram] Columns: Source → Ingest → Serving; rows: Speed (real-time) and Scale (batch).
Sources: transactions, web logs and cookies, ERP, connected devices, social media, GPS location, mobile. Data reaches AWS over the Internet, AWS Direct Connect, or VPN.
9. [Diagram, building on slide 8] Adds the ingest layer: API Gateway, SFTP, AWS DMS, Storage Gateway, AppSync, Amazon MQ, Kinesis, and Kafka (Amazon MSK).
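Ingestion at the speed layer typically lands events on a stream. As a minimal sketch, the helper below batches JSON events into the shape that Kinesis Data Streams' PutRecords call expects; the stream name, field names, and the `user_id` partitioning convention are illustrative assumptions, not from the deck:

```python
import json
import uuid

def build_kinesis_batch(events, partition_field="user_id"):
    """Build the Records list for a Kinesis Data Streams PutRecords request.

    Each entry needs a Data blob and a PartitionKey; records sharing a
    partition key land on the same shard, preserving their relative order.
    """
    entries = []
    for event in events:
        entries.append({
            "Data": json.dumps(event).encode("utf-8"),
            # Fall back to a random key so unkeyed events spread across shards.
            "PartitionKey": str(event.get(partition_field) or uuid.uuid4()),
        })
    return entries

# The batch would then be sent with (hypothetical stream name):
#   boto3.client("kinesis").put_records(StreamName="clickstream", Records=entries)
entries = build_kinesis_batch([{"user_id": 42, "page": "/home"}])
```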
10. [Diagram, building on slide 9] Adds the data lake: S3 tiers for Raw, Curated, Analytics, and Digested data, with EMR and Glue jobs moving data between tiers, all governed by Lake Formation.
11. [Diagram, building on slide 10] Adds ML & analytics on the data lake: SageMaker, AI Services, Elasticsearch, and Athena.
12. [Diagram, building on slide 11] Adds stream analysis on the speed path: Kinesis Data Analytics (including Flink on Kinesis Data Analytics), Lambda, CloudWatch, Elasticsearch, and EMR (Spark Streaming).
13. [Diagram, building on slide 12] Adds Kinesis Data Firehose to deliver streams into the data lake.
14. [Diagram, building on slide 13] Adds the real-time event pipeline: event capture (Kinesis Data Analytics), event handler (Lambda), event scoring (SageMaker, AI Services), and event action (Step Functions).
15. [Diagram, building on slide 14] Adds the serving layer with near-zero latency: data warehouse and databases (Amazon Redshift, Elasticsearch, DynamoDB, Aurora, ElastiCache, DocumentDB) plus BI reporting and analytics (QuickSight, Kibana, Jupyter).
16. [Diagram, building on slide 15] Adds Athena Federated Query (new, in preview) to query across the serving-layer stores.
17. [Diagram, building on slide 16] Adds the application layer: Fargate, EKS, ECS, API Gateway, and Lambda.
18. [Diagram, building on slide 17] Adds the consumers: data analysts, data scientists, business users, engagement platforms, and automation/events.
19. Amazon S3 is the foundation of any data lake
Multiple data input sources
Supports many unique users and teams
Storage scales on demand
Analyzed by many applications
20. Amazon S3 as the foundation for data lakes
Durable, available, exabyte-scalable
Secure, compliant, auditable
High performance
Low-cost storage and analytics
Broad network integration
[Diagram] Amazon S3 at the center, surrounded by: AWS Lake Formation & AWS Glue, AWS Snowball, AWS Snowmobile, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Redshift, Amazon EMR, Amazon Athena, Amazon Kinesis, Amazon Elasticsearch Service, Amazon SageMaker, Amazon Comprehend, and Amazon Rekognition.
21. AWS Lake Formation
Build a secure data lake in days
Simplify security management: centrally define security, governance, and auditing policies; enforce policies consistently across multiple services; integrates with IAM and KMS
Provide self-service access to data: build a data catalog that describes your data; enable analysts and data scientists to easily find relevant data; analyze with multiple analytics services without moving data
Build data lakes quickly: move, store, catalog, and clean your data faster; transform to open formats like Parquet and ORC; ML-based deduplication and record matching
22. Tier 1 Data Lake: Raw or Ingestion (Amazon S3)
Single source of truth for raw data
Apply the fewest transformations possible
Use lifecycle policies to transition data to S3-IA or Glacier
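A lifecycle rule for the raw tier can be sketched as below; the 30- and 90-day thresholds and the `raw/` prefix are illustrative assumptions, and the dict matches the shape boto3's `s3.put_bucket_lifecycle_configuration` expects:

```python
def raw_tier_lifecycle(prefix="raw/"):
    """Lifecycle rule for the raw tier: keep recent objects in S3 Standard,
    then transition to Standard-IA and Glacier as access cools off.

    Pass the result as LifecycleConfiguration to
    s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
    """
    return {
        "Rules": [{
            "ID": "raw-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm data
                {"Days": 90, "StorageClass": "GLACIER"},      # cold data
            ],
        }]
    }
```

Because the raw tier is the single source of truth, expiration rules are deliberately omitted here; data only moves to cheaper storage classes.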
23. Tier 2 Data Lake: Curated (Amazon S3)
Unstructured raw data made structured
Annotation
Data cleansing and transformation
Normalize encodings, formats, and types (such as time formats and string encodings)
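Curation logic is inherently source-specific, but the uniforming step can be sketched as a small normalizer. The field names and the epoch-seconds `ts` convention below are assumptions for illustration, not from the deck:

```python
from datetime import datetime, timezone

def curate_record(raw):
    """Normalize one raw event into the curated tier's uniform schema:
    lowercase keys, UTF-8 strings, and ISO 8601 UTC timestamps."""
    curated = {}
    for key, value in raw.items():
        if isinstance(value, bytes):
            # Uniform string encoding: decode everything to UTF-8 text.
            value = value.decode("utf-8", errors="replace")
        curated[key.lower()] = value
    # Uniform time format: epoch seconds -> ISO 8601 in UTC.
    if "ts" in curated:
        curated["ts"] = datetime.fromtimestamp(
            int(curated["ts"]), tz=timezone.utc
        ).isoformat()
    return curated
```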
24. Tier 3 Data Lake: Analytics (Amazon S3)
Use columnar formats (Parquet/ORC)
Organize data into partitions
Coalesce into larger partitions over time
Optimized for analytics
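A minimal sketch of the partition layout: Hive-style `key=value` prefixes are the convention that Athena, Glue, and EMR recognize for partition pruning. The bucket and dataset names are placeholders:

```python
from datetime import date

def partition_prefix(dataset, event_date, base="s3://my-datalake/analytics"):
    """Hive-style partition prefix (year=/month=/day=) for the analytics tier.

    Writing Parquet/ORC files under this layout lets query engines skip
    whole partitions when a query filters on the date columns.
    """
    return (f"{base}/{dataset}/"
            f"year={event_date.year}/"
            f"month={event_date.month:02d}/"
            f"day={event_date.day:02d}/")

partition_prefix("pageviews", date(2019, 12, 3))
# -> "s3://my-datalake/analytics/pageviews/year=2019/month=12/day=03/"
```

Coalescing over time then means periodically rewriting many small daily files within a prefix into fewer, larger ones.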
25. Tier 4 Data Lake: Digested (Serving Stage) (Amazon S3)
Domain-level data marts
Organized by use case
Optimized for specialized analysis
26. Amazon Redshift: What’s Under the Hood?
Seamless data lake integration: Amazon Redshift is a fully managed data warehouse service that extends seamlessly to the data lake. It’s highly performant, scalable, resilient, easy to use, cost-effective, and secure.
27. Our portfolio
Broad and deep portfolio, purpose-built for builders
Data Lake: S3/Glacier; Glue (ETL & data catalog); Lake Formation (data lakes)
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Data Exchange: Data Exchange (data exchange) NEW
Business Intelligence & Machine Learning: QuickSight (visualizations); SageMaker (ML); Comprehend (NLP); Transcribe (speech-to-text); Textract (extract text); Personalize (recommendations); Forecast (forecasts); Translate (translation); CodeGuru (code reviews) NEW; Kendra (enterprise search) NEW
Analytics: Redshift (data warehousing); EMR (Hadoop + Spark); Kinesis Data Analytics (real time); Elasticsearch Service (operational analytics); Athena (interactive analytics); NEW: AQUA, EMR on Outposts, UltraWarm
Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware); Aurora (MySQL, PostgreSQL); DynamoDB (key-value, document); ElastiCache (Redis, Memcached); Neptune (graph); Timestream (time series); QLDB (ledger database); Managed Apache Cassandra Service (wide column) NEW; DocumentDB (document); NEW: RDS Proxy, RDS on Outposts
Blockchain: Managed Blockchain (blockchain); Blockchain Templates (blockchain)
28. Broad database and analytics services portfolio
Relational databases: Amazon Aurora, Amazon RDS (PostgreSQL)
Non-relational databases: Amazon DynamoDB, Amazon DocumentDB, Amazon ElastiCache
Data warehouses: Amazon Redshift
Hadoop and Spark: Amazon EMR
Operational analytics: Amazon ES (Elasticsearch, Logstash, Kibana)
Real-time analytics: Amazon MSK
Business intelligence: Amazon QuickSight
33. Learn storage with AWS Training and Certification
Resources created by the experts at AWS to help you build cloud storage skills
45+ free digital courses cover topics related to cloud storage, including:
• Amazon S3
• AWS Storage Gateway
• Amazon S3 Glacier
• Amazon Elastic File System (Amazon EFS)
• Amazon Elastic Block Store (Amazon EBS)
Classroom offerings, such as Architecting on AWS, feature AWS expert instructors and hands-on activities
Visit the storage learning path at https://aws.training/storage
Editor's notes
One of the most common challenges afflicting legacy data architectures is data that is collected but proves difficult to extract value from. For example, data is difficult to access, costly to refine or analyze, not catalogued, or locked in proprietary formats or platforms.
Traditional architectures & on-prem data warehousing pose many challenges
Can’t scale easily/on-demand; long lead times for hardware procurement & upgrades
High overhead costs for administration
Proprietary formats & siloed data make it costly & complex to access, refine, & join data from different sources
Data not catalogued and/or of unreliable quality
Cold and warm data inseparable – bloated costs & wasted capacity
Anti-democratization – limits on how many users & how much data can be accommodated
Inspires other legacy architecture patterns – e.g. retrofitting use cases to accommodate the wrong tools for the job, instead of simply using the right tool for each use case
These challenges lead to dark data: data that is collected but from which it is challenging to extract insights. Dark data is no longer tenable. The Amazon Redshift + data lake solution helps turn dark data into free data.
Before you start your set up, you want to think about where you are storing all your data.
1/ You want to store all the structured and unstructured data in a single place and
2/ ensure that you can immediately start pushing data in from different systems.
3/ And at the same time, when you discover new use cases or your business expands to newer domains, you can plug in more applications that can start analyzing that data without a need to re-think your architecture.
4/ Finally, you want to build a data lake that scales and stands the test of time, because you won’t know today all the use cases you’ll want to use your data lake for. It is difficult to know exactly which data sets are important and how they should be cleaned, enriched, and transformed to solve different business problems.
All of this and more is exactly why you should choose Amazon S3 as the foundation of your data lake storage.
S3: ubiquitous storage allows you to centralize datasets. It’s simple, and it has consistent behavior and predictable operations.
The native features of S3 are exactly what you want from a data lake:
Eleven 9s of durability, high availability, and scalability
Best-in-class security, compliance, and audit capabilities, with object-level controls
Massively parallel and scalable
Cost-effective storage classes
Broad ecosystem for cataloging, ingesting, and gathering insights
1/ Build data lakes quickly: With Lake Formation, you can move, store, catalog, and clean your data faster. You simply point Lake Formation at your data sources, and Lake Formation crawls those sources and moves the data into your new Amazon S3 data lake. Lake Formation also changes data into open formats like Apache Parquet and ORC for faster analytics.
2/ Enforce security policies across multiple services: In addition, you can use Lake Formation to centrally define security, governance, and auditing policies in one place, and then enforce those policies for your users across multiple services that access data stored in the data lake. This reduces the effort in configuring policies across services and provides consistent enforcement and compliance.
3/ Provide self-service access to data: Lake Formation helps you build a data catalog that describes the different data sets that are available along with which groups of users have access to each. This makes your users more productive by helping them find the right data set to analyze. By providing a central catalog of your data, Lake Formation makes it easier for your analysts and data scientists to find and access the data they need.
TRANSITION: In this new age of massive data needs and capabilities, people want to consume data differently, too. The cloud has enabled such large datasets and cost-effective computing/analytics that customers are hungry to have an easy way to find the big, useful datasets, incorporate them into their data lakes and analytics, but it's hard today.
-------------------------------------BACKGROUND------------------------------------
AWS Lake Formation automates many of the steps required to set up a data lake, allowing customers to get started with just a few clicks from a single, unified dashboard.
1/ Move, store, catalog, and clean your data faster: To get started you add connection information for the data stores you want to move data from, or point Lake Formation to data that has been moved by Kinesis, or identify data from an AWS database, and then Lake Formation will crawl those sources to identify the layout of the data. Then you train Lake Formation with ML to clean and prepare the data. To start training Lake Formation, you provide examples of what you would like your data to look like after it’s been cleaned, for example, you can train Lake Formation to dedupe locations in a commercial insurance database. This training process can be as quick as 15 minutes. Inside Amazon, this same technology is used to de-duplicate and match data records for things like movies, products, and points of interest. Then the cleaned data is written to your new Amazon S3 data lake.
2/ Enforce security policies across multiple services: From a single screen, you can set up permissions for specified users, and those permissions are implemented across security services like AWS Identity and Access Management and AWS Key Management Service, storage services like Amazon S3, and analytics services like Amazon Redshift, Amazon Athena, and Amazon EMR. Lake Formation enforces access permissions and policies by only allowing users with the right credentials to decrypt the data in the data lake. The only way you can access the data in your data lake is by authenticating to Lake Formation with a username and password or using single sign-on. If you have permission to access the data, Lake Formation will give you a temporary key that you use to decrypt and analyze the data. Lake Formation keeps your data lake secure, reduces the hassle in re-defining policies across multiple services, and provides consistent enforcement and compliance of those policies.
3/ Gain and manage new insights: As Lake Formation adds your data, it builds a catalog, based on the data layout, that describes the content of your data lake. You can then add text labels with more detail to better describe specific datasets. Using the catalog, users can more easily search and find the data they need for their analysis based on the details you’ve added. It seems like such a simple thing, but it makes a tremendous difference in productivity when your analysts can find the right data for the analysis they are trying to perform. Of course, like everything else in Lake Formation, this catalog also enforces your security rules consistently by only showing users the data they are allowed to see. Once you’ve found the right data, Lake Formation makes it easier for your analysts and data scientists to securely extract data for analysis using tools like Athena, Redshift, EMR, SageMaker, and QuickSight across diverse data sets. (How to add a label: Select a data source within your data lake, then click “Edit metadata” and begin adding descriptors to further classify the data to make it easier for users to find and use the data they need from the data lake.)
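The permission model described above can be sketched as the request that Lake Formation's GrantPermissions API takes; the role ARN, database, and table names below are placeholders:

```python
def table_select_grant(principal_arn, database, table):
    """Keyword arguments for lakeformation.grant_permissions granting
    read-only (SELECT) access on one Data Catalog table.

    Lake Formation then enforces this single grant consistently for
    Athena, Redshift, and EMR queries against that table.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

# boto3.client("lakeformation").grant_permissions(**table_select_grant(
#     "arn:aws:iam::111122223333:role/analyst", "sales", "orders"))
```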
Lake Formation Use Cases:
• Amgen is the world's largest independent biotechnology company. “At Amgen we've been heavy users of Amazon Redshift and Amazon EMR clusters for over three years. Setting up security and access controls for each AWS account, service, user, and data set at the level of detail that was required could be cumbersome,” said Kerby Johnson, Enterprise Data Lake Product Owner, Amgen. “AWS Lake Formation streamlines the process with a central point of control while also enabling us to manage who is using our data, and how, with more detail. AWS Lake Formation allows us to manage permissions on Amazon S3 objects like we would manage permissions on data in a database. Our users will be able to find, access, and analyze the data they need with the tools they prefer. This new workflow can make everyone more productive when using Amgen’s data.”
• Life360 is the world's leading peace of mind service for families. The Life360 app brings families closer with smart features designed to protect and connect the people who matter most. “We wanted to use AWS Lake Formation to build our data lake for supporting location-based time-series data, and make it much easier to load data. The pre-fabricated blueprints helped get data into the data lake without our data engineering team having to write code from scratch, so they could focus on operationalizing ingest, not reinventing the wheel,” said Richard Chennault, Head of Cloud and Data Services, Life360, Inc. “With AWS Lake Formation we were able to quickly unlock data available in Amazon S3 and make it available to analyze across a broad spectrum of AWS data services. The data remains in place in Amazon S3, we can analyze it in many different ways, and we maintain full control over it.”
• Accenture is a leading global professional services company, providing a broad range of services and solutions in strategy, consulting, digital, technology, and operations. “I focus on helping clients in their ‘Data on Cloud’ journey. Specific to that, we have seen that organizations are dealing with a lack of trusted data when they need to perform analytics on data coming from multiple sources,” said Namrata Maheshwary, Senior Architect for the Data Business Group, Accenture. “Data cleansing is a critical step in data analytics and can greatly impact the business outcome and decision making. The new features in AWS Lake Formation have been hugely beneficial to address the challenge of data veracity and securing access to the data lake. We found it tremendously useful to make use of the advanced machine learning techniques for data preparation to find matching records, clean, and deduplicate data from different data sources. This will help reduce the time, effort, and cost, while improving the quality and accuracy of the data in a customer’s data lakes.”
Other Top Brands Using Lake Formation: Fender, Change Healthcare, Panasonic, Zalando, Cloudreach, Alcon, Quantiphi
http://aws.amazon.com/redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
Data transfer for >350TB/month is 0.05/GB = $50/TB – so if you’re pulling down 10TB/day, you’re looking at $600/year. A 10TB cluster would cost $10k/year = 6%. $1060/TB/Year
https://en.wikipedia.org/wiki/Data_curation
Battle-hardened database, re-architected into a cloud-first MPP data warehouse with resilient columnar storage and robust OLAP functionality
Amazon Redshift started out as a Postgres fork, but we completely rewrote the storage engine to be columnar, we made it an OLAP relational data store by adding analytics functions such as window operations, and we also made it a massively parallel processing (MPP) system so that it scales to very large workloads.
We have preserved compatibility with Postgres, which is why you can use a Postgres driver to connect to Redshift, but it is important to note that Redshift is an OLAP relational database, not an OLTP relational database like Postgres.
We then leveraged and integrated Amazon Redshift with other AWS services in the AWS ecosystem, such as VPCs, KMS, and IAM for security, S3 for data lake integration and backups, EC2 for its cluster implementation, and CloudWatch for monitoring.
All of this together makes up the service that we know as Amazon Redshift.
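Because Redshift preserved Postgres wire compatibility, an ordinary libpq-style connection string works with Postgres drivers such as psycopg2 or the Postgres JDBC/ODBC drivers. A sketch (host and credentials are placeholders; 5439 is Redshift's default port):

```python
def redshift_dsn(host, database, user, password, port=5439):
    """libpq-style connection string for an Amazon Redshift cluster.

    A Postgres driver accepts this directly, e.g.
    psycopg2.connect(redshift_dsn(...)); sslmode=require keeps the
    connection encrypted in transit.
    """
    return (f"host={host} port={port} dbname={database} "
            f"user={user} password={password} sslmode=require")

dsn = redshift_dsn("example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                   "analytics", "admin", "secret")
```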
AWS offers the broadest set of databases and analytics services for customers to lift and shift their database and analytics workloads to the cloud. And customers are doing this at record levels across many different areas:
1/ relational databases – For customers wanting to move away from self-managing Oracle, SQL Server, MySQL, PostgreSQL, and MariaDB databases, AWS offers Amazon RDS and Amazon Aurora.
2/ non-relational databases – For customers wanting to move away from self-managed non-relational document- and key-value stores such as MongoDB, Redis, and Memcached, AWS offers DynamoDB, DocumentDB and ElastiCache.
3/ Data Warehouses – customers want to move from their expensive, proprietary Teradata, Oracle and SQL Server Data Warehouses to Amazon Redshift.
4/ Hadoop and Spark – customers want to move from their Hadoop and Spark deployments on-premises to EMR for cost savings and having a managed service.
5/ operational analytics – customers want to move from their Elasticsearch, Logstash, and Kibana (ELK) stacks on-premises to Amazon Elasticsearch Service for cost savings and having a managed service.
6/ real-time analytics – customers want to move from their Apache Kafka deployments to Amazon Managed Streaming for Kafka.
AWS analytics services are complemented by a number of third-party software vendors, supplementing our in-house services with solutions around data collection and preparation, governance, and business intelligence/visualization.
If customers want independent help in choosing and implementing analytics solutions, AWS has a wide range of global and specialized competency partners to assist.
If you’re ready to continue learning, we offer 45+ free digital courses around storage, including Backup and Restore with AWS (90 minutes) and Migrating and Tiering Storage to AWS (1 hour).
You can also take Architecting on AWS classroom training to get hands on practice and learn directly from an instructor.
Visit the storage learning path to learn how to get started with storage.