Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lee Kear
Storage Specialist Solutions Architect
March 2017
Deep Dive on Amazon S3

Batches and Streams
• AWS Direct Connect
• AWS Snowball, Snowball Edge, Snowmobile
• 3rd Party Connectors
• Transfer Acceleration
• AWS Storage Gateway
• Amazon Kinesis Firehose
File
• Amazon EFS
Block
• Amazon EBS (persistent)
• Amazon EC2 Instance Store (ephemeral)
Object
• Amazon S3
• Amazon Glacier

What to Expect from the Session
• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Data transfer into Amazon S3
• AWS Direct Connect
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway

Amazon Storage Partner Solutions
aws.amazon.com/backup-recovery/partner-solutions/
Note: Represents a sample of storage partners
• Primary Storage: solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage
• Backup and Recovery: solutions that leverage Amazon S3 for durable data backup
• Archive: solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup

Choice of storage classes on S3
• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data

Storage classes designed for your use case
S3 Standard
• Big data analysis
• Content distribution
• Static website hosting
Standard - IA
• Backup & archive
• Disaster recovery
• File sync & share
• Long-retained data
Amazon Glacier
• Long term archives
• Digital preservation
• Magnetic tape replacement

When should you move to Standard-IA?
S3 Analytics - storage class analysis
• Visualize the access pattern on your data over time
• Measure the object age where data is infrequently accessed
• Dive deep by bucket, prefixes, or specific object tag
• Easily create a lifecycle policy based on the analysis

Visualize access pattern on your data

Export S3 Analytics to the tools of your choice

✓ Pick the right storage class for your use cases
✓ Automate management tasks
• Best practices to optimize S3 performance

Automate data management
Lifecycle policies
• Automatic tiering and cost controls
• Includes two possible actions:
  • Transition: moves objects to Standard - IA or Amazon Glacier once they reach the object age you specify
  • Expiration: deletes objects after a specified time
• Actions can be combined
• Set policies by bucket, prefix, or tags
• Set policies for current or noncurrent versions
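A rule that combines both actions can be sketched as a plain Python dictionary. This is a minimal sketch: the dictionary is intended to mirror the shape that boto3's `put_bucket_lifecycle_configuration` accepts, and the helper name `build_lifecycle_rule`, the `logs/` prefix, and the day counts are hypothetical values chosen for illustration.

```python
def build_lifecycle_rule(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return one lifecycle rule that tiers objects down, then deletes them."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("expected ia_after_days < glacier_after_days < expire_after_days")
    return {
        "ID": "tier-and-expire-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},           # policy scoped by prefix
        "Status": "Enabled",
        "Transitions": [                        # Transition action
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},  # Expiration action
    }


# Hypothetical policy: Standard-IA after 30 days, Glacier after 90, delete after 365.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/", 30, 90, 365)]}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="EXAMPLE-BUCKET-NAME",
#       LifecycleConfiguration=lifecycle_configuration)
```

Keeping the rule in a small builder function makes it easy to generate one rule per prefix with different ages.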

Set up a lifecycle policy on the AWS Management Console

Protect your data from accidental deletes
Best Practice: Versioning
• Protects from unintended user deletes or application logic failures
• New version with every upload
• Easy retrieval of deleted objects and roll back to previous versions
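The behavior described above can be illustrated with a toy in-memory model. This is not the S3 API: the class `VersionedBucket`, its method names, and the `v1`, `v2`, ... version IDs are hypothetical and exist only to show how a delete adds a marker on top of the version stack instead of destroying data.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every upload creates a new version; older versions are kept."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A delete without a version ID only adds a delete marker on top."""
        return self.put(key, DELETE_MARKER)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            for vid, body in versions:
                if vid == version_id and body is not DELETE_MARKER:
                    return body
            raise KeyError(key)
        if not versions or versions[-1][1] is DELETE_MARKER:
            raise KeyError(key)         # latest version is a delete marker
        return versions[-1][1]

    def delete_version(self, key, version_id):
        """Permanently remove one version; removing a delete marker restores the object."""
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
```

In this model, calling `delete_version` on the marker returned by `delete` makes the previous version current again, and calling it on the newest real version rolls the object back to the one before it.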

Easily recover from unintended delete
Tip: Create a recycle bin for your storage
Best Practice

Automate with trigger-based workflow
Amazon S3 event notifications
Events → SNS topic, SQS queue, or Lambda function
• Notification when objects are created (PUT, POST, Copy, Multipart Upload) or deleted (DELETE)
• Filter on prefixes and suffixes
• Trigger workflows with Amazon SNS, Amazon SQS, and AWS Lambda functions
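A Lambda function on the receiving end might look like the sketch below. It assumes the standard S3 notification payload (a `Records` list with `eventName`, `s3.bucket.name`, and a URL-encoded `s3.object.key`); the `uploads/` prefix and `.jpg` suffix are hypothetical. In practice the prefix and suffix filter would normally be set on the notification configuration itself; it is repeated in code here only to make the example self-contained.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (event_name, bucket, key) for each .jpg created under uploads/."""
    matches = []
    for record in event.get("Records", []):
        name = record["eventName"]                          # e.g. "ObjectCreated:Put"
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])   # keys arrive URL-encoded
        if (name.startswith("ObjectCreated:")
                and key.startswith("uploads/")
                and key.endswith(".jpg")):
            matches.append((name, bucket, key))
    return matches
```

A real handler would start its workflow (thumbnailing, indexing, and so on) where this sketch appends to `matches`.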

Cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions
Use cases:
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to regional customers
• Security - create remote replicas managed by separate AWS accounts
How it works:
• Only replicates new PUTs. Once configured, all new uploads into the source bucket will be replicated
• Entire bucket or prefix based
• 1:1 replication between any 2 regions
• Versioning required
• Deletes and lifecycle actions are not replicated
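The setup described above can be sketched in two small helpers. This is a sketch under assumptions: the dictionary is intended to mirror the shape boto3's `put_bucket_replication` accepts for a single prefix-based rule, while `check_prerequisites`, its input dictionaries, and the role ARN in the comment are hypothetical.

```python
def build_replication_config(role_arn, destination_bucket, prefix=""):
    """Replicate new objects under `prefix` (empty string means the entire bucket)."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "replicate-" + (prefix.strip("/") or "entire-bucket"),
            "Prefix": prefix,
            "Status": "Enabled",
            "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
        }],
    }


def check_prerequisites(source, destination):
    """Return a list of problems; each argument is a dict with 'region' and 'versioning'."""
    problems = []
    if source["region"] == destination["region"]:
        problems.append("source and destination must be in different regions")
    for label, bucket in (("source", source), ("destination", destination)):
        if bucket["versioning"] != "Enabled":
            problems.append(f"versioning must be enabled on the {label} bucket")
    return problems


# Hypothetical usage; the role ARN is a placeholder:
#   config = build_replication_config(
#       "arn:aws:iam::123456789012:role/EXAMPLE-ROLE", "example-replica-bucket")
```

Running the prerequisite check before applying the configuration catches the most common setup mistake, a bucket without versioning.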

Summary – automate management tasks
• Cross-region replication
• Automate transition and expiration with lifecycle policies
• Trigger-based workflow with event notification
• Easily recover from accidental delete with versioning

Topics
✓ Automate management tasks
✓ Best practices to optimize S3 performance

Faster upload over long distances
S3 Transfer Acceleration
Uploader → AWS Edge Location → S3 Bucket (optimized throughput)
• Change your endpoint, not your code
• No firewall changes or client software
• Longer distance, larger files, more benefit
• Faster or free
• 68 global edge locations
• Try it at S3speedtest.com
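"Change your endpoint, not your code" can be illustrated with a small helper that only swaps the host name. The function name `s3_endpoint` is hypothetical; the sketch assumes virtual-hosted-style addressing, that acceleration has already been enabled on the bucket, and that accelerated buckets must have names without periods.

```python
def s3_endpoint(bucket, accelerate=False):
    """Return the virtual-hosted-style HTTPS endpoint for a bucket."""
    if accelerate and "." in bucket:
        raise ValueError("Transfer Acceleration needs a bucket name without periods")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"


standard = s3_endpoint("example-bucket")
accelerated = s3_endpoint("example-bucket", accelerate=True)
```

The request path, headers, and signing logic stay the same; only the host that the uploader connects to changes.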

Faster upload of large objects
Best Practice: Parallelize PUTs with multipart uploads
• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs
• Increase resiliency to network errors; fewer large restarts on error-prone networks
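Before parts can be uploaded in parallel, the object has to be split. The sketch below plans the split while respecting two S3 limits (5 MiB minimum part size except for the last part, and at most 10,000 parts). The function name `plan_parts` and the 8 MiB default part size are hypothetical choices.

```python
import math

MIN_PART_SIZE = 5 * 1024 ** 2   # S3 minimum for every part except the last
MAX_PARTS = 10_000              # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * 1024 ** 2):
    """Return a list of (part_number, offset, length) covering object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if needed so the upload fits within MAX_PARTS.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    parts = []
    for number, offset in enumerate(range(0, object_size, part_size), start=1):
        parts.append((number, offset, min(part_size, object_size - offset)))
    return parts
```

Each tuple can then be handed to a separate worker (for example via `concurrent.futures.ThreadPoolExecutor`), and a failed part can be retried on its own instead of restarting the whole upload.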

Faster download
You can parallelize GETs as well as PUTs
GET /example-object HTTP/1.1
Host: example-bucket.s3.amazonaws.com
x-amz-date: Fri, 28 Jan 2016 21:32:02 GMT
Range: bytes=0-9
Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=
For large objects, use range-based GETs
• Align your GET ranges with your multipart upload parts
For content distribution, enable Amazon CloudFront
• Caches objects at the edge
• Low latency data transfer to end user
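The `Range` header values for a parallel download can be generated as follows. This is a minimal sketch; the function name `range_headers` is hypothetical, and `chunk_size` would typically be set to the part size used when the object was uploaded.

```python
def range_headers(object_size, chunk_size):
    """Yield HTTP Range header values that together cover the whole object."""
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1   # Range end is inclusive
        yield f"bytes={start}-{end}"


# A 25-byte object fetched in 10-byte chunks; the first value matches the
# "Range: bytes=0-9" header in the request above.
headers = list(range_headers(25, 10))
```

Each header value can be sent in its own GET request on a separate connection, and the responses written to the matching offsets of the output file.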

SQL Query on S3
Amazon Athena
• No loading of data
• Serverless
• Supports text, CSV, TSV, JSON, Avro, and columnar formats such as Apache ORC and Apache Parquet
• Access via Console or JDBC driver
• $5 per TB scanned from S3
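Because pricing is per byte scanned, the cost of a query can be estimated with simple arithmetic. The sketch below uses the $5 per TB figure from the slide; treating a TB as 10^12 bytes is an assumption, and the function name `query_cost_usd` is hypothetical.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted on the slide
BYTES_PER_TB = 10 ** 12     # assumption: decimal terabytes


def query_cost_usd(bytes_scanned):
    """Estimate the cost of one query from the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = query_cost_usd(10 ** 12)   # scan a whole 1 TB text data set
pruned = query_cost_usd(10 ** 11)      # scan only a tenth of it
```

This is why columnar formats matter for cost: a query that reads only the columns it needs scans fewer bytes and is billed accordingly.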

Getting Started – Athena with Console

Query your S3 data using SQL
Run time and data scanned

Higher TPS by distributing key names
Use a key-naming scheme with randomness at the beginning for high TPS
• Most important if you regularly exceed 100 TPS on a bucket
• Avoid starting with a date or monotonically increasing numbers
Don’t do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg

Distributing key names
Add randomness to the beginning of the key name with a hash or reversed timestamp (ssmmhhddmmyy)
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
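Both approaches can be sketched in a few lines. The function names, the choice of SHA-256, and the four-character prefix length are hypothetical; the slide only says "a hash", and the numeric prefixes in the listing above may have been produced differently.

```python
import hashlib
from datetime import datetime


def hashed_key(key, prefix_len=4):
    """Prefix the key with the first hex characters of its own hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


def reversed_timestamp_key(moment, suffix):
    """Prefix the key with the timestamp in ssmmhhddmmyy order."""
    return moment.strftime("%S%M%H%d%m%y") + "-" + suffix


# 13 Nov 2013, 16:45:33 becomes seconds-first, so consecutive uploads
# start with different characters.
example = reversed_timestamp_key(datetime(2013, 11, 13, 16, 45, 33), "photo.jpg")
```

The hash variant is deterministic, so the full key can always be recomputed from the original name when the object has to be read back.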

Let S3 do the list for you
S3 Inventory
• Save time
• Daily or weekly delivery
• Deliver to S3 bucket
• CSV flat file output
• Half the price of LIST API at $0.0025 per million objects listed
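Once an inventory report has been delivered, it can be processed offline instead of calling LIST. The sketch below totals bytes per top-level prefix from CSV text. It rests on assumptions: the column order in `COLUMNS` is hypothetical (a real report declares its fields in the accompanying manifest), and the file is assumed to be already downloaded and decompressed.

```python
import csv
import io
from collections import defaultdict

# Assumed column order for this sketch; check the report's manifest for the real one.
COLUMNS = ("bucket", "key", "size", "last_modified")


def bytes_per_prefix(csv_text):
    """Sum object sizes by top-level prefix ('' for keys without a '/')."""
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        record = dict(zip(COLUMNS, row))
        key = record["key"]
        prefix = key.split("/", 1)[0] if "/" in key else ""
        totals[prefix] += int(record["size"])
    return dict(totals)
```

The same loop could count objects per prefix or find keys older than a given date without issuing a single LIST request.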

Best Practices - performance
✓ Faster upload over long distances with S3 Transfer Acceleration
✓ Faster upload for large objects with S3 multipart upload
✓ Optimize GET performance with Range GET and CloudFront
✓ SQL Query on S3 with Athena
✓ Distribute key name for high TPS workload
✓ Optimize list with S3 inventory

Topics
✓ Best practices to optimize S3 performance
✓ Tools to help you manage storage

Organize your data with object tags
Manage data based on what it is as opposed to where it's located
• Classify your data, up to 10 tags per object
• Tag your objects with key-value pairs
• Write policies once based on the type of data
• Put object with tag or add tag to existing objects
Tags can drive: Access Control, Lifecycle Policy, Storage Metrics & Analytics
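When an object is uploaded with tags, the tags travel as a URL-encoded string in the `x-amz-tagging` request header. The sketch below builds that string and enforces the 10-tag limit from the slide; the function name `tagging_header` and the example tag names are hypothetical.

```python
from urllib.parse import quote, urlencode

MAX_TAGS_PER_OBJECT = 10   # limit stated on the slide


def tagging_header(tags):
    """Encode a dict of tags as the value of the x-amz-tagging header."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"at most {MAX_TAGS_PER_OBJECT} tags per object")
    return urlencode(tags, quote_via=quote)   # percent-encode keys and values


header_value = tagging_header({"Project": "X", "Classification": "PHI data"})
```

The same string format is what boto3's `put_object` takes in its `Tagging` parameter, so the helper can be reused there.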

Manage access with object tags
User permission by tags
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "X"}}
    }
  ]
}

Audit and monitor access
AWS CloudTrail data events
Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity
How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events

Monitor performance and operation
Amazon CloudWatch metrics for S3
• Generate metrics for data of your choice
• Entire bucket, prefixes, and tags
• Up to 1,000 groups per bucket
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
• $0.30 per metric per month

CloudWatch Metrics for S3
Metric Name          Value
AllRequests          Count
PutRequests          Count
GetRequests          Count
ListRequests         Count
DeleteRequests       Count
HeadRequests         Count
PostRequests         Count
BytesDownloaded      MB
BytesUploaded        MB
4xxErrors            Count
5xxErrors            Count
FirstByteLatency     ms
TotalRequestLatency  ms

Summary – manage your storage
✓ Classify storage and manage access with S3 object tags
✓ Audit and monitor access with CloudTrail
✓ Monitor operational performance and set alarms with S3 CloudWatch metrics

Recap
✓ Best practices to optimize S3 performance
✓ Tools to help you manage storage
