It’s been an exciting year for Amazon Aurora, the MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep-dive session, we’ll discuss best practices and explore new features, including high availability options and new integrations with AWS services. We’ll also discuss the recently announced Aurora with PostgreSQL compatibility.
What’s New in Amazon Aurora for MySQL and PostgreSQL
1. What’s New in Amazon Aurora
Amazon Web Services
Kevin Jernigan – Senior Product Manager
March 16, 2017
2. Agenda
What is Aurora?
Review of Aurora performance
New performance enhancements
Review of Aurora availability
New availability enhancements
Other recent and upcoming feature enhancements
Aurora with PostgreSQL compatibility
3. What is Amazon Aurora?
Open source compatible relational database
Performance and availability of commercial databases
Simplicity and cost-effectiveness of open source databases
4. A bit of history …
Re-imagining relational databases for the cloud era
5. Relational databases were not designed for the cloud
Multiple layers of functionality – SQL, transactions, caching, and logging – all in a monolithic stack
6. Not much has changed in the last 20 years
Even when you scale it out, you’re still replicating the same stack
[Diagram: multiple applications, each sitting on its own full SQL / Transactions / Caching / Logging stack over storage]
8. Scale-out, distributed, multi-tenant architecture
[Diagram: master and replicas in Availability Zones 1–3 over a shared storage volume on storage nodes with SSDs]
Purpose-built log-structured distributed storage system designed for databases
Storage volume is striped across hundreds of storage nodes distributed over 3 different availability zones
Six copies of data, two copies in each availability zone, to protect against AZ+1 failures
Plan to apply same principles to other layers of the stack (SQL, transactions, caching)
9. Leveraging cloud ecosystem
Lambda
S3
IAM
CloudWatch
Invoke Lambda events from stored procedures/triggers.
Load data from S3, store snapshots and backups in S3.
Use IAM roles to manage database access control.
Upload systems metrics and audit logs to CloudWatch.
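To make the first two integrations concrete, here is a minimal client-side sketch. It assumes a pymysql connection; the endpoint, credentials, Lambda ARN, bucket, and table are placeholders, and the cluster must be associated with IAM roles that allow the Lambda invocation and the S3 read (via the cluster parameters Aurora provides for those roles).

```python
# Hypothetical sketch of Aurora MySQL's Lambda and S3 integrations.
# Endpoint, credentials, ARN, bucket, and table names are placeholders.
import pymysql

conn = pymysql.connect(host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                       user="admin", password="...", database="mydb")

with conn.cursor() as cur:
    # Invoke a Lambda function asynchronously from SQL.
    cur.execute(
        "CALL mysql.lambda_async("
        "'arn:aws:lambda:us-east-1:123456789012:function:notify_on_order', "
        "'{\"order_id\": 42, \"status\": \"shipped\"}')")
    # Bulk-load a CSV file straight from S3 into a table.
    cur.execute(
        "LOAD DATA FROM S3 's3://my-bucket/orders/2017-03-16.csv' "
        "INTO TABLE orders "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
conn.commit()
```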
10. Amazon RDS: Automate administrative tasks
You: schema design, query construction, query optimization
AWS: automatic fail-over, backup & recovery, isolation & security, industry compliance, push-button scaling, automated patching, advanced monitoring, routine maintenance
RDS takes care of your time-consuming database management tasks, freeing you to focus on your applications and business
11. Meet Amazon Aurora
The relational database reimagined for the cloud
Speed and availability of high-end commercial databases
Simplicity and cost-effectiveness of open source databases
Drop-in compatibility with MySQL (and soon PostgreSQL)
Simple pay-as-you-go pricing
Delivered as a managed service
13. Scaling with instance sizes
Aurora scales with instance size for both reads and writes
[Charts: write performance and read performance by instance size – Aurora vs. MySQL 5.6 and MySQL 5.7]
14. Real-life data – gaming workload
Aurora vs. RDS MySQL – r3.4xlarge, Multi-AZ
Aurora is 3X faster on r3.4xlarge
15. How did we achieve this?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING IS ALL ABOUT CONTEXT SWITCHES
DO LESS WORK: do fewer IOs, minimize network packets, cache prior results, offload the database engine
BE MORE EFFICIENT: process asynchronously, reduce latency path, use lock-free data structures, batch operations together
16. IO traffic in MySQL
[Diagram: MySQL with replica – primary and replica instances in AZ 1 and AZ 2, each writing to mirrored Amazon EBS volumes, with backups to Amazon S3; IO steps numbered 1–5]
TYPES OF WRITE: binlog, data, double-write, log, FRM files
IO FLOW
Issue write to EBS – EBS issues to mirror, ack when both done
Stage write to standby instance through DRBD
Issue write to EBS on standby instance
OBSERVATIONS
Steps 1, 3, 4 are sequential and synchronous
This amplifies both latency and jitter
Many types of writes for each user operation
Have to write data blocks twice to avoid torn writes
PERFORMANCE
780K transactions
7,388K I/Os per million txns (excludes mirroring, standby)
Average 7.4 I/Os per transaction
30 minute SysBench write-only workload, 100GB dataset, RDS Multi-AZ, 30K PIOPS
17. IO traffic in Aurora
[Diagram: Amazon Aurora – primary instance in AZ 1, replica instances in AZ 2 and AZ 3, asynchronous 4/6 quorum distributed writes to the storage layer, backups to Amazon S3]
TYPES OF WRITE: binlog, data, double-write, log, FRM files
IO FLOW
Boxcar redo log records – fully ordered by LSN
Shuffle to appropriate segments – partially ordered
Boxcar to storage nodes and issue writes
OBSERVATIONS
Only write redo log records; all steps asynchronous
No data block writes (checkpoint, cache replacement)
6X more log writes, but 9X less network traffic
Tolerant of network and storage outlier latency
PERFORMANCE
27,378K transactions (35X more)
950K I/Os per 1M txns, 6X amplification (7.7X less)
18. IO traffic in Aurora (Storage Node)
[Diagram: storage node receiving log records from the primary instance into an incoming queue; update queue, hot log, data blocks, point-in-time snapshots staged to S3 backup, GC, scrub, sort/group/coalesce, and peer-to-peer gossip with peer storage nodes]
IO FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks
OBSERVATIONS
All steps are asynchronous
Only steps 1 and 2 are in foreground latency path
Input queue is 46X less than MySQL (unamplified, per node)
Favor latency-sensitive operations
Use disk space to buffer against spikes in activity
(A toy sketch of steps ③–④ follows.)
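Steps ③ and ④ are the interesting ones: because writes only need a 4/6 quorum, any single node may be missing records, and it repairs itself by gossiping with its peers. A minimal sketch of that idea, with entirely hypothetical names (this is not Aurora source code, just an illustration of gap tracking and peer fill-in):

```python
# Illustrative sketch: a storage node tracks which LSNs it has received,
# derives the gaps below the durable point, and fills them from peers.
class StorageNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.records = {}          # lsn -> redo record (stand-in for the hot log)

    def receive(self, lsn, record):
        # Steps 1-2: enqueue and persist the record, then ACK the primary.
        self.records[lsn] = record
        return ("ACK", lsn)

    def gaps(self, durable_lsn):
        # Step 3: any LSN at or below the durable point we never saw is a hole.
        return [lsn for lsn in range(1, durable_lsn + 1) if lsn not in self.records]

    def gossip(self, peers, durable_lsn):
        # Step 4: a 4/6 write quorum guarantees some peer holds each record.
        for lsn in self.gaps(durable_lsn):
            for peer in peers:
                if lsn in peer.records:
                    self.records[lsn] = peer.records[lsn]
                    break
```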
19. IO traffic in Aurora Replicas
MYSQL READ SCALING
[Diagram: MySQL master (30% read, 70% write) shipping changes to a MySQL replica (30% new reads, 70% write) via single-threaded binlog apply; independent data volumes]
Logical: ship SQL statements to replica
Write workload similar on both instances
Independent storage
Can result in data drift between master and replica
AMAZON AURORA READ SCALING
[Diagram: Aurora master (30% read, 70% write) and Aurora replica (100% new reads) over shared Multi-AZ storage; page-cache updates shipped as redo]
Physical: ship redo from master to replica
Replica shares storage; no writes performed
Cached pages have redo applied
Advance read view when all commits seen
20. Real-life data – read replica latency
“In MySQL, we saw replica lag spike to almost 12 minutes, which is almost absurd from an application’s perspective. With Aurora, the maximum read replica lag across 4 replicas never exceeded 20 ms.”
21. Asynchronous group commits
[Diagram: transactions (read/write/commit) generating LSN-ordered log growth over time; a commit queue holds pending commits in LSN order until the durable LSN at the head node passes their commit LSN]
TRADITIONAL APPROACH
Maintain a buffer of log records to write out to disk
Issue write when buffer full or time out waiting for writes
First writer has latency penalty when write rate is low
AMAZON AURORA
Request I/O with first write, fill buffer till write picked up
Individual write durable when 4 of 6 storage nodes ACK
Advance DB durable point up to earliest pending ACK (sketched below)
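A toy sketch of that durable-point bookkeeping, with hypothetical names: each issued write waits for a 4-of-6 quorum of ACKs, and the database-wide durable LSN only advances past writes whose quorum is complete, so commits are acknowledged strictly in LSN order.

```python
# Illustrative sketch: quorum ACK tracking and durable-LSN advancement.
from collections import defaultdict

QUORUM = 4                   # out of 6 storage nodes

acks = defaultdict(set)      # lsn -> set of node ids that have ACKed
pending = []                 # issued LSNs, in order
durable_lsn = 0

def issue_write(lsn):
    pending.append(lsn)

def on_ack(lsn, node_id):
    global durable_lsn
    acks[lsn].add(node_id)
    # Advance the durable point past every leading write that reached quorum.
    while pending and len(acks[pending[0]]) >= QUORUM:
        durable_lsn = pending.pop(0)
    # Commits with commit-LSN <= durable_lsn can now be ACKed to clients.

for lsn in (10, 12, 20, 22):
    issue_write(lsn)
for node in range(4):        # four of six nodes ACK LSNs 10 and 12
    on_ack(10, node); on_ack(12, node)
print(durable_lsn)           # -> 12; LSNs 20 and 22 still await quorum
```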
22. Adaptive thread pool
MYSQL THREAD MODEL
Standard MySQL – one thread per connection
Doesn’t scale with connection count
MySQL EE – connections assigned to thread group
Requires careful stall threshold tuning
AURORA THREAD MODEL
Re-entrant connections multiplexed to active threads
Kernel-space epoll() inserts into latch-free event queue
Dynamically sized thread pool
Gracefully handles 5,000+ concurrent client sessions on r3.8xl
[Diagram: client connections multiplexed via epoll() into a latch-free task queue]
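To illustrate the model (not Aurora's implementation), here is a generic Python analogue: one epoll-based event loop multiplexes many client sockets onto a shared task queue served by a small worker pool. The port, pool size, and echo handler are placeholders, and the pool is fixed-size where Aurora's is dynamic.

```python
# Generic sketch of epoll-style connection multiplexing over a worker pool.
import queue
import selectors
import socket
import threading

sel = selectors.DefaultSelector()   # uses epoll(7) on Linux
tasks = queue.Queue()               # stand-in for the latch-free task queue
rearm = queue.Queue()               # sockets ready to wait for their next request

def worker():
    while True:
        conn = tasks.get()
        data = conn.recv(4096)      # socket was reported readable, so no block
        if data:
            conn.sendall(data)      # placeholder for real request handling
            rearm.put(conn)         # hand back to the event loop to re-arm
        else:
            conn.close()

for _ in range(4):                  # fixed pool here; Aurora adapts its size
    threading.Thread(target=worker, daemon=True).start()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 9000))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

while True:
    while not rearm.empty():        # re-register sockets whose request finished
        sel.register(rearm.get(), selectors.EVENT_READ)
    for key, _ in sel.select(timeout=0.1):
        if key.fileobj is listener:
            conn, _ = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            sel.unregister(key.fileobj)   # exactly one worker owns a ready socket
            tasks.put(key.fileobj)
```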
23. Aurora lock management
[Diagram: MySQL lock manager vs. Aurora lock manager handling concurrent scan, insert, and delete operations on lock chains]
Same locking semantics as MySQL
Concurrent access to lock chains
Multiple scanners allowed in an individual lock chain
Lock-free deadlock detection
Needed to support many concurrent sessions and high update throughput
25. Cached read performance
Catalog concurrency: improved data dictionary synchronization and cache eviction.
NUMA-aware scheduler: the Aurora scheduler is now NUMA aware. Helps scale on multi-socket instances.
Read views: Aurora now uses a latch-free concurrent read-view algorithm to construct read views.
25% throughput gain
[Chart: read throughput in thousands of read requests/sec – MySQL 5.6, MySQL 5.7, Aurora 2015, Aurora 2016. R3.8xlarge instance, <1GB dataset using Sysbench]
26. Non-cached read performance
Smart scheduler: the Aurora scheduler now dynamically assigns threads between IO-heavy and CPU-heavy workloads.
Smart selector: Aurora reduces read latency by selecting the copy of data on the storage node with the best performance.
Logical read ahead (LRA): we avoid read IO waits by prefetching pages based on their order in the btree.
10% throughput gain
[Chart: read throughput in thousands of requests/sec – MySQL 5.6, MySQL 5.7, Aurora 2015, Aurora 2016. R3.8xlarge instance, 1TB dataset using Sysbench]
27. Hot row contention
[Diagram: MySQL lock manager vs. Aurora lock manager under concurrent scans, inserts, and deletes]
Highly contended workloads had high memory and CPU overhead
1.9 (Nov) – lock compression (bitmap for hot locks)
1.9 – replaced spinlocks with blocking futexes – up to 12x reduction in CPU, 3x improvement in throughput
December – use dynamic programming to release locks: from O(totalLocks * waitLocks) to O(totalLocks)
Throughput on Percona TPC-C 100 improved 29x (from 1,452 txns/min to 42,181 txns/min)
28. Hot row contention
Percona TPC-C – 10GB (tpmC)
                   MySQL 5.6   MySQL 5.7   Aurora    Improvement
500 connections    6,093       25,289      73,955    2.92x
5000 connections   1,671       2,592       42,181    16.3x
Percona TPC-C – 100GB (tpmC)
                   MySQL 5.6   MySQL 5.7   Aurora    Improvement
500 connections    3,231       11,868      70,663    5.95x
5000 connections   5,575       13,005      30,221    2.32x
* Measured using release 1.10 on an R3.8xlarge; MySQL numbers using RDS and EBS with 30K PIOPS
29. Insert performance
Accelerates batch inserts sorted by primary key – works by caching the cursor position in an index traversal.
Dynamically turns itself on or off based on data pattern.
Avoids contention in acquiring latches while navigating down the tree.
Bi-directional; works across all insert statements:
• LOAD INFILE, INSERT INTO SELECT, INSERT INTO REPLACE, and multi-value inserts.
MySQL: traverses the B-tree starting from the root for all inserts.
Aurora: inserts avoid the full index traversal by resuming at the cached cursor position (example below).
[Diagram: B-tree with index root and leaf records R0–R8 – MySQL re-descends from the root per insert; Aurora resumes at the cached leaf]
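Since the fast path applies to batch inserts that arrive in primary-key order, the client-side takeaway is simply to sort before sending. A hedged sketch, reusing pymysql; the endpoint, table, and columns are placeholders:

```python
# Hypothetical example: sort a batch by primary key before a multi-value
# INSERT so consecutive rows land next to the cached cursor position.
import pymysql

conn = pymysql.connect(host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                       user="admin", password="...", database="mydb")

rows = [(7, "g"), (3, "c"), (5, "e"), (4, "d")]
rows.sort(key=lambda r: r[0])            # send rows in primary-key order

placeholders = ", ".join(["(%s, %s)"] * len(rows))
sql = "INSERT INTO events (id, payload) VALUES " + placeholders

with conn.cursor() as cur:
    cur.execute(sql, [v for row in rows for v in row])
conn.commit()
```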
30. Faster index build
MySQL 5.6 leverages Linux read ahead – but this requires consecutive block addresses in the btree. It inserts entries top-down into the new btree, causing splits and lots of logging.
Aurora’s scan pre-fetches blocks based on position in the tree, not block address.
Aurora builds the leaf blocks and then the branches of the tree.
• No splits during the build.
• Each page touched only once.
• One log record per page.
2-4X better than MySQL 5.6 or MySQL 5.7
[Chart: index build time in hours – RDS MySQL 5.6, RDS MySQL 5.7, and Aurora 2016 on r3.large/10GB, r3.8xlarge/10GB, and r3.8xlarge/100GB datasets]
31. Why Spatial Index
Need to store and reason about spatial data
• E.g., “find all people within 1 mile of a hospital”
• Spatial data is multi-dimensional
• B-tree indexes are one-dimensional
Aurora supports spatial data types (point/polygon)
• GEOMETRY data types inherited from MySQL 5.6
• This spatial data cannot be indexed
Two possible approaches:
• Specialized access method for spatial data (e.g., R-tree)
• Map spatial objects to one-dimensional space and store in a B-tree – a space-filling curve using a grid approximation
[Diagram: grid approximations of regions A and B, with pairwise spatial relations such as A COVERS B / B COVEREDBY A, A CONTAINS B / B INSIDE A, A TOUCH B, A OVERLAPBDYINTERSECT B, A OVERLAPBDYDISJOINT B, A EQUAL B, A DISJOINT B]
32. Spatial Indexes in Aurora
R-tree (used in MySQL 5.7) – challenges:
• Keeping it efficient while balanced
• Rectangles should not overlap or cover empty space
• Degenerates over time
• Re-indexing is expensive
Z-index, used in Aurora (dimensionally ordered space-filling curve):
• Uses a regular B-tree for storing and indexing
• Removes sensitivity to resolution parameter
• Adapts to granularity of actual data without user declaration
• E.g., GeoWave (National Geospatial-Intelligence Agency)
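The core trick behind a Z-index is bit interleaving (a Morton code): quantize each coordinate to a grid, then interleave the bits of x and y so nearby 2-D points tend to map to nearby 1-D keys that an ordinary B-tree can index. A minimal sketch; the grid resolution and lon/lat normalization are illustrative choices, not Aurora's actual parameters:

```python
# Illustrative Z-order (Morton) key: interleave the bits of grid-quantized
# x and y so a regular B-tree can index 2-D points.
def morton_key(x: int, y: int, bits: int = 16) -> int:
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # even bit positions carry x
        key |= ((y >> i) & 1) << (2 * i + 1)    # odd bit positions carry y
    return key

def grid_cell(lon: float, lat: float, bits: int = 16) -> int:
    # Quantize the point onto a 2^bits x 2^bits grid, then interleave.
    gx = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    gy = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    return morton_key(gx, gy, bits)

print(hex(grid_cell(-122.33, 47.61)))  # one sortable integer key per point
```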
35. Storage Durability
Storage volume automatically grows up to 64 TB
Quorum system for read/write; latency tolerant
Peer-to-peer gossip replication to fill in holes
Continuous backup to S3 (built for 11 9s durability)
Continuous monitoring of nodes and disks for repair
10GB segments as unit of repair or hotspot rebalance
Quorum membership changes do not stall writes
[Diagram: storage nodes spread across AZ 1, AZ 2, and AZ 3, backing up to Amazon S3]
36. More Replicas
An Aurora cluster contains a primary node and up to fifteen replicas
Failing database nodes are automatically detected and replaced
Failing database processes are automatically detected and recycled
Customer applications may scale out read traffic across replicas
A replica is automatically promoted on persistent outage
[Diagram: primary node in AZ 1 with secondary nodes spread across AZ 2 and AZ 3]
37. Continuous Backup
[Diagram: per-segment snapshots and log-record streams over time, with a recovery point spanning segments 1–3]
• Take periodic snapshot of each segment in parallel; stream the redo logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously (sketched below)
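A conceptual sketch of that restore path, with entirely hypothetical types and names: each segment independently picks its newest snapshot at or before the recovery point, then replays only the redo records after that snapshot, which is why segments can be restored in parallel.

```python
# Conceptual sketch of per-segment point-in-time restore. Not Aurora source;
# Snapshot/Segment and the redo-applier callables are illustrative stand-ins.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Snapshot:
    lsn: int
    pages: Dict[int, bytes]          # page id -> page image (stand-in for S3)

@dataclass
class Segment:
    snapshots: List[Snapshot]
    log: Dict[int, Callable]         # lsn -> redo record applier

def restore_segment(seg: Segment, recovery_lsn: int) -> Dict[int, bytes]:
    # Newest snapshot taken at or before the recovery point.
    snap = max((s for s in seg.snapshots if s.lsn <= recovery_lsn),
               key=lambda s: s.lsn)
    pages = dict(snap.pages)
    # Replay only redo records after the snapshot, up to the recovery point.
    for lsn in sorted(l for l in seg.log if snap.lsn < l <= recovery_lsn):
        seg.log[lsn](pages)
    return pages

# Segments are independent, so a volume restores in parallel:
#   restored = [restore_segment(seg, target_lsn) for seg in volume_segments]
```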
38. Instant Crash Recovery
TRADITIONAL DATABASES
Have to replay logs since the last checkpoint
Typically 5 minutes between checkpoints
Single-threaded in MySQL; requires a large number of disk accesses
A crash at T0 requires re-application of the SQL in the redo log since the last checkpoint
AMAZON AURORA
Underlying storage replays redo records on demand as part of a disk read
Parallel, distributed, asynchronous
No replay for startup
A crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
39. Survivable Caches
We moved the cache out of the database process
Cache remains warm in the event of database restart
Lets you resume fully loaded operations much faster
Instant crash recovery + survivable cache = quick and easy recovery from DB failures
[Diagram: SQL / Transactions / Caching stacks with the caching process outside the DB process, so it remains warm across a database restart]
40. Faster Fail-Over
MYSQL: app running → DB failure → failure detection → DNS propagation → recovery → recovery → app running (15-20 sec)
AURORA WITH MARIADB DRIVER: app running → DB failure → failure detection → recovery → app running (3-20 sec)
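The MariaDB Connector/J handles this natively for Java applications; for other languages, a generic analogue is a retry loop against the cluster endpoint, whose DNS name tracks the current primary. A hedged sketch with placeholder endpoint and credentials:

```python
# Hypothetical sketch: ride through a failover by reconnecting to the
# Aurora cluster endpoint, which always resolves to the current primary.
import time
import pymysql

def connect_with_retry(retries=10, delay=2.0):
    for _ in range(retries):
        try:
            return pymysql.connect(
                host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                user="admin", password="...", database="mydb",
                connect_timeout=3)
        except pymysql.err.OperationalError:
            time.sleep(delay)        # wait out failover / DNS propagation
    raise RuntimeError("primary not reachable within the failover window")

conn = connect_with_retry()
```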
43. Availability is about more than HW failures
You also incur availability disruptions when you:
1. Patch your database software
2. Modify your database schema
3. Perform large-scale database reorganizations
4. Restore a database after a DBA error
44. Zero downtime patching
Before ZDP: user sessions terminate during patching
[Diagram: old DB engine replaced by new DB engine; application and networking state are lost, storage service unchanged]
With ZDP: user sessions remain active through patching
[Diagram: application and networking state parked and handed from the old DB engine to the new DB engine over the same storage service]
45. Zero downtime patching – current constraints
We have to fall back to our current patching model when we can’t park connections:
• Long-running queries
• Open transactions
• Binlog enabled
• Parameter changes pending
• Temporary tables open
• Locked tables
• SSL connections open
• Read replica instances
We are working on addressing the above.
47. T2 RI discounts
“An R3.Large is too expensive for my use case” – T2.Small coming in Q1 2017
Up to 34% discount with a 1-year RI; up to 57% with a 3-year RI
Instance         vCPU   Mem (GiB)   Hourly Price
db.t2.medium     2      4           $0.082
db.r3.large      2      15.25       $0.29
db.r3.xlarge     4      30.5        $0.58
db.r3.2xlarge    8      61          $1.16
db.r3.4xlarge    16     122         $2.32
db.r3.8xlarge    32     244         $4.64
* Prices are for Virginia
48. My databases need to meet certifications
Amazon Aurora gives each database instance IP firewall protection
Aurora offers transparent encryption at rest and SSL protection for data in transit
Amazon VPC lets you isolate and control network configuration and connect securely to your IT infrastructure
AWS Identity and Access Management provides resource-level permission controls
49. Aurora Auditing
MariaDB server_audit plugin: all events (DDL, DML, query, DCL, connect) funnel through a single create-event-string step and a single write-to-file step
Aurora native audit support: each event creates its string in parallel, pushes it onto a latch-free queue, and multiple writers drain the queue to file (see the sketch below for enabling it)
We can sustain over 500K events/sec
Sysbench select-only workload on an 8xlarge instance:
            MySQL 5.7   Aurora    Improvement
Audit Off   95K         615K      6.47x
Audit On    33K         525K      15.9x
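Native auditing is switched on through the cluster parameter group. A hedged sketch using boto3; the parameter-group name is a placeholder, the parameter names follow Aurora MySQL's advanced-auditing settings, and availability depends on your Aurora version:

```python
# Hypothetical sketch: enable Aurora MySQL native auditing via boto3.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-aurora-cluster-params",  # placeholder
    Parameters=[
        {"ParameterName": "server_audit_logging",
         "ParameterValue": "1", "ApplyMethod": "immediate"},
        {"ParameterName": "server_audit_events",           # which events to log
         "ParameterValue": "CONNECT,QUERY_DDL,QUERY_DML,QUERY_DCL",
         "ApplyMethod": "immediate"},
    ],
)
```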
50. AWS ecosystem
Generate Lambda events from Aurora stored procedures.
Load data from S3, store snapshots and backups in S3.
Use IAM roles to manage database access control.
Upload systems metrics and audit logs to CloudWatch.
*NEW* – Q1
51. MySQL compatibility
MySQL 5.6 / InnoDB compatible
No application compatibility issues reported in the last 18 months
MySQL ISV applications run pretty much as-is
Working on 5.7 compatibility – running a bit slower than expected; hope to make it available soon
Backported 81 fixes from different MySQL releases
Business Intelligence, Data Integration, Query and Monitoring tools
“We ran our compatibility test suites against Amazon Aurora and everything just worked."
– Dan Jewett, VP of Product Management at Tableau
53. My applications require PostgreSQL
Amazon Aurora with PostgreSQL compatibility is now in preview
Same underlying scale-out, 3-AZ, 6-copy, fault-tolerant, self-healing, expanding database-optimized storage tier
Integrated with PostgreSQL 9.6
[Diagram: SQL / Transactions / Caching over the Aurora logging + storage layer, backed by Amazon S3]
54. Open source database
In active development for 20 years
Owned by a foundation, not a single company
Permissive innovation-friendly open source license
PostgreSQL Fast Facts
Open Source Initiative
55. High performance out of the box
Object-oriented and ANSI-SQL:2008 compatible
Most geospatial features of any open-source database
Supports stored procedures in 12 languages (Java, Perl, Python, Ruby, Tcl, C/C++, its own Oracle-like PL/pgSQL, etc.)
PostgreSQL Fast Facts
56. Most Oracle-compatible open-source database
Highest AWS Schema Conversion Tool automatic conversion rates are from Oracle to PostgreSQL
PostgreSQL Fast Facts
57. What does PostgreSQL compatibility mean?
PostgreSQL 9.6 + Amazon Aurora cloud-optimized storage
Performance: Up to 2x+ better performance than PostgreSQL alone
Availability: failover time of < 30 seconds
Durability: 6 copies across 3 Availability Zones
Read Replicas: single-digit millisecond lag times on up to 15 replicas
[Diagram: Amazon Aurora storage layer]
58. What does PostgreSQL compatibility mean?
Cloud-native security and encryption – AWS Key Management Service (KMS) and AWS Identity and Access Management (IAM)
Easy to manage with Amazon RDS
Easy to load and unload – AWS Database Migration Service and AWS Schema Conversion Tool
Fully compatible with PostgreSQL, now and for the foreseeable future
Not a compatibility layer – native PostgreSQL implementation
59. Amazon Aurora with PostgreSQL Compatibility
Performance By The Numbers
Measurement           Result
PgBench               >= 2x faster
SysBench              2x-3x faster
Data Loading          3x faster
Response Time         >2x faster
Throughput Jitter     >3x more consistent
Throughput at Scale   3x faster
Recovery Speed        Up to 85x faster
60. First Step: Enhanced Monitoring
Released 2016
O/S metrics
Process and thread list
Up to 1-second granularity
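Enhanced Monitoring is enabled per instance by setting a monitoring interval and an IAM role that can publish to CloudWatch Logs. A hedged sketch using boto3; the instance identifier and role ARN are placeholders:

```python
# Hypothetical sketch: turn on Enhanced Monitoring at 1-second granularity.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_instance(
    DBInstanceIdentifier="my-aurora-instance",        # placeholder
    MonitoringInterval=1,                             # seconds; 1s is the finest
    MonitoringRoleArn="arn:aws:iam::123456789012:role/rds-monitoring-role",
    ApplyImmediately=True,
)
```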