4. Data volume since 2010
• 90% of stored data generated in the last 2 years
• 1 terabyte of data in 2010 equals 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
6. What is NoSQL?
NoSQL is a term to describe data stores that trade full ACID
compliance for high availability and scale.
Atomicity: single row/single item only
Consistency: eventual consistency
Isolation: dirty reads
Durability: data replication on commodity storage
7. Why NoSQL?
• Dirty Reads?
• Eventual Consistency?
• Single row transactions only?
• Why would anybody trade ACID compliance for this?
11. The CAP Theorem
Network partitions will happen in distributed systems
[Diagram: five DB nodes]
[Diagram: Consistency (C), Availability (A), and Partition Tolerance (P), with the pairings CA, AP, and CP]
12. SQL vs. NoSQL schema design
NoSQL design optimizes for compute instead of storage
13. Why NoSQL?
SQL                       NoSQL
Optimized for storage     Optimized for compute
Normalized/relational     Denormalized/hierarchical
Ad-hoc queries            Instantiated views
Scale vertically          Scale horizontally
Good for OLAP             Built for OLTP at scale
16. Amazon DynamoDB
DynamoDB is a fully managed, NoSQL document and key value data store
Predictable Performance
Highly Available
Massively Scalable
Fully Managed
Low Cost
18. High availability and durability
WRITES
Replicated continuously to 3 Availability Zones
Persisted to disk (custom SSD)
READS
Strongly or eventually consistent
No latency trade-off
Designed to support 99.99% availability
Built for high durability
19. High availability and durability
DynamoDB automatically partitions data
• Partition key spreads data (and workload) across partitions
• Automatically partitions as data grows and throughput needs increase
High-scale: large number of unique hash keys + uniform distribution of workload across hash keys
[Diagram: APP and Partition 1..N]
23. Partition Keys
Key space 00..FF, split into the ranges 00..54, 55..A9, and AA..FF
Id = 1, Name = Jim: Hash (1) = 7B
Id = 2, Name = Andy, Dept = Eng: Hash (2) = 48
Id = 3, Name = Kim, Dept = Ops: Hash (3) = CD
Partition Key uniquely identifies an item
Partition Key is used for building an unordered hash index
Allows table to be partitioned for scale
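The mapping from partition key to partition can be sketched in a few lines. This is only an illustration: DynamoDB's actual hash function is internal, so `key_hash` below uses MD5 as a stand-in, and the names `PARTITION_STARTS`, `key_hash`, and `partition_for` are made up for this example. The range boundaries are the ones from the key-space diagram.

```python
import hashlib
from bisect import bisect_right

# Inclusive lower bound of each partition's range in a one-byte key space.
# Ranges: 00..54 -> partition 1, 55..A9 -> partition 2, AA..FF -> partition 3.
PARTITION_STARTS = [0x00, 0x55, 0xAA]


def key_hash(partition_key) -> int:
    """Map a partition key to one byte (0..255). MD5 is an illustrative stand-in."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return digest[0]


def partition_for(hash_value: int) -> int:
    """Return the 1-based partition whose range contains hash_value."""
    return bisect_right(PARTITION_STARTS, hash_value)


# The hash values quoted on the slide fall into three different partitions.
for item_id, slide_hash in [(1, 0x7B), (2, 0x48), (3, 0xCD)]:
    print(f"Id={item_id} hash={slide_hash:02X} -> partition {partition_for(slide_hash)}")
```

Because the index is built on the hash and not on the key itself, neighbouring key values (1, 2, 3) carry no ordering across partitions, which is why the hash index is described as unordered.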
24. Partition:Sort Key
Partition:Sort Key uses two attributes together to uniquely identify an item
Within the unordered hash index, data is arranged by the sort key
No limit on the number of items (∞) per partition key
- Except if you have local secondary indexes
Key space 00:0 to FF:∞
Partition 1 (00:0 to 54:∞), Hash (2) = 48
- Customer# = 2, Order# = 10, Item = Pen
- Customer# = 2, Order# = 11, Item = Shoes
Partition 2 (55 to A9:∞), Hash (1) = 7B
- Customer# = 1, Order# = 10, Item = Toy
- Customer# = 1, Order# = 11, Item = Boots
Partition 3 (AA to FF:∞), Hash (3) = CD
- Customer# = 3, Order# = 10, Item = Book
- Customer# = 3, Order# = 11, Item = Paper
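A composite key can be modelled as a dictionary keyed by partition key, where each entry is a list kept in sort-key order. The sketch below is an in-memory toy, not DynamoDB's API; `put_item` and `query` are hypothetical helpers that only mirror the idea of the orders example.

```python
from bisect import bisect_left
from collections import defaultdict

# partition key (Customer#) -> list of (sort key (Order#), item), sorted by Order#
table = defaultdict(list)


def put_item(customer: int, order: int, item: str) -> None:
    """Insert or replace the item identified by (customer, order)."""
    collection = table[customer]
    sort_keys = [k for k, _ in collection]
    pos = bisect_left(sort_keys, order)
    if pos < len(collection) and collection[pos][0] == order:
        collection[pos] = (order, item)      # same primary key: replace
    else:
        collection.insert(pos, (order, item))  # keep sort-key order


def query(customer: int, low=None, high=None):
    """Return one customer's items, optionally limited to low <= Order# <= high."""
    return [
        (order, item)
        for order, item in table.get(customer, [])
        if (low is None or order >= low) and (high is None or order <= high)
    ]


# Load the slide's data, deliberately out of order for customer 2.
for customer, order, item in [
    (2, 11, "Shoes"), (2, 10, "Pen"),
    (1, 10, "Toy"), (1, 11, "Boots"),
    (3, 10, "Book"), (3, 11, "Paper"),
]:
    put_item(customer, order, item)

print(query(2))          # all orders for customer 2, in Order# order
print(query(1, low=11))  # range condition on the sort key
```

A query always names exactly one partition key; the sort key only narrows the result within that key's items.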
25. Partitions are three-way replicated
Partition 1, Partition 2, ... Partition N are each stored as Replica 1, Replica 2, and Replica 3
Every replica holds the same items:
Id = 1, Name = Jim
Id = 2, Name = Andy, Dept = Eng
Id = 3, Name = Kim, Dept = Ops
26. Local secondary index (LSI)
Alternate sort key attribute
Index is local to a partition key
Table: A1 (partition), A2 (sort), A3, A4, A5
LSIs:
KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB max per partition key, i.e., LSIs limit the number of range keys!
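The three projection types differ only in which attributes are copied into the index entry. A minimal sketch of that rule, using the A1..A5 attribute names from the diagram; the function `project` is hypothetical and exists only for illustration:

```python
def project(item: dict, table_keys: list, index_sort_key: str,
            projection: str, include: tuple = ()) -> dict:
    """Return the attributes an index entry would carry for one table item."""
    if projection == "ALL":
        return dict(item)
    # Table key attributes and the index's own sort key are always present.
    names = list(table_keys) + [index_sort_key]
    if projection == "INCLUDE":
        names += list(include)
    elif projection != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection}")
    return {name: item[name] for name in names if name in item}


item = {"A1": "p", "A2": "s", "A3": 3, "A4": 4, "A5": 5}
table_keys = ["A1", "A2"]  # A1 = partition key, A2 = table sort key

print(sorted(project(item, table_keys, "A3", "KEYS_ONLY")))
print(sorted(project(item, table_keys, "A4", "INCLUDE", include=("A3",))))
print(sorted(project(item, table_keys, "A5", "ALL")))
```

Narrower projections keep the index smaller, which matters for an LSI because index entries count toward the 10 GB limit per partition key.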
27. Global secondary index (GSI)
Alternate partition and/or sort key
Index is across all partition keys
Use composite sort keys for compound indexes
Table: A1 (partition), A2, A3, A4, A5
GSIs:
INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
KEYS_ONLY: A2 (partition), A1 (item key)
RCUs/WCUs provisioned separately for GSIs
Online indexing
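As a rough illustration of a GSI with its own provisioned throughput, the dictionary below follows the parameter layout of the DynamoDB `CreateTable` request. The table name `Orders`, the index name `ByStatus`, the attribute names, and all capacity numbers are invented for this sketch.

```python
table_definition = {
    "TableName": "Orders",  # hypothetical table
    # Only attributes used in some key schema are declared here.
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "N"},
        {"AttributeName": "OrderId", "AttributeType": "N"},
        {"AttributeName": "OrderStatus", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5000, "WriteCapacityUnits": 500},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "ByStatus",  # hypothetical index
            "KeySchema": [
                {"AttributeName": "OrderStatus", "KeyType": "HASH"},  # alternate partition key
                {"AttributeName": "OrderId", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["ItemName"]},
            # Capacity for the index is set separately from the table's.
            "ProvisionedThroughput": {"ReadCapacityUnits": 1000, "WriteCapacityUnits": 500},
        }
    ],
}

gsi = table_definition["GlobalSecondaryIndexes"][0]
print(gsi["IndexName"], gsi["ProvisionedThroughput"])
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `boto3.client("dynamodb").create_table(**table_definition)`; that call is not made here so the sketch runs offline.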
28. How do GSI updates work?
[Diagram: Client, Table (four primary table partitions), Global Secondary Index]
2. Asynchronous update (in progress)
If GSIs don’t have enough write capacity, table writes will be throttled!
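The effect of asynchronous index maintenance can be imitated with a queue between the table and the index. This is a toy model under simplifying assumptions (a single index on `Dept`, no capacity limits or throttling); `put_item`, `propagate`, and `query_index` are hypothetical names.

```python
from collections import deque

table = {}         # primary key -> item
index = {}         # index partition key (Dept) -> set of primary keys
pending = deque()  # index updates that have not been applied yet


def put_item(item_id, item):
    """Write to the table immediately; queue the index change for later."""
    old = table.get(item_id)
    table[item_id] = item
    pending.append((item_id, old, item))


def propagate():
    """Apply queued changes to the index (stands in for the asynchronous step)."""
    while pending:
        item_id, old, new = pending.popleft()
        if old and "Dept" in old:
            index.get(old["Dept"], set()).discard(item_id)
        if "Dept" in new:
            index.setdefault(new["Dept"], set()).add(item_id)


def query_index(dept):
    return sorted(index.get(dept, set()))


put_item(1, {"Name": "Jim"})                 # no Dept: never appears in the index
put_item(2, {"Name": "Andy", "Dept": "Eng"})
before = query_index("Eng")                  # index has not caught up yet
propagate()
after = query_index("Eng")
print(before, after)
```

A read from the index between the write and the propagation sees stale data, which is the eventual consistency that GSI queries have to tolerate.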
29. LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario,
use GSI!
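The guidance above reduces to a two-question check. A minimal sketch, where `choose_index` and its parameter names are invented for this example and the 10 GB figure is the item-collection limit quoted on the slide:

```python
LSI_ITEM_COLLECTION_LIMIT_GB = 10


def choose_index(item_collection_gb: float, eventual_consistency_ok: bool) -> str:
    """Apply the slide's rule of thumb for picking an index type."""
    if item_collection_gb > LSI_ITEM_COLLECTION_LIMIT_GB:
        return "GSI"  # an LSI cannot hold a collection this large
    if eventual_consistency_ok:
        return "GSI"
    return "LSI"      # small collection and strongly consistent reads needed


print(choose_index(25, eventual_consistency_ok=False))
print(choose_index(2, eventual_consistency_ok=True))
print(choose_index(2, eventual_consistency_ok=False))
```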
31. Scaling
Throughput
Provision any amount of throughput to a table
Size
Add any number of items to a table
- Max item size is 400 KB
- LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
32. Throughput
Provisioned at the table level
One write capacity unit (WCU) is one write per second of up to 1 KB
One read capacity unit (RCU) is one strongly consistent read per second of up to 4 KB
- RCUs measure strongly consistent reads
- Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent
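Capacity needs can be estimated from item size and request rate. The sketch below assumes item sizes are rounded up to the next 1 KB for writes and the next 4 KB for reads, which goes slightly beyond what the slide states; the function names are made up for this example.

```python
import math


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs needed: each write consumes one unit per 1 KB (rounded up)."""
    return math.ceil(item_size_kb / 1) * writes_per_second


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: one unit per 4 KB (rounded up); eventual consistency halves it."""
    units = math.ceil(item_size_kb / 4) * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)
    return units


print(write_capacity_units(1.5, 10))                             # 2 KB-units x 10 writes
print(read_capacity_units(3, 100))                               # one 4 KB unit x 100 reads
print(read_capacity_units(3, 100, strongly_consistent=False))    # half of the above
print(read_capacity_units(6, 100))                               # two 4 KB units x 100 reads
```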
33. Partitioning Math
In the future, these details might change…
Number of Partitions
By Capacity: (Total RCU / 3000) + (Total WCU / 1000)
By Size: Total Size / 10 GB
Total Partitions: CEILING(MAX (Capacity, Size))
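The formulas above translate directly into a small function. It is a sketch of the slide's arithmetic only; as the slide warns, the constants (3000 RCUs, 1000 WCUs, 10 GB) may change, and `estimate_partitions` is a name chosen for this example.

```python
import math

RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimate_partitions(total_rcu: float, total_wcu: float, size_gb: float) -> int:
    """Return CEILING(MAX(partitions by capacity, partitions by size))."""
    by_capacity = total_rcu / RCU_PER_PARTITION + total_wcu / WCU_PER_PARTITION
    by_size = size_gb / GB_PER_PARTITION
    return math.ceil(max(by_capacity, by_size))


# Capacity-bound table: 5000 RCUs, 500 WCUs, 8 GB
print(estimate_partitions(5000, 500, 8))
# Size-bound table: little throughput, 45 GB of data
print(estimate_partitions(1000, 100, 45))
```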
34. Partitioning Example
Table size = 8 GB, RCUs = 5000, WCUs = 500
Number of Partitions
By Capacity: (5000 / 3000) + (500 / 1000) = 2.17
By Size: 8 / 10 = 0.8
Total Partitions: CEILING(MAX (2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 8/3 = 2.67 GB
35. What causes throttling?
If sustained throughput goes beyond provisioned throughput per partition
Non-uniform workloads
Hot keys/hot partitions
Very large bursts
Mixing hot data with cold data
Use a table per time period
From the example before:
Table created with 5000 RCUs, 500 WCUs
RCUs per partition = 1666.67
WCUs per partition = 166.67
If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB
may throttle requests
- Solution: Increase provisioned throughput
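A skewed workload can be throttled even when the table as a whole is under its provisioned throughput. The sketch below assumes throughput is split evenly across partitions, as in the example, and ignores burst capacity; the function names and the load figures are invented.

```python
def per_partition_limit(provisioned_units: float, partitions: int) -> float:
    """Each partition's even share of the table's provisioned throughput."""
    return provisioned_units / partitions


def throttled_partitions(sustained_load: dict, limit: float) -> list:
    """Return the partitions whose sustained rate exceeds their share."""
    return [p for p, rate in sorted(sustained_load.items()) if rate > limit]


rcu_limit = per_partition_limit(5000, 3)  # about 1666.67 RCUs per partition

uniform = {1: 1500, 2: 1500, 3: 1500}     # 4500 RCUs in total
skewed = {1: 3000, 2: 500, 3: 500}        # only 4000 RCUs in total, but one hot partition

print(throttled_partitions(uniform, rcu_limit))
print(throttled_partitions(skewed, rcu_limit))
```

The skewed case consumes less total throughput than the uniform one, yet partition 1 exceeds its share, which is why key design that spreads load evenly matters as much as the provisioned total.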
36. To learn more, please attend:
Deep Dive on Amazon DynamoDB
3:55 p.m.– 4:35 p.m.
38. DynamoDB Streams
Stream of table updates
Asynchronous
Exactly once
Strictly ordered
24-hr lifetime per item
Integration Capabilities
DynamoDB Triggers
Implemented as AWS Lambda functions
Your code scales automatically
Java, Node.js, and Python
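A trigger is a Lambda function that receives batches of stream records. The handler below is a minimal Python sketch that follows the DynamoDB Streams event layout (`Records`, `eventName`, `dynamodb.Keys`, `dynamodb.NewImage`); the sample event is hand-written for illustration, and whether `NewImage` is present depends on how the stream is configured.

```python
def handler(event, context):
    """Summarize each stream record as (event name, keys, new image)."""
    changes = []
    for record in event.get("Records", []):
        name = record["eventName"]                      # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]               # e.g. {"Id": {"N": "2"}}
        new_image = record["dynamodb"].get("NewImage")  # absent for REMOVE
        changes.append((name, keys, new_image))
    return changes


# Hand-written sample in the shape of a stream event (values are illustrative).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"Id": {"N": "2"}},
                "NewImage": {"Id": {"N": "2"}, "Name": {"S": "Andy"}, "Dept": {"S": "Eng"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"Id": {"N": "3"}}},
        },
    ]
}

for change in handler(sample_event, None):
    print(change[0], change[1])
```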
39. IAM
Fine-grained access control
via AWS IAM
Table-, item-, and attribute-level access control
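Item- and attribute-level control is expressed through IAM condition keys such as `dynamodb:LeadingKeys` and `dynamodb:Attributes`. The policy below is a sketch built as a Python dictionary; the account ID, region, table name, and attribute names are placeholders, and the `${www.amazon.com:user_id}` variable assumes users sign in through Login with Amazon web identity federation.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            # Table-level: the statement applies to one table only (placeholder ARN).
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Item-level: the partition key must equal the caller's user ID.
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                    # Attribute-level: only these attributes may be requested.
                    "dynamodb:Attributes": ["CustomerId", "OrderId", "ItemName"],
                },
                # Require requests to name attributes instead of fetching all of them.
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```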
Integration Capabilities
Elasticsearch integration
Full-text queries
Add search to mobile apps
Monitor IoT sensor status codes
App telemetry pattern discovery using regular expressions