This document summarizes a presentation on building applications with DynamoDB. The presentation covers:
- Getting started with DynamoDB by making two decisions - choosing a primary key and provisioning throughput - and making one API call to create a table.
- Data modeling concepts in DynamoDB including tables, items, attributes, primary keys, and queries. Common patterns like modeling relationships and handling large items are also discussed.
- Programming the DynamoDB API and available operations like PutItem, GetItem, Query and Scan. Conditional updates, batch operations, and pagination of results are also covered.
- Real-world data modeling examples including storing scores and leaderboards for an online game and creating
46. Items are a collection of
attributes.
Each attribute has a key and a value.
An item can have any number of
attributes, up to 64k total.
47. Two scalar data types.
String: Unicode, UTF8 binary encoding.
Number: 38 digit precision.
Multi-value strings and numbers.
48. id = 100; date = 2012-05-16-09-00-10; total = 25.00
id = 101; date = 2012-05-15-15-00-11; total = 35.00
id = 101; date = 2012-05-16-12-00-10; total = 100.00
id = 102; date = 2012-03-20-18-23-10; total = 20.00
id = 102; date = 2012-03-20-18-23-10; total = 120.00
49. Table: the full set of items shown on slide 48.
50. Item: a single entry in the table, such as the one with id = 100.
51. Attribute: a single key and value within an item, such as total = 25.00.
52. Where is the schema?
Tables do not require a formal schema.
Each item is an arbitrarily sized hash.
Only the primary key needs to be specified.
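As a rough illustration of the schema-less model described above, the following Python sketch keeps items as plain dictionaries and only checks that the primary key attribute is present. The `SchemalessTable` class and its method names are hypothetical and are not part of any DynamoDB SDK.

```python
class SchemalessTable:
    """Toy in-memory table: only the primary key attribute is required."""

    def __init__(self, hash_key):
        self.hash_key = hash_key      # name of the primary key attribute
        self.items = {}               # key value -> item (a plain dict)

    def put_item(self, item):
        if self.hash_key not in item:
            raise ValueError("missing primary key: " + self.hash_key)
        self.items[item[self.hash_key]] = dict(item)

    def get_item(self, key_value):
        return self.items.get(key_value)


orders = SchemalessTable(hash_key="id")
# Items in the same table may carry different attributes.
orders.put_item({"id": 100, "total": "25.00"})
orders.put_item({"id": 101, "total": "35.00", "note": "gift"})
```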
53. Items are indexed by
primary key.
Single hash keys and composite keys.
54. Hash Key
In the example items on slide 48, id is the hash key.
55. Range key for queries.
Querying items by composite key.
56. Hash Key + Range Key
In the example items on slide 48, id is the hash key and date is the range key.
63. One API call, multiple items.
BatchGet returns multiple items by
primary key.
BatchWrite performs up to 25 put or
delete operations.
Throughput is measured by IO,
not API calls.
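Because a batch write carries at most 25 put or delete operations, a client with more requests has to split them up first. A minimal sketch of that splitting step, assuming requests are held in a plain list (the `chunk_requests` helper and the request shape are hypothetical):

```python
def chunk_requests(requests, batch_size=25):
    """Split a list of put/delete requests into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


# 60 hypothetical put requests become batches of 25, 25 and 10.
puts = [{"put": {"id": n}} for n in range(60)]
batches = chunk_requests(puts)
```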
65. Query vs Scan
Query for composite key queries.
Scan for full table scans, exports.
Both support pages and limits.
Maximum response is 1 MB in size.
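Paged reads follow the same loop for Query and Scan: fetch a page, remember where it stopped, and resume from there. The sketch below simulates this over an in-memory list. The function names are hypothetical, and the sketch pages by item count only; it does not model the 1 MB response cap.

```python
def scan_page(items, limit, start=0):
    """Return one page of items plus the position to resume from (None when done)."""
    page = items[start:start + limit]
    next_start = start + limit
    return page, (next_start if next_start < len(items) else None)


def scan_all(items, limit):
    """Follow the resume position page by page until the table is exhausted."""
    results, start = [], 0
    while start is not None:
        page, start = scan_page(items, limit, start)
        results.extend(page)
    return results


table = [{"id": n} for n in range(7)]
first_page, resume_at = scan_page(table, limit=3)
```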
66. Query patterns.
Retrieve all items by hash key.
Range key conditions:
==, <, >, >=, <=, begins with, between.
Counts. Top and bottom n values.
Paged responses.
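The query patterns above can be pictured as operations on items grouped by hash key and kept sorted by range key. The following is an in-memory sketch of that idea, not the DynamoDB API; the `CompositeKeyTable` class and its method names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class CompositeKeyTable:
    """Toy table: items grouped by hash key, kept sorted by range key."""

    def __init__(self):
        self.partitions = {}  # hash key -> sorted list of (range_key, item)

    def put(self, hash_key, range_key, item):
        rows = self.partitions.setdefault(hash_key, [])
        keys = [r for r, _ in rows]
        pos = bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)      # same composite key: overwrite
        else:
            rows.insert(pos, (range_key, item))

    def query_between(self, hash_key, low, high):
        """Items whose range key lies in [low, high]."""
        rows = self.partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        return [item for _, item in rows[bisect_left(keys, low):bisect_right(keys, high)]]

    def query_begins_with(self, hash_key, prefix):
        """Items whose (string) range key starts with prefix."""
        rows = self.partitions.get(hash_key, [])
        return [item for r, item in rows if r.startswith(prefix)]

    def query_top(self, hash_key, n):
        """The n items with the largest range keys, largest first."""
        rows = self.partitions.get(hash_key, [])
        return [item for _, item in reversed(rows[max(len(rows) - n, 0):])]


orders = CompositeKeyTable()
orders.put(101, "2012-05-15-15-00-11", {"total": "35.00"})
orders.put(101, "2012-05-16-12-00-10", {"total": "100.00"})
orders.put(100, "2012-05-16-09-00-10", {"total": "25.00"})
may_16 = orders.query_begins_with(101, "2012-05-16")
```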
68. Patterns
1. Mapping relationships
with range keys.
No cross-table joins in DynamoDB.
Use composite keys to model
relationships.
69. Data model example: online gaming.
Storing scores and leader boards.
Players with high scores.
Leader board for each game.
70. Data model example: online gaming.
Storing scores and leader boards.
Players: hash key
user_id = mza; location = Cambridge; joined = 2011-07-04
user_id = jeffbarr; location = Seattle; joined = 2012-01-20
user_id = werner; location = Worldwide; joined = 2011-05-15
71. Data model example: online gaming.
Storing scores and leader boards.
Scores: composite key
user_id = mza; game = angry-birds; score = 11,000
user_id = mza; game = tetris; score = 1,223,000
user_id = werner; game = bejewelled; score = 55,000
72. Data model example: online gaming.
Storing scores and leader boards.
Leader boards: composite key
game = angry-birds; score = 11,000; user_id = mza
game = tetris; score = 1,223,000; user_id = mza
game = tetris; score = 9,000,000; user_id = jeffbarr
73. Data model example: online gaming.
Storing scores and leader boards.
Scores table: scores by user (and by game).
74. Data model example: online gaming.
Storing scores and leader boards.
Leader boards table: high scores by game.
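With no cross-table joins, the application writes each score to both the Scores table and the Leader boards table. A rough in-memory sketch of that double write follows, using the sample values from the slides; the function names are hypothetical and the board keeps every score that is recorded.

```python
scores = {}        # (user_id, game) -> score         : Scores table
leaderboards = {}  # game -> list of (score, user_id) : Leader boards table


def record_score(user_id, game, score):
    """Write the score to both tables, since there are no cross-table joins."""
    scores[(user_id, game)] = score
    board = leaderboards.setdefault(game, [])
    board.append((score, user_id))
    board.sort(reverse=True)          # highest score first


def top_scores(game, n):
    """Return the n highest (score, user_id) pairs for a game."""
    return leaderboards.get(game, [])[:n]


record_score("mza", "angry-birds", 11000)
record_score("mza", "tetris", 1223000)
record_score("werner", "bejewelled", 55000)
record_score("jeffbarr", "tetris", 9000000)
```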
75. Patterns
2. Handling large items.
Unlimited attributes per item.
Unlimited items per table.
Max 64k per item.
76. Data model example: large items.
Storing more than 64k across items.
Large messages: composite keys
message_id = 1; part = 1; message = <first 64k>
message_id = 1; part = 2; message = <second 64k>
message_id = 1; part = 3; message = <third 64k>
Split attributes across items.
Query by message_id and part to retrieve.
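The split-and-reassemble step can be sketched as below. This is an illustration only: the helper names are hypothetical, and the sketch uses the full 64k for each part, ignoring the space the key attributes themselves would take in a real item.

```python
PART_SIZE = 64 * 1024  # per-item limit quoted on the slides, in bytes


def split_message(message_id, body, part_size=PART_SIZE):
    """Split body (bytes) into items keyed by (message_id, part), parts numbered from 1."""
    return [
        {"message_id": message_id, "part": n + 1, "message": body[i:i + part_size]}
        for n, i in enumerate(range(0, len(body), part_size))
    ]


def join_message(items):
    """Reassemble the body by ordering the parts on the range key."""
    return b"".join(it["message"] for it in sorted(items, key=lambda it: it["part"]))


body = b"x" * (150 * 1024)            # 150 KiB needs three parts
parts = split_message(1, body)
```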
77. Patterns
Store a pointer to objects in
Amazon S3.
Large data stored in S3.
Location stored in DynamoDB.
99.999999999% data durability in S3.
78. Patterns
3. Managing secondary
indices.
Not supported by DynamoDB.
Create your own.
79. Data model example: secondary indices.
Users: hash key
user_id = mza; first_name = Matt; last_name = Wood
user_id = mattfox; first_name = Matt; last_name = Fox
user_id = werner; first_name = Werner; last_name = Vogels
80. Data model example: secondary indices.
First name index: composite keys
first_name = Matt; user_id = mza
first_name = Matt; user_id = mattfox
first_name = Werner; user_id = werner
81. Data model example: secondary indices.
Last name index: composite keys
last_name = Wood; user_id = mza
last_name = Fox; user_id = mattfox
last_name = Vogels; user_id = werner
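A hand-rolled index means the application updates the index table on every write to the Users table. The sketch below shows one way this could look in memory, for the first name index only. All names are hypothetical, and the two writes are not atomic here, just as they would be two separate writes against real tables.

```python
users = {}             # user_id -> item              : Users table
first_name_index = {}  # first_name -> set of user_id : index table


def put_user(user_id, first_name, last_name):
    """Write the user, then keep the hand-rolled index in step with it."""
    old = users.get(user_id)
    if old is not None:
        # Remove the stale index entry in case the first name changed.
        first_name_index.get(old["first_name"], set()).discard(user_id)
    users[user_id] = {"user_id": user_id, "first_name": first_name, "last_name": last_name}
    first_name_index.setdefault(first_name, set()).add(user_id)


def find_by_first_name(first_name):
    """Look up user ids in the index, then fetch each user by primary key."""
    return [users[uid] for uid in sorted(first_name_index.get(first_name, set()))]


put_user("mza", "Matt", "Wood")
put_user("mattfox", "Matt", "Fox")
put_user("werner", "Werner", "Vogels")
```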
84. Patterns
4. Time series data.
Logging, click through, ad views,
game play data, application usage.
Non-uniform access patterns.
Newer data is ‘live’.
Older data is read only.
85. Data model example: time series data.
Rolling tables for hot and cold data.
Events table: composite keys
event_id = 1000; timestamp = 2012-05-16-09-59-01; key = value
event_id = 1001; timestamp = 2012-05-16-09-59-02; key = value
event_id = 1002; timestamp = 2012-05-16-09-59-02; key = value
86. Data model example: time series data.
Rolling tables for hot and cold data.
Events table for April: composite keys
event_id = 400; timestamp = 2012-04-01-00-00-01
event_id = 401; timestamp = 2012-04-01-00-00-02
event_id = 402; timestamp = 2012-04-01-00-00-03
Events table for January: composite keys
event_id = 100; timestamp = 2012-01-01-00-00-01
event_id = 101; timestamp = 2012-01-01-00-00-02
event_id = 102; timestamp = 2012-01-01-00-00-03
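One way to implement rolling tables is to derive the table name from the event date and to list the tables that have aged out of the hot window. The naming scheme (`events_YYYY_MM`) and both helpers below are assumptions made for illustration; the slides do not prescribe a naming convention.

```python
from datetime import date


def table_for(day, prefix="events"):
    """Name of the monthly table that holds events for the given day."""
    return "%s_%04d_%02d" % (prefix, day.year, day.month)


def cold_tables(today, keep_months, history_months, prefix="events"):
    """Names of tables older than the keep window, oldest last."""
    names = []
    year, month = today.year, today.month
    for age in range(history_months):
        if age >= keep_months:
            names.append("%s_%04d_%02d" % (prefix, year, month))
        month -= 1
        if month == 0:                 # step back across the year boundary
            year, month = year - 1, 12
    return names


current = table_for(date(2012, 5, 16))
to_archive = cold_tables(date(2012, 5, 16), keep_months=2, history_months=6)
```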
87-95. Patterns
Hot and cold tables.
One table per month: Dec, Jan, Feb, Mar, April, May.
Higher throughput for the newest (hot) table, lower throughput for the older (cold) tables.
Move data to S3, then delete cold tables.
The window of tables rolls forward month by month: Jan to June, Feb to July, Mar to Aug, Apr to Sept, May to Oct.
96. Patterns
Not out of mind.
DynamoDB and S3 data can be
integrated for analytics.
Run queries across hot and cold data
with Elastic MapReduce.
98. Uniform workloads.
DynamoDB divides table data into
multiple partitions.
Data is distributed primarily by
hash key.
Provisioned throughput is divided
evenly across the partitions.
99. Uniform workloads.
To achieve and maintain full
provisioned throughput for a table,
spread your workload evenly across
the hash keys.
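The effect of uneven access follows from simple arithmetic: if throughput is divided evenly, traffic that lands on a single partition can use only that partition's share. A small worked sketch, with hypothetical numbers and function names:

```python
def per_partition_throughput(provisioned_units, partitions):
    """Even share of a table's provisioned throughput available to each partition."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return provisioned_units / partitions


def usable_throughput(provisioned_units, partitions, busy_partitions):
    """Upper bound on throughput when traffic reaches only some partitions."""
    share = per_partition_throughput(provisioned_units, partitions)
    return share * min(busy_partitions, partitions)


# Hypothetical table: 1000 units spread over 10 partitions.
hot_key_ceiling = usable_throughput(1000, 10, busy_partitions=1)
even_ceiling = usable_throughput(1000, 10, busy_partitions=10)
```

In this example a workload concentrated on one partition is capped at a tenth of what the table was provisioned for, while an evenly spread workload can reach the full amount.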
101. Patterns
1. Distinct values for hash
keys.
Hash key elements should have a
high number of distinct values.
102. Data model example: hash key selection.
Well distributed workloads
Users
user_id = mza; first_name = Matt; last_name = Wood
user_id = jeffbarr; first_name = Jeff; last_name = Barr
user_id = werner; first_name = Werner; last_name = Vogels
user_id = mattfox; first_name = Matt; last_name = Fox
...
103. Data model example: hash key selection.
Well distributed workloads
Lots of users with unique user_id.
Workload well distributed across user partitions.
104. Patterns
2. Avoid limited hash key
values.
Hash key elements should have a
high number of distinct values.
105. Data model example: small hash value range.
Non-uniform workload.
Status responses
status = 200; date = 2012-04-01-00-00-01
status = 404; date = 2012-04-01-00-00-01
status = 404; date = 2012-04-01-00-00-01
status = 404; date = 2012-04-01-00-00-01
106. Data model example: small hash value range.
Non-uniform workload.
Small number of status codes.
Uneven, non-uniform workload.
107. Patterns
3. Model for even
distribution of access.
Access by hash key value should be
evenly distributed across the dataset.
109. Data model example: uneven access pattern by key.
Non-uniform access workload.
Devices
mobile_id = 100; access_date = 2012-04-01-00-00-01
mobile_id = 100; access_date = 2012-04-01-00-00-02
mobile_id = 100; access_date = 2012-04-01-00-00-03
mobile_id = 100; access_date = 2012-04-01-00-00-04
...
Large number of devices.
A small number are much more popular than the others.
Workload unevenly distributed.
110. Data model example: randomize access pattern by key.
Towards a uniform workload.
Devices
mobile_id = 100.1; access_date = 2012-04-01-00-00-01
mobile_id = 100.2; access_date = 2012-04-01-00-00-02
mobile_id = 100.3; access_date = 2012-04-01-00-00-03
mobile_id = 100.4; access_date = 2012-04-01-00-00-04
...
Randomize access pattern.
Workload randomized by hash key.
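The suffixed keys in this example (100.1, 100.2, ...) can be produced by appending a random shard number to the device id on write. The sketch below assumes a fixed number of shards per device; the helper names are hypothetical. The trade-off is that a reader has to query every suffix to see all data for one device.

```python
import random


def sharded_key(mobile_id, shards, rng=random):
    """Append a random suffix so writes for one device spread over several hash keys."""
    return "%s.%d" % (mobile_id, rng.randint(1, shards))


def all_shard_keys(mobile_id, shards):
    """Every hash key a reader must query to see all data for the device."""
    return ["%s.%d" % (mobile_id, n) for n in range(1, shards + 1)]


rng = random.Random(0)                 # seeded only to make the example repeatable
keys = [sharded_key(100, 4, rng) for _ in range(1000)]
```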
125. In summary...
DynamoDB: Predictable performance. Provisioned throughput. Libraries & mappers.
Partitioning: Automatic partitioning. Hot and cold data. Size/throughput ratio.
Data modeling: Tables & items. Read & write patterns. Time series data.
Analytics: Elastic MapReduce. Hive queries. Backup & restore.