This document provides an overview of Riak TS, Basho's new purpose-built time series database. It describes Riak TS's key features like high write throughput, efficient range query support, and horizontal scalability. It also outlines Riak TS's data modeling approach of co-locating and partitioning time-series data, its SQL-like query language, and provides examples of its performance and roadmap. Finally, it demonstrates a potential use case application called UNCORKD for tracking wine check-ins and reviews.
6. Masterless Architecture
Riak has a masterless architecture in which every node in a cluster is capable of serving
read and write requests. The benefits of a masterless architecture include:
Basho Technologies | 7
7. Data Replication & Consistency
Reads and writes use quorum level consistency by default.
8. High Availability
(diagram: a put("bucket/key") request routed across the ring)
If a node goes offline, “fallback” virtual nodes will take over and automatically begin
serving requests on behalf of the downed virtual nodes. Control and data are
automatically handed back to the original node when it returns.
9. Data Guarantees
Version vectors are used to maintain an actor-based accounting of updates to
an object in Riak. This allows the system to reason about causality in the event
that multiple versions of an object exist at any given point in time.
Example: Version 1 {v1:2, v2:3, v3:2} dominates Version 2 {v1:2, v2:3, v3:1},
because every actor's count in Version 1 is greater than or equal to its count
in Version 2.
10. Write Once Buckets
• Riak 2.1 introduced the concept of "write once" buckets
• 107% increase in throughput vs. standard buckets
• Intended for immutable data
11. Pluggable Storage Backends
Pluggable storage backends enable you to choose the low-level storage engine that best
fits your use case.
• Bitcask
Basho's open source key/value store and Riak's default backend.
• LevelDB
Google's open source key/value store.
• In Memory
Uses Erlang's ets tables to store data in memory.
• Multi-Backend
Select the right backend for each use case on a per-bucket basis.
12. Multi-cluster Replication: Availability Across Geographies
Riak automatically replicates between clusters.
• Configurable number of remote replicas
• Options for real-time sync and full sync
Geo-data locality allows localized data processing.
• Reduced latency to end users
• Allows sub-5ms responses
• Active-active ensures continuous user experience
13. Riak KV: Use Cases
• Mutable data
• Documents, JSON, metadata
• Session state
• User/customer data
• Transaction histories
• Archives
14. Riak KV: Search
• "Write it like Riak, query it like Solr"
• Riak Search communicates with and monitors the Solr OS process
• Riak Search listens for changes in key/value (KV) data and makes the
appropriate changes to Solr indexes
• Riak Search takes a user query on any node and converts it to a Solr
distributed search
• Protocol Buffers interface and Solr interface via HTTP
15. Riak KV: Data Types
Riak Data Types are a developer-friendly way to avoid conflicting versions of
objects in an eventually consistent environment.
• Map
Supports the nesting of the other Riak Data Types.
• Register
A named binary field that can only be used as part of a Map.
• Counter
Keeps track of increments and decrements on an integer.
• Flag
Values limited to enable or disable.
• Set
A collection of unique binary values that supports add and remove operations
on one or more values.
17. Riak TS: Use Cases
• Immutable data
• Infrastructure monitoring / metrics
• Real-time analytics
• IoT / Sensor Data
• Financial Data
• Scientific Observations
18. Riak TS: Requirements & design goals
High write throughput
Efficient range query support
Robust queryability
Horizontal scale
High availability
Multi-region support
Enterprise scale solution
19. Riak TS: Design & Implementation
• Data distribution
– Data is co-located on a per series
basis for a configurable time
horizon
– A given series is partitioned into
ordered ranges of a configurable
size.
• Data modeling
– SQL-like data definition (bucket
parameterization)
• Read/write
– Efficient write path
– Query subsystem
– SQL-like query language
20. Riak TS Implementation: Data definition
Riak TS uses a SQL-like CREATE TABLE statement to associate a schema with a
bucket.
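No schema example survives in this extract. The following CREATE TABLE, reconstructed around the GeoCheckin table and columns that appear in the SELECT examples on the next slide, illustrates the shape; the 15-minute quantum is an assumption for illustration:

```sql
CREATE TABLE GeoCheckin (
    myfamily    varchar   NOT NULL,
    myseries    varchar   NOT NULL,
    time        timestamp NOT NULL,
    weather     varchar   NOT NULL,
    temperature double,
    PRIMARY KEY (
        -- partition key: family, series, and a time quantum
        (myfamily, myseries, quantum(time, 15, 'm')),
        -- local key: ordering within a partition
        myfamily, myseries, time
    )
)
```

The partition key's quantum function controls how much of a series is co-located on one partition; the local key controls sort order within it.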
21. Riak TS Implementation: Query language
Riak TS supports a SQL-like query language using the familiar semantics of the SELECT
statement.
SELECT weather, temperature FROM GeoCheckin
WHERE myfamily = 'family1' AND myseries = 'series1'
  AND time > 1449864277000 AND time < 1449864290000
  AND temperature > 27.0

SELECT AVG(temperature) FROM GeoCheckin
WHERE myfamily = 'family1' AND myseries = 'series1'
  AND time > 1449864277000 AND time < 1449864290000

SELECT temperature * 1.5 FROM GeoCheckin
WHERE myfamily = 'family1' AND myseries = 'series1'
  AND time > 1449864277000 AND time < 1449864290000
25. UNCORKD - Overview
UNTAPPD for wine snobs! (And a tribute to the cool social/beer app.)
• Tracks checkins by wine variety and location
• Maintains per-user friend lists and activity feeds
• Maintains per-location statistics
• Supports checkin, activity feed, and location-based queries with
aggregation and filtering (time-based and geospatial)
26. UNCORKD – Riak KV Data Definition
• User, Location & Wine entity data stored in standard KV buckets
27. UNCORKD – Riak KV Data Definition (2)
• Friend lists and location statistics maintained via Set and Counter data
types
28. UNCORKD – Riak TS Data Definition
• Wine name/id used as the series_id for Checkin
• User name/id used as the series_id for Activity
• Include lat/long data to support basic geospatial filtering
• 14-day time quantization
29. UNCORKD – Generate Checkins
• Generates a month's worth of checkins
• Attempts 1 checkin per minute with time-weighted probability
30. UNCORKD – Insert Checkin & Fan Out
• Insert checkin
• Fan out to friends' activity feeds
• Update per location statistics
31. UNCORKD – Query Checkins
• Checkin count and average rating for ‘2015_Talisman_PinotNoir’
• List checkins with times and locations
32. UNCORKD – Query Checkins
• List checkins for a given geographic area (Mission District)
33. UNCORKD – Query Activity Feed
• List friends for user ‘AmyPhillips@gmail.com’
• Query activity feed
34. UNCORKD – Query Location stats
• Per day checkin counts for ‘Etcetera_Wine_Bar’
At Basho, we are well known for Riak KV: our flagship database and key value store that’s been around for quite a while.
Riak TS is both a natural extension of our existing architecture as well as a nice complement to the functionality offered by Riak KV.
Agenda: 1) Riak KV overview, 2) Riak TS intro, 3) use case walkthrough.
For anybody unfamiliar, Riak is an eventually consistent, Dynamo-style storage system inspired by Amazon's seminal white paper. It's an AP system in CAP terms.
The business value of such systems is all about high availability, horizontal (predictable) scalability, high concurrency/throughput and low cost.
Riak adds to this the highest possible data guarantees, excellent multi-region/datacenter support, resilience/stability and ease-of-use.
“High availability without data loss”
“Reliability at Scale”
Riak uses a SHA-1 hash-based approach to data distribution.
The SHA-1 hash integer range is logically subdivided into a number of partitions.
We call this the ring. The default ring size is 64.
Each object is mapped to three virtual nodes (an n_val of 3 is the default and is only very rarely changed).
The SHA-1 hash of the bucket and key name combination is used to route a given object to the appropriate vnodes, based on the mapping.
Vnodes are, in turn, distributed evenly across the available nodes in the cluster.
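The mapping described above can be sketched in Python. This is a simplification for illustration only: the real ring claims partitions via cluster state rather than assigning consecutive partition indexes, and the key encoding here is hypothetical.

```python
import hashlib

RING_SIZE = 64   # default number of ring partitions
N_VAL = 3        # default replication factor

def key_to_partition(bucket: str, key: str) -> int:
    # Hash the bucket/key combination into the 160-bit SHA-1 space,
    # then map that integer onto one of RING_SIZE equal partitions.
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    h = int.from_bytes(digest, "big")
    return h // (2**160 // RING_SIZE)

def preflist(bucket: str, key: str) -> list[int]:
    # The object lands on N_VAL consecutive partitions (vnodes)
    # starting from its home partition on the ring.
    first = key_to_partition(bucket, key)
    return [(first + i) % RING_SIZE for i in range(N_VAL)]

print(preflist("GeoCheckin", "family1/series1/1449864277000"))
```

Because SHA-1 distributes keys uniformly, partitions (and therefore vnodes and physical nodes) receive a roughly even share of the data.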
Riak uses a masterless architecture.
Requests are routed (typically by a load balancer) to any one of the nodes in the cluster, that initial node becoming the coordinator for the request.
Coordinators maintain internal metadata about the partition/vnode and vnode/physical node mappings referenced earlier. A gossip protocol is used internally to propagate and synchronize metadata.
This metadata is used by the coordinator to route read and write requests appropriately.
Requests are sent in parallel to the appropriate nodes but require only majority or quorum participation (by default).
The consistency level can be defined on a per bucket or per request basis.
This reflects the strong emphasis that Basho places on data guarantees and high availability.
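The quorum arithmetic behind the defaults can be sketched as follows, assuming n_val = 3 with reads and writes both requiring a quorum:

```python
# Quorum sketch, assuming Riak's defaults: n_val = 3 replicas,
# with reads (r) and writes (w) both set to quorum.
n_val = 3
quorum = n_val // 2 + 1   # majority: 2 of 3 vnodes must respond

# r + w > n_val guarantees that any read quorum overlaps the most
# recent successful write quorum, so a read contacts at least one
# replica that saw the latest write.
r = w = quorum
print(quorum, r + w > n_val)  # → 2 True
```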
Riak uses version vectors to allow for a conflict resolution policy other than Last Write Wins (which many view as data loss).
The write path in Riak KV is such that every update takes place (logically) with respect to a single vnode in the preflist for the object in question (known as a coordinated PUT).
The vnodes are the actors in the vector.
If, during a read, Riak encounters divergent values for an object, the version vectors can be used to determine whether the one version dominates the other (as above, allowing the system to discard the other version) or whether the versions are concurrent (in which case sibling objects will be maintained).
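The dominance check can be sketched in a few lines of Python (a minimal model of version-vector comparison, not Riak's Erlang implementation):

```python
def compare(vv_a: dict, vv_b: dict) -> str:
    # An actor missing from a vector counts as zero updates.
    actors = set(vv_a) | set(vv_b)
    a_ge = all(vv_a.get(x, 0) >= vv_b.get(x, 0) for x in actors)
    b_ge = all(vv_b.get(x, 0) >= vv_a.get(x, 0) for x in actors)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a dominates"     # b can be safely discarded
    if b_ge:
        return "b dominates"     # a can be safely discarded
    return "concurrent"          # siblings must be kept

# The two versions from the slide:
print(compare({"v1": 2, "v2": 3, "v3": 2},
              {"v1": 2, "v2": 3, "v3": 1}))  # → a dominates
```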
Riak KV also supports an alternative bucket type, known as the "write once" bucket type.
Write once buckets are intended for immutable data, and use a simplified write path, avoiding the performance hit associated with coordinated PUTs and causality tracking with version vectors.
The existence of the "write once" write path is significant with respect to Riak TS.
Bitcask and LevelDB are the two standard options.
Bitcask is a log-structured hash table that keeps all its keys in memory. It has a very consistent performance profile and supports TTL-based expiry.
LevelDB is a log-structured merge tree that keeps data sorted, uses a commit log and buffers writes in memory.
LevelDB is similar to what Cassandra uses.
An in-memory backend also exists but is not widely used, and you can also use the backends in combination.
Replication in Riak is asynchronous, between a primary and a secondary cluster.
This can make for a much easier operational scenario as compared to a single cluster spanning regions.
Flexible topologies are supported: fully meshed, hub-and-spoke, active/active, active/passive, etc..
You can also do things like replicate to a secondary local cluster used specifically for snapshot backups or analytics.
Version vector support makes Riak KV an especially good fit for mutable data use cases where conflicts can arise and avoiding data loss is important. (Documents, JSON, metadata, sessions, user profiles, product data etc..)
Key/value semantics and document style data modeling are also a natural fit for use cases like transaction histories or archives, which are often immutable, but don’t typically require ad-hoc or range based queries. Using a pure key/value store avoids the need to define and operate within the context of a schema.
Riak Search allows you to attach a Solr style schema to a KV bucket.
Writes into the bucket will be indexed into Solr running in a per node, embedded JVM.
Solr style queries (ad-hoc, full text, geospatial, etc..) are supported by the clients and the REST API.
Search queries are executed as distributed Solr queries under the hood.
Data types utilize version vectors to automatically merge object state during concurrent update situations or upon partition resolution.
This saves the developer from having to write application-side conflict resolution logic (the major downside of the version vector approach).
They also allow for atomic operations, avoiding the read-modify-write lifecycle between the client and the cluster.
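The merge idea can be illustrated with a simplified PN-counter sketch (a toy model of the CRDT technique, not Riak's actual Counter implementation): each actor keeps separate increment and decrement tallies, and replicas merge by taking the per-actor maximum.

```python
# Each counter maps actor -> (increments, decrements).
def merge(a: dict, b: dict) -> dict:
    # Take the per-actor maximum of both tallies; merging is
    # commutative, so replicas converge in any merge order.
    actors = set(a) | set(b)
    return {x: (max(a.get(x, (0, 0))[0], b.get(x, (0, 0))[0]),
                max(a.get(x, (0, 0))[1], b.get(x, (0, 0))[1]))
            for x in actors}

def value(c: dict) -> int:
    return sum(inc - dec for inc, dec in c.values())

replica1 = {"a": (5, 1)}                  # actor "a": +5, -1
replica2 = {"a": (5, 1), "b": (2, 0)}     # actor "b" added +2 concurrently
print(value(merge(replica1, replica2)))   # → 6
```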
In general, you can think of use cases as belonging to one of two broad categories: those involving mutable state and those that involve immutable or event based data.
Riak KV is highly optimized for mutable data (where data loss is of higher concern and key/value semantics are a natural fit).
Time series involves immutable data by definition.
Common use cases include:
Infrastructure monitoring and metrics. These often involve stacks stitched together with open source tools like Graphite.
Real-time analytics: click stream, page views, impressions, email opens
IoT / Sensor data: anything from smart meters to FitBits.
Financial use cases like tick data
Scientific observations: weather observation data is a good example
As a time series database, Riak TS has a subset of requirements that differs substantially from what Riak KV requires.
Event based use cases typically involve a high volume of writes (requiring high write throughput).
Range based queries are a key requirement and need to be efficient.
An easy to use query language with range and aggregation support.
The last four requirements in the list are inherited from the architecture itself, which speaks to a major advantage that Riak TS has with respect to some of our competitors. The maturity and ease-of-use of our underlying platform set us apart.
Co-location and ordering of the primary data are essential to support range queries efficiently, without relying on expensive coverage queries over distributed secondary indexes. Existing support for LevelDB was beneficial here.
Tabular data modeling (coupled with a SQL-like query language) is a natural fit for the queryability required by time series use cases.
Riak TS uses a write path based on the Riak KV write-once PUT path.
A query subsystem was required to actually execute user-generated SQL queries against the underlying distributed dataset.
Standard data types are supported (varchar, boolean, timestamp, integer, double).
A PRIMARY KEY is specified, which includes a partition key and a local key.
The composite partition key includes a family id, a series id and a quantum function that takes three arguments (timestamp field, unit of time and a quantity of time).
Riak TS will co-locate data for a given series based on the range specified by the quantum function.
The local key indicates how the data should be ordered. In the current version, this is required to be the same three fields in the same order. The next release will loosen this restriction.
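The quantum function's behavior can be sketched as rounding a millisecond timestamp down to the start of its quantum; the function and unit table below are an illustrative model, not the server implementation:

```python
MS_PER_UNIT = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def quantum(ts_ms: int, amount: int, unit: str) -> int:
    # Round the timestamp down to the start of its quantum, e.g.
    # quantum(ts, 15, 'm') groups rows into 15-minute blocks.
    width = amount * MS_PER_UNIT[unit]
    return (ts_ms // width) * width

# Two points 5 seconds apart fall in the same 15-minute quantum,
# so Riak TS would co-locate them on the same partition.
print(quantum(1449864277000, 15, "m") ==
      quantum(1449864282000, 15, "m"))  # → True
```

Rows whose timestamps share a quantum boundary share a partition key, which is what makes range queries over a series efficient.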
A complete primary key must be specified.
Queries are currently limited to a single series.
Filtering on secondary fields is supported.
Aggregation functions and arithmetic are supported.
A hypothetical use case that uses Riak TS in conjunction with Riak KV.
Note that running Riak KV and Riak TS on the same cluster in production has yet to be fully tested.
Untappd is a social app for beer connoisseurs that allows you to check in, post, comment, etc., about the beers that you drink.
A ‘checkin’ is the act of recording your current activity (beer, location, comment, photo etc..) through the mobile app.
Uncorkd is a backend for a hypothetical wine-centric social app modeled after UNTAPPD.
It uses Riak KV for state-based object/document storage and Riak TS for event-based storage.
The design goals are listed on the slide.
Riak Python Client
Public Github repo (rcgenova/riak-ts-demo)
Riak TS Open Source due mid-April
KV will be the source of truth for User, Wine and Location entity data.
Lat/long data will be retrieved from location objects upon checkin.
Individual wine varieties are the entities of interest (rather than the locations), so wine variety was used as the series id.
Activities are user-specific and therefore use the user as the series id.
Included a ‘type’ field in the Activity table to accommodate both the user’s checkins and their friends’ checkins.
Lat/long values are included to enable simple, bounding box based geospatial filtering.
A relatively low data volume allows for a wider time quantization, which in turn allows wider queries with a minimum of sub-queries.
Generated 1000 fictitious users keyed by email address
35 wines from Sonoma County (where I live!), 2015 vintage
11 actual locations in San Francisco (from Yelp)
The algorithm attempts a checkin per minute. The probability distribution provides a means of simulating the times of day that are likely to be more active for such an activity.
When a checkin is triggered, a random user, location and wine are passed to the Checkin object’s checkin() function (shown on the next slide).
Checkin() inserts the event and fans it out to the friends' activity feeds.
The Activity table is batch updated.
Location statistics counters are also updated.
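The flow can be sketched as below. The names and in-memory structures are hypothetical stand-ins for the Riak tables, not the actual demo code in rcgenova/riak-ts-demo:

```python
import time

def checkin(user, wine, location, friends, checkin_table, activity_table):
    now = int(time.time() * 1000)  # TS timestamps are milliseconds
    event = {"time": now, "user": user, "wine": wine, "location": location}
    checkin_table.append(event)    # insert the checkin event once
    # Fan out: one activity row per friend, written as a batch.
    batch = [{**event, "feed_owner": f, "type": "friend"} for f in friends]
    activity_table.extend(batch)

checkins, activity = [], []
checkin("amy@example.com", "2015_Talisman_PinotNoir", "Etcetera_Wine_Bar",
        ["bob@example.com", "cal@example.com"], checkins, activity)
print(len(checkins), len(activity))  # → 1 2
```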
This query uses a lat/long-based bounding box that approximately represents the SF Mission District.
The time range is one week, 2016-01-01 to 2016-01-08.
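A minimal sketch of the bounding-box idea follows. The coordinates are approximate, illustrative values for the Mission District (not taken from the demo); in Riak TS the same test would be expressed as lat/long range predicates in the WHERE clause.

```python
# Approximate, illustrative bounding box for the SF Mission District.
MISSION = {"min_lat": 37.748, "max_lat": 37.770,
           "min_lon": -122.426, "max_lon": -122.405}

def in_box(lat: float, lon: float, box: dict) -> bool:
    # A point is inside the box when both coordinates fall in range.
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])

checkins = [
    {"wine": "2015_Talisman_PinotNoir", "lat": 37.755, "lon": -122.415},
    {"wine": "2015_Sonoma_Chardonnay",  "lat": 37.800, "lon": -122.440},
]
print([c["wine"] for c in checkins if in_box(c["lat"], c["lon"], MISSION)])
# → ['2015_Talisman_PinotNoir']
```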
We can retrieve a user’s friend list with a single lookup (shown on the left).
On the right we are querying the activity feed for the user (friends only)
As mentioned, location statistics are updated on write using counters.
The counters are nested within a per location Map object.
We can retrieve the values of all counters with a single lookup.