Riak TS: Basho’s New Purpose-built Time Series Database
Rob Genova, Solution Architect
Basho Technologies
RIAK DEPLOYED
WORLDWIDE
Riak KV
Dynamo Inspired
Data Distribution
Basho Technologies | 6
Masterless Architecture
Riak has a masterless architecture in which every node in a cluster is capable of serving
read and write requests. The benefits of a masterless architecture include:
Data Replication & Consistency
Reads and writes use quorum level consistency by default.
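As a rough illustration of what quorum-level consistency means: with N replicas of an object, a quorum is a strict majority of them, floor(N/2) + 1. The sketch below assumes three replicas (the commonly cited Riak default for n_val); the function names are illustrative and are not part of any Riak client.

```python
def quorum(n_val: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    return n_val // 2 + 1


def request_succeeds(acks: int, n_val: int = 3) -> bool:
    """True when enough replicas answered to satisfy a quorum read or write.

    n_val=3 is an assumed replica count, not a value taken from the slides.
    """
    return acks >= quorum(n_val)


if __name__ == "__main__":
    print(quorum(3))            # majority of 3 replicas
    print(request_succeeds(1))  # a single ack is not a quorum of 3
```

With three replicas, one replica can be unreachable and quorum reads and writes still succeed.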
put(“bucket/key”)
put(“bucket/key”)
High Availability
If a node goes offline, “fallback” virtual nodes will take over and automatically begin
serving requests on behalf of the downed virtual nodes. Control and data are
automatically handed back to the original node when it returns.
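The fallback behaviour can be sketched in a few lines. This is a deliberate simplification: real Riak chooses fallbacks by walking the ring of virtual nodes, whereas the hypothetical `serving_nodes` function below just picks the next reachable physical node that is not already serving the key.

```python
from typing import Dict, List, Set


def serving_nodes(primaries: List[str], all_nodes: List[str],
                  down: Set[str]) -> Dict[str, str]:
    """Map each primary to the node currently serving requests on its behalf.

    A reachable primary serves itself. An unreachable primary is covered by
    the first reachable node that is not already in use for this key.
    """
    in_use = {p for p in primaries if p not in down}
    serving: Dict[str, str] = {}
    for primary in primaries:
        if primary not in down:
            serving[primary] = primary
            continue
        fallback = next(
            (n for n in all_nodes if n not in down and n not in in_use), None
        )
        if fallback is None:
            raise RuntimeError("no reachable node left to act as fallback")
        in_use.add(fallback)
        serving[primary] = fallback
    return serving


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e"]
    print(serving_nodes(["a", "b", "c"], nodes, down={"b"}))
    print(serving_nodes(["a", "b", "c"], nodes, down=set()))
```

Calling the function again once the downed node is back returns the identity mapping, which mirrors control being handed back to the original node.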
Data Guarantees
Version vectors are used to maintain an actor-based accounting of updates to
an object in Riak. This allows the system to reason about causality in the event
that multiple versions of an object exist at any given point in time.
Version 1: {v1:2, v2:3, v3:2} (dominates)
Version 2: {v1:2, v2:3, v3:1}
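The dominance check in this example can be written out as a small comparison. This is a generic version-vector sketch, not Riak's internal code; the function names are illustrative. One vector dominates another when it has seen at least as many updates from every actor and the two are not equal.

```python
from typing import Dict

VersionVector = Dict[str, int]


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True when a has seen every update that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two version vectors relative to each other."""
    a_sees_b = descends(a, b)
    b_sees_a = descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "a dominates"
    if b_sees_a:
        return "b dominates"
    return "concurrent"  # siblings: neither side has seen all of the other


if __name__ == "__main__":
    version_1 = {"v1": 2, "v2": 3, "v3": 2}
    version_2 = {"v1": 2, "v2": 3, "v3": 1}
    print(compare(version_1, version_2))
```

When the result is "concurrent", neither version can safely replace the other, which is the case where multiple versions of an object coexist.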
Write once buckets
• Riak 2.1 introduced the concept of "write once" buckets
• 107% increase in throughput vs standard buckets
• Intended for immutable data
Pluggable Storage Backends
Pluggable storage backends enable you to choose the low-level storage engine that best
fits your use case.
• Bitcask
Basho’s open source key/value store and Riak’s default backend.
• LevelDB
Google’s open source key/value store
• In Memory
Uses Erlang’s ets tables to store data in memory
• Multi-Backend
Select the right backend for each use case on a per bucket basis
Availability Across Geographies
Multi-cluster Replication
Riak automatically replicates between clusters
• Configurable number of remote replicas
• Options for real-time sync and full sync
Geo-Data Locality allows localized data processing
• Reduced latency to end-users
• Allows sub-5 ms responses
• Active-Active ensures continuous user experience
Riak KV: Use Cases
• Mutable data
• Documents, JSON, metadata
• Session state
• User/customer data
• Transaction histories
• Archives
Riak KV: Search
• Write it like Riak, read it like Solr
• Riak Search communicates with and monitors the Solr OS process
• Riak Search listens for changes in key/value (KV) data and makes the appropriate changes to Solr indexes
• Riak Search takes a user query on any node and converts it to a Solr distributed search
• Protocol Buffers interface and Solr interface via HTTP
Riak KV: Data Types
Riak Data Types are a developer-friendly way to avoid conflicting versions of objects in an eventually consistent environment.
• Map
Supports the nesting of the Riak Data Types.
• Register
A named binary field that can only be used as part of a Map.
• Counter
Keeps track of increments and decrements on an integer.
• Flag
Values limited to enable or disable.
• Set
A collection of unique binary values that supports add and remove operations on one or more values.
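To show how a Counter can avoid conflicting versions, here is the textbook PN-counter construction: each actor tracks its own increments and decrements, and replicas merge by taking the per-actor maximum. The slides do not show Riak's implementation, so treat this as a generic sketch with illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict


def _pointwise_max(x: Dict[str, int], y: Dict[str, int]) -> Dict[str, int]:
    """Per-actor maximum of two tallies."""
    return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}


@dataclass
class PNCounter:
    increments: Dict[str, int] = field(default_factory=dict)
    decrements: Dict[str, int] = field(default_factory=dict)

    def increment(self, actor: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.increments[actor] = self.increments.get(actor, 0) + amount

    def decrement(self, actor: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.decrements[actor] = self.decrements.get(actor, 0) + amount

    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Combine two replicas; the result does not depend on merge order."""
        return PNCounter(
            _pointwise_max(self.increments, other.increments),
            _pointwise_max(self.decrements, other.decrements),
        )


if __name__ == "__main__":
    a, b = PNCounter(), PNCounter()
    a.increment("node1", 3)
    b.increment("node2", 2)
    b.decrement("node2", 1)
    print(a.merge(b).value())
```

Because merging is commutative and idempotent, replicas that exchange state in any order converge on the same value without producing siblings.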
Riak TS
Riak TS: Use Cases
• Immutable data
• Infrastructure monitoring / metrics
• Real-time analytics
• IoT / Sensor Data
• Financial Data
• Scientific Observations
Riak TS: Requirements & design goals
• High write throughput
• Efficient range query support
• Robust queryability
• Horizontal scale
• High availability
• Multi-region support
• Enterprise scale solution
Riak TS: Design & Implementation
• Data distribution
– Data is co-located on a per series basis for a configurable time horizon
– A given series is partitioned into ordered ranges of a configurable size.
• Data modeling
– SQL-like data definition (bucket parameterization)
• Read/write
– Efficient write path
– Query subsystem
– SQL-like query language
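The partitioning of a series into fixed-size time ranges can be sketched as simple bucketing arithmetic. The function names are illustrative, and the sketch covers only the time bucketing; how Riak TS places each bucket on the ring is not shown here.

```python
def quantum_start(timestamp_ms: int, quantum_ms: int) -> int:
    """Start of the time range (quantum) that contains timestamp_ms."""
    if quantum_ms <= 0:
        raise ValueError("quantum_ms must be positive")
    return timestamp_ms - (timestamp_ms % quantum_ms)


def quanta_spanned(start_ms: int, end_ms: int, quantum_ms: int) -> int:
    """Number of quanta a query over [start_ms, end_ms] has to touch."""
    if end_ms < start_ms:
        raise ValueError("end_ms must not be before start_ms")
    first = quantum_start(start_ms, quantum_ms)
    last = quantum_start(end_ms, quantum_ms)
    return (last - first) // quantum_ms + 1


if __name__ == "__main__":
    fifteen_minutes = 15 * 60 * 1000  # assumed quantum size, in milliseconds
    print(quantum_start(1449864277000, fifteen_minutes))
    print(quanta_spanned(1449864277000, 1449864290000, fifteen_minutes))
```

Rows of one series whose timestamps fall in the same quantum land together, so a range query only needs to visit the quanta its time bounds overlap.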
Riak TS Implementation: Data definition
Riak TS uses a SQL-like CREATE TABLE statement to associate a schema with a
bucket.
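The slide's example statement did not survive extraction. The sketch below builds a statement in the general shape shown in public Riak TS 1.x documentation, using the GeoCheckin table and column names that appear in the query slide. The column types, the NOT NULL choices and the 15-minute quantum are assumptions, and `geocheckin_ddl` is a hypothetical helper.

```python
VALID_UNITS = {"d", "h", "m", "s"}  # days, hours, minutes, seconds


def geocheckin_ddl(quantum_size: int = 15, quantum_unit: str = "m") -> str:
    """Return a CREATE TABLE statement for an assumed GeoCheckin schema."""
    if quantum_size < 1:
        raise ValueError("quantum_size must be at least 1")
    if quantum_unit not in VALID_UNITS:
        raise ValueError(f"quantum_unit must be one of {sorted(VALID_UNITS)}")
    return (
        "CREATE TABLE GeoCheckin (\n"
        "  myfamily    varchar   not null,\n"
        "  myseries    varchar   not null,\n"
        "  time        timestamp not null,\n"
        "  weather     varchar   not null,\n"
        "  temperature double,\n"
        "  PRIMARY KEY (\n"
        # Partition key: family, series and the time quantum.
        f"    (myfamily, myseries, quantum(time, {quantum_size}, '{quantum_unit}')),\n"
        # Local key: orders rows within a partition.
        "    myfamily, myseries, time\n"
        "  )\n"
        ")"
    )


if __name__ == "__main__":
    print(geocheckin_ddl())
```

The first part of the primary key decides which rows are stored together; the second part orders rows within that group, which is what makes time range scans efficient.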
Riak TS Implementation: Query language
Riak TS supports a SQL-like query language using the familiar semantics of the SELECT
statement.
SELECT weather, temperature FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000 AND temperature > 27.0
SELECT AVG(temperature) FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000
SELECT temperature * 1.5 FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000
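To make the semantics of these three queries concrete, the sketch below evaluates them against a small in-memory list. The rows are made up for illustration and no Riak client is involved; note that both time bounds are exclusive, as in the queries.

```python
from typing import Dict, List

# Made-up sample rows for one family/series pair.
ROWS: List[Dict] = [
    {"myfamily": "family1", "myseries": "series1",
     "time": 1449864278000, "weather": "hot", "temperature": 28.5},
    {"myfamily": "family1", "myseries": "series1",
     "time": 1449864280000, "weather": "mild", "temperature": 25.0},
    {"myfamily": "family1", "myseries": "series1",
     "time": 1449864290000, "weather": "hot", "temperature": 30.0},
]


def in_range(rows: List[Dict], family: str, series: str,
             start_ms: int, end_ms: int) -> List[Dict]:
    """Rows for one series with start_ms < time < end_ms (exclusive bounds)."""
    return [r for r in rows
            if r["myfamily"] == family and r["myseries"] == series
            and start_ms < r["time"] < end_ms]


window = in_range(ROWS, "family1", "series1", 1449864277000, 1449864290000)

# Query 1: filter on temperature, project two columns.
hot = [(r["weather"], r["temperature"]) for r in window if r["temperature"] > 27.0]

# Query 2: AVG(temperature) over the window.
average = sum(r["temperature"] for r in window) / len(window)

# Query 3: arithmetic on a column.
scaled = [r["temperature"] * 1.5 for r in window]

if __name__ == "__main__":
    print(hot, average, scaled)
```

The third sample row sits exactly on the upper bound, so the strict `<` comparison leaves it out of all three results.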
Riak TS: Write Performance
• 130k writes per second
• 5 nodes (bare metal, Softlayer)
• 6-core + HT (12 logical cores)
• 32GB
• 800GB SSD x3 (RAID0)
• 1k objects
• 15-minute time quantization
• ring_size = 64
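A quick back-of-the-envelope reading of these numbers, as a sketch. It assumes "1k objects" means roughly 1 KB per object and that the 130k figure is the cluster-wide total; neither is stated explicitly on the slide.

```python
WRITES_PER_SECOND = 130_000  # from the slide
NODES = 5                    # from the slide
OBJECT_BYTES = 1_000         # assumption: "1k objects" is about 1 KB each


def per_node_writes(total_writes: int, nodes: int) -> float:
    """Average writes per second handled by each node."""
    return total_writes / nodes


def payload_megabytes_per_second(writes: int, object_bytes: int) -> float:
    """Raw payload rate in decimal megabytes per second, before replication."""
    return writes * object_bytes / 1_000_000


if __name__ == "__main__":
    print(per_node_writes(WRITES_PER_SECOND, NODES))
    print(payload_megabytes_per_second(WRITES_PER_SECOND, OBJECT_BYTES))
```

Under those assumptions each node averages 26,000 writes per second and the cluster ingests about 130 MB of payload per second, not counting replication overhead.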
Riak TS: Unofficial Roadmap
• As of 1.2 (this week)
– Query language with SELECT, filtering, aggregation functions
and arithmetic
– Java, Python, Erlang, Node, Ruby clients
– SQL Shell
• 1.3 (end of April-ish)
– OSS & Enterprise versions
– MDC (enterprise only)
– REST API
• Q2
– Bulk delete / expiry
– SQL GROUP BY, ORDER BY
– Visualization (Graphite/Grafana integration)
Use case:
UNCORKD
UNCORKD - Overview
UNTAPPD for wine snobs! (And a tribute to the cool social/beer app)
• Tracks checkins by wine variety and location
• Maintains per-user friend lists and activity feeds
• Maintains per-location statistics
• Supports Checkin, Activity feed & Location-based queries with aggregation & filtering (time-based and geospatial)
UNCORKD – Riak KV Data Definition
• User, Location & Wine entity data stored in standard KV buckets
UNCORKD – Riak KV Data Definition (2)
• Friend lists and location statistics maintained via Set and Counter data
types
UNCORKD – Riak TS Data Definition
• Wine name/id used as the series_id for Checkin
• User name/id used as the series_id for Activity
• Include lat/long data to support basic geospatial filtering
• 14-day time quantization
UNCORKD – Generate Checkins
• Generates a month's worth of checkins
• Attempts 1 checkin per minute with time-weighted probability
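The generator code from the slide is not available, so here is a minimal sketch of the idea: step through the period one minute at a time and keep each attempt with a probability that depends on the time of day. The weights in `checkin_probability` are invented for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import List


def checkin_probability(moment: datetime) -> float:
    """Made-up weighting: evenings are busier than afternoons or nights."""
    if 17 <= moment.hour < 23:
        return 0.6
    if 11 <= moment.hour < 17:
        return 0.2
    return 0.05


def generate_checkins(start: datetime, days: int = 30,
                      seed: int = 42) -> List[datetime]:
    """Attempt one checkin per minute and keep it with a time-weighted chance."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    end = start + timedelta(days=days)
    checkins: List[datetime] = []
    moment = start
    while moment < end:
        if rng.random() < checkin_probability(moment):
            checkins.append(moment)
        moment += timedelta(minutes=1)
    return checkins


if __name__ == "__main__":
    times = generate_checkins(datetime(2016, 3, 1), days=30)
    print(len(times), "checkins generated")
```

Seeding the random generator keeps the data set reproducible between demo runs.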
UNCORKD – Insert Checkin & Fan Out
• Insert checkin
• Fan out to friends' activity feeds
• Update per-location statistics
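The three steps above can be sketched with in-memory structures standing in for the real stores: one list per wine for the Checkin series, one list per user for the Activity series, and an integer per location for the counter. The class and field names are hypothetical; the application code from the slide is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set


class UncorkdSketch:
    """In-memory stand-in for the Checkin table, Activity table and counters."""

    def __init__(self, friends: Dict[str, Set[str]]) -> None:
        self.friends = friends                                   # user -> friend list
        self.checkins: Dict[str, List[dict]] = defaultdict(list)  # keyed by wine
        self.activity: Dict[str, List[dict]] = defaultdict(list)  # keyed by user
        self.location_counts: Dict[str, int] = defaultdict(int)

    def insert_checkin(self, user: str, wine: str, location: str,
                       rating: int, time_ms: int) -> dict:
        event = {"user": user, "wine": wine, "location": location,
                 "rating": rating, "time": time_ms}
        # 1. Insert the checkin into the series for this wine.
        self.checkins[wine].append(event)
        # 2. Fan out: copy the event into each friend's activity feed.
        for friend in sorted(self.friends.get(user, set())):
            self.activity[friend].append(event)
        # 3. Update the per-location statistic.
        self.location_counts[location] += 1
        return event


if __name__ == "__main__":
    app = UncorkdSketch({"amy": {"bob", "cat"}})
    app.insert_checkin("amy", "2015_Talisman_PinotNoir",
                       "Etcetera_Wine_Bar", 5, 1449864277000)
    print(dict(app.location_counts))
```

Fanning out at write time costs one extra write per friend but lets each activity feed be read as a single series range query.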
UNCORKD – Query Checkins
• Checkin count and average rating for ‘2015_Talisman_PinotNoir’
• List checkins with times and locations
UNCORKD – Query Checkins (2)
• List checkins for a given geographic area (Mission District)
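Basic geospatial filtering on stored lat/long values usually amounts to a bounding-box test. The sketch below shows that test on in-memory rows; the box coordinates are only a rough, illustrative approximation of the Mission District and the function name is hypothetical.

```python
from typing import Dict, List

# Rough illustrative bounding box (degrees); not surveyed boundaries.
MISSION_BOX = {"south": 37.748, "north": 37.770,
               "west": -122.427, "east": -122.405}


def in_bounding_box(lat: float, lon: float, box: Dict[str, float]) -> bool:
    """True when the point lies inside the box, edges included."""
    return (box["south"] <= lat <= box["north"]
            and box["west"] <= lon <= box["east"])


def checkins_in_area(checkins: List[dict], box: Dict[str, float]) -> List[dict]:
    """Keep only checkins whose coordinates fall inside the box."""
    return [c for c in checkins if in_bounding_box(c["lat"], c["lon"], box)]


if __name__ == "__main__":
    sample = [
        {"user": "amy", "lat": 37.760, "lon": -122.420},  # inside the box
        {"user": "bob", "lat": 37.800, "lon": -122.410},  # north of the box
    ]
    print(checkins_in_area(sample, MISSION_BOX))
```

In a query, the same test would be expressed as four range comparisons on the latitude and longitude columns alongside the time bounds.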
UNCORKD – Query Activity Feed
• List friends for user ‘AmyPhillips@gmail.com’
• Query activity feed
UNCORKD – Query Location stats
• Per-day checkin counts for ‘Etcetera_Wine_Bar’
Questions?

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Riak TS

  • 1. Riak TS: Basho’s New Purpose-built Time Series Database Rob Genova, Solution Architect Basho Technologies
  • 6. Masterless Architecture Riak has a masterless architecture in which every node in a cluster is capable of serving read and write requests. The benefits of a masterless architecture include:
  • 7. Data Replication & Consistency Reads and writes use quorum-level consistency by default. put(“bucket/key”)
  • 8. put(“bucket/key”) High Availability If a node goes offline, “fallback” virtual nodes will take over and automatically begin serving requests on behalf of the downed virtual nodes. Control and data are automatically handed back to the original node when it returns.
  • 9. Data Guarantees Version vectors are used to maintain an actor-based accounting of updates to an object in Riak. This allows the system to reason about causality in the event that multiple versions of an object exist at any given point in time. Version 1: {v1:2, v2:3, v3:2} (dominates). Version 2: {v1:2, v2:3, v3:1}.
  • 10. Write once buckets • Riak 2.1 introduced the concept of "write once" buckets • 107% increase in throughput vs standard buckets • Intended for immutable data
  • 11. Pluggable Storage Backends Pluggable storage backends enable you to choose the low-level storage engine that best fits your use case. • Bitcask: Basho’s open source key/value store and Riak’s default backend. • LevelDB: Google’s open source key/value store. • In Memory: Uses Erlang’s ets tables to store data in memory. • Multi-Backend: Select the right backend for each use case on a per-bucket basis.
  • 12. Availability Across Geographies: Multi-cluster Replication Riak automatically replicates between clusters • Configurable number of remote replicas • Options for real-time sync and full sync Geo-Data Locality allows localized data processing • Reduced latency to end-users • Allows sub-5 ms responses • Active-Active ensures continuous user experience
  • 13. Riak KV: Use Cases • Mutable data • Documents, JSON, metadata • Session state • User/customer data • Transaction histories • Archives
  • 14. Riak KV: Search • Write it like Riak, read it like Solr • Riak Search communicates with and monitors the Solr OS process • Riak Search listens for changes in key/value (KV) data and makes the appropriate changes to Solr indexes • Riak Search takes a user query on any node and converts it to a Solr distributed search • Protocol Buffer interface and Solr interface via HTTP
  • 15. Riak KV: Data Types Riak Data Types are a developer-friendly way to avoid conflicting versions of objects in an eventually consistent environment. • Map: Supports the nesting of the Riak Data Types. • Register: A named binary field that can only be used as part of a Map. • Counter: Keeps track of increments and decrements on an integer. • Flag: Values limited to enable or disable. • Set: A collection of unique binary values that supports add and remove operations on one or more values.
  • 17. Riak TS: Use Cases • Immutable data • Infrastructure monitoring / metrics • Real-time analytics • IoT / Sensor Data • Financial Data • Scientific Observations
  • 18. Riak TS: Requirements & design goals • High write throughput • Efficient range query support • Robust queryability • Horizontal scale • High availability • Multi-region support • Enterprise scale solution
  • 19. Riak TS: Design & Implementation • Data distribution – Data is co-located on a per series basis for a configurable time horizon – A given series is partitioned into ordered ranges of a configurable size. • Data modeling – SQL-like data definition (bucket parameterization) • Read/write – Efficient write path – Query subsystem – SQL-like query language
  • 20. Riak TS Implementation: Data definition Riak TS uses a SQL-like CREATE TABLE statement to associate a schema with a bucket.
  • 21. Riak TS Implementation: Query language Riak TS supports a SQL-like query language using the familiar semantics of the SELECT statement.
      SELECT weather, temperature FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000 AND temperature > 27.0
      SELECT AVG(temperature) FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000
      SELECT temperature * 1.5 FROM GeoCheckin WHERE myfamily = 'family1' AND myseries = 'series1' AND time > 1449864277000 AND time < 1449864290000
  • 22. Riak TS: Write Performance • 130k writes per second • 5 nodes (bare metal, Softlayer) • 6-core + HT (12 logical cores) • 32GB • 800GB SSD x3 (RAID0) • 1k objects • 15-minute time quantization • ring_size = 64
  • 23. Riak TS: Unofficial Roadmap • As of 1.2 (this week) – Query language with SELECT, filtering, aggregation functions and arithmetic – Java, Python, Erlang, Node, Ruby clients – SQL Shell • 1.3 (end of April-ish) – OSS & Enterprise versions – MDC (enterprise only) – REST API • Q2 – Bulk delete / expiry – SQL GROUP BY, ORDER BY – Visualization (Graphite/Grafana integration)
  • 25. UNCORKD – Overview UNTAPPD for wine snobs! (And a tribute to the cool social/beer app) • Tracks checkins by wine variety and location • Maintains per-user friend lists and activity feeds • Maintains per-location statistics • Supports checkin, activity feed and location-based queries with aggregation and filtering (time-based and geospatial)
  • 26. UNCORKD – Riak KV Data Definition • User, Location & Wine entity data stored in standard KV buckets
  • 27. UNCORKD – Riak KV Data Definition (2) • Friend lists and location statistics maintained via Set and Counter data types
  • 28. UNCORKD – Riak TS Data Definition • Wine name/id used as the series_id for Checkin • User name/id used as the series_id for Activity • Include lat/long data to support basic geospatial filtering • 14-day time quantization
  • 29. UNCORKD – Generate Checkins • Generates a month's worth of checkins • Attempts 1 checkin per minute with time-weighted probability
  • 30. UNCORKD – Insert Checkin & Fan Out • Insert checkin • Fan out to friends' activity feeds • Update per-location statistics
  • 31. UNCORKD – Query Checkins • Checkin count and average rating for ‘2015_Talisman_PinotNoir’ • List checkins with times and locations
  • 32. UNCORKD – Query Checkins • List checkins for a given geographic area (Mission District)
  • 33. UNCORKD – Query Activity Feed • List friends for user ‘AmyPhillips@gmail.com’ • Query activity feed
  • 34. UNCORKD – Query Location stats • Per-day checkin counts for ‘Etcetera_Wine_Bar’
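
The version vector comparison on the Data Guarantees slide can be sketched in a few lines of Python. This is a minimal illustration and not Riak's implementation: the function names `dominates` and `compare` are hypothetical, vectors are modeled as plain dicts of actor to counter, and a missing actor is assumed to mean a counter of 0.

```python
def dominates(a, b):
    """True if vector a has seen every update in b and at least one more.

    Vectors are dicts mapping actor id -> update counter (assumption:
    an actor that is absent from a vector has counter 0).
    """
    actors = set(a) | set(b)
    at_least = all(a.get(x, 0) >= b.get(x, 0) for x in actors)
    strictly = any(a.get(x, 0) > b.get(x, 0) for x in actors)
    return at_least and strictly


def compare(a, b):
    """Classify the causal relationship between two version vectors."""
    if dominates(a, b):
        return "a dominates"      # b can be discarded
    if dominates(b, a):
        return "b dominates"      # a can be discarded
    actors = set(a) | set(b)
    if all(a.get(x, 0) == b.get(x, 0) for x in actors):
        return "equal"
    return "concurrent"           # neither descends from the other: keep siblings


# The two versions shown on the slide.
version_1 = {"v1": 2, "v2": 3, "v3": 2}
version_2 = {"v1": 2, "v2": 3, "v3": 1}
print(compare(version_1, version_2))
```

Two vectors that each lead on a different actor, such as `{"v1": 3, "v2": 3, "v3": 1}` against `version_1`, come back as concurrent, which is the case where sibling objects would be kept.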

Editor's notes

  1. At Basho, we are well known for Riak KV: our flagship database and key/value store that’s been around for quite a while. Riak TS is both a natural extension of our existing architecture and a nice complement to the functionality offered by Riak KV. Agenda: 1) Riak KV overview, 2) Riak TS intro, 3) use case walkthrough.
  2. For anybody unfamiliar, Riak is an eventually consistent Dynamo-style storage system inspired by Amazon’s seminal white paper. It’s an AP system in CAP terms. The business value of such systems is all about high availability, horizontal (predictable) scalability, high concurrency/throughput and low cost. Riak adds to this the highest possible data guarantees, excellent multi-region/datacenter support, resilience/stability and ease of use. “High availability without data loss.” “Reliability at Scale.”
  3. Riak uses a SHA-1 hash-based approach to data distribution. The SHA-1 hash integer range is logically subdivided into a number of partitions. We call this the ring. The default ring size is 64. The logical partitions are each mapped to three virtual nodes (3 is the default and only very rarely changed). The SHA-1 hash of the bucket and key name combination is used to route a given object to the appropriate vnodes, based on the mapping. Vnodes are, in turn, distributed evenly across the available nodes in the cluster.
  4. Riak uses a masterless architecture. Requests are routed (typically by a load balancer) to any one of the nodes in the cluster, that initial node becoming the coordinator for the request.
  5. Coordinators maintain internal metadata about the partition/vnode and vnode/physical node mappings referenced earlier. A gossip protocol is used internally to propagate and synchronize metadata. This metadata is used by the coordinator to route read and write requests appropriately. Requests are sent in parallel to the appropriate nodes but require only majority or quorum participation (by default). The consistency level can be defined on a per bucket or per request basis.
  6. This reflects the strong emphasis that Basho places on data guarantees and high availability.
  7. Riak uses version vectors to allow for a conflict resolution policy other than Last Write Wins (which many view as data loss). The write path in Riak KV is such that every update takes place (logically) with respect to a single vnode in the preflist for the object in question (known as a coordinated PUT). The vnodes are the actors in the vector. If, during a read, Riak encounters divergent values for an object, the version vectors can be used to determine whether one version dominates the other (as above, allowing the system to discard the other version) or whether the versions are concurrent (in which case sibling objects will be maintained).
  8. Riak KV also supports an alternative bucket type, known as the “write once” bucket type. Write once buckets are intended for immutable data, and use a simplified write path, avoiding the performance hit associated with coordinated PUTs and causality tracking with version vectors. The existence of the “write once” write path is significant with respect to Riak TS.
  9. Bitcask and LevelDB are the two standard options. Bitcask is a log-structured hash table that keeps all its keys in memory. It has a very consistent performance profile and supports TTL-based expiry. LevelDB is a log-structured merge tree that keeps data sorted, uses a commit log and buffers writes in memory. LevelDB is similar to what Cassandra uses. An in-memory backend also exists but is not widely used, and you can also use the backends in combination.
  10. Replication in Riak is asynchronous, between a primary and a secondary cluster. This can make for a much easier operational scenario as compared to a single cluster spanning regions. Flexible topologies are supported: fully meshed, hub-and-spoke, active/active, active/passive, etc. You can also do things like replicate to a secondary local cluster used specifically for snapshot backups or analytics.
  11. Version vector support makes Riak KV an especially good fit for mutable data use cases where conflicts can arise and avoiding data loss is important (documents, JSON, metadata, sessions, user profiles, product data, etc.). Key/value semantics and document-style data modeling are also a natural fit for use cases like transaction histories or archives, which are often immutable, but don’t typically require ad-hoc or range-based queries. Using a pure key/value store avoids the need to define and operate within the context of a schema.
  12. Riak Search allows you to attach a Solr-style schema to a KV bucket. Writes into the bucket will be indexed into Solr running in a per-node, embedded JVM. Solr-style queries (ad-hoc, full text, geospatial, etc.) are supported by the clients and the REST API. Search queries are executed as distributed Solr queries under the hood.
  13. Data types utilize version vectors to automatically merge object state during concurrent update situations or upon partition resolution. This saves the developer from having to write application-side conflict resolution logic (the major downside of the version vector approach). They also allow for atomic operations, avoiding the read-modify-write lifecycle between the client and the cluster.
  14. In general, you can think of use cases as belonging to one of two broad categories: those involving mutable state and those that involve immutable or event-based data. Riak KV is highly optimized for mutable data (where data loss is of higher concern and key/value semantics are a natural fit). Time series involves immutable data by definition. Common use cases include: infrastructure monitoring and metrics (these often involve stacks stitched together with open source tools like Graphite); real-time analytics (click stream, page views, impressions, email opens); IoT / sensor data (anything from smart meters to FitBits); financial use cases like tick data; and scientific observations (weather observation data is a good example).
  15. As a time series database, Riak TS has a subset of requirements that differs substantially from what Riak KV requires. Event-based use cases typically involve a high volume of writes (requiring high write throughput). Range-based queries are a key requirement and need to be efficient. An easy-to-use query language with range and aggregation support is also required. The last four requirements in the list are inherited from the architecture itself, which speaks to a major advantage that Riak TS has with respect to some of our competitors. The maturity and ease of use of our underlying platform sets us apart.
  16. Co-location and ordering of the primary data are essential to support range queries efficiently, without relying on expensive coverage queries over distributed secondary indexes. Existing support for LevelDB was beneficial here. Tabular data modeling (coupled with a SQL-like query language) is a natural fit for the queryability required by time series use cases. Riak TS uses a write path based on the Riak KV write-once PUT path. A query subsystem was required to actually execute user-generated SQL queries against the underlying distributed dataset.
  17. Standard data types are supported (varchar, boolean, timestamp, integer, double). A PRIMARY KEY is specified, which includes a partition key and a local key. The composite partition key includes a family id, a series id and a quantum function that takes three arguments (timestamp field, unit of time and a quantity of time). Riak TS will co-locate data for a given series based on the range specified by the quantum function. The local key indicates how the data should be ordered. In the current version, this is required to be the same three fields in the same order. The next release will loosen this restriction.
  18. A complete primary key must be specified. Queries are currently limited to a single series. Filtering on secondary fields is supported. Aggregation functions and arithmetic are supported.
  19. A hypothetical use case that uses Riak TS in conjunction with Riak KV. Note that running Riak KV and Riak TS on the same cluster in production has yet to be fully tested.
  20. Untappd is a social app for beer connoisseurs that allows you to checkin, post, comment, etc. about the beers that you drink. A ‘checkin’ is the act of recording your current activity (beer, location, comment, photo, etc.) through the mobile app. Uncorkd is a backend for a hypothetical wine-centric social app modeled after Untappd. It uses Riak KV for state-based object/document storage and Riak TS for event-based storage. The design goals are listed on the slide. Implementation notes: Riak Python client; public GitHub repo (rcgenova/riak-ts-demo); Riak TS open source due mid-April.
  21. KV will be the source of truth for User, Wine and Location entity data. Lat/long data will be retrieved from location objects upon checkin.
  22. Individual wine varieties are the entities of interest (rather than the locations), so wine variety was used as the series id. Activities are user-specific and therefore use the user as the series id. A ‘type’ field is included in the Activity table to accommodate both the user’s checkins and their friends’ checkins. Lat/long values are included to enable simple, bounding-box-based geospatial filtering. A relatively low data volume allows for a wider time quantization, which in turn allows for wider queries with a minimum of sub-queries.
  23. Generated 1000 fictitious users keyed by email address, 35 wines from Sonoma County (where I live!), 2015 vintage, and 11 actual locations in San Francisco (from Yelp). The algorithm attempts a checkin per minute. The probability distribution provides a means of simulating the times of day that are likely to be more active for such an activity. When a checkin is triggered, a random user, location and wine are passed to the Checkin object’s checkin() function (shown on the next slide).
  24. Checkin() inserts the event and fans it out to the friends’ activity feeds. The Activity table is batch updated. Location statistics counters are also updated.
  25. This query uses a lat/long-based bounding box that approximately represents the SF Mission District. The time range is one week, 2016-01-01 to 2016-01-08.
  26. We can retrieve a user’s friend list with a single lookup (shown on the left). On the right we are querying the activity feed for the user (friends only).
  27. As mentioned, location statistics are updated on write using counters. The counters are nested within a per location Map object. We can retrieve the values of all counters with a single lookup.
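
Note 17 describes a quantum function that groups a series into time ranges so that rows in the same range are co-located. The sketch below shows one way such bucketing could work; it is an assumption-laden illustration, not the Riak TS implementation. The names `quantum`, `partition_key` and `UNIT_MS` are hypothetical, timestamps are assumed to be epoch milliseconds (as in the slide 21 queries), and the 15-minute width is borrowed from the write performance slide.

```python
# Hypothetical helpers for illustration; not part of any Riak client library.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def quantum(timestamp_ms, quantity, unit):
    """Return the start (epoch ms) of the time quantum containing timestamp_ms."""
    width = quantity * UNIT_MS[unit]
    return (timestamp_ms // width) * width


def partition_key(family, series, timestamp_ms, quantity=15, unit="m"):
    """Rows that share this tuple would be stored together (assumed model)."""
    return (family, series, quantum(timestamp_ms, quantity, unit))


# The two time bounds used in the slide 21 queries.
start = partition_key("family1", "series1", 1449864277000)
end = partition_key("family1", "series1", 1449864290000)
print(start == end)  # both bounds fall inside the same 15-minute quantum
```

Under this model, a range query whose bounds map to the same partition key touches a single co-located range, while a wider query would span several quanta and need one sub-query per quantum, which is the trade-off note 22 describes when choosing a 14-day quantization.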