A brave new world in mutable big data relational storage (Strata NYC 2017)
© Cloudera, Inc. All rights reserved.
A brave new world in mutable big data:
Relational storage
Todd Lipcon
Software Engineer at Cloudera
Apache Kudu founder and PMC chair
About me
• Engineer at Cloudera since 2009
• Hadoop core (HDFS, MR1)
• HBase stability and performance
• Started Kudu project in 2012 (bias alert!)
• My 9th Strata NYC!
Feel free to tweet questions @tlipcon or find me on the Kudu Slack
A brief history of databases
Incomplete, distilled, and semi-accurate
1960s, 1970s: ISAM / VSAM
“A database system where an application developer directly uses an
application programming interface to search indexes in order to locate
records in data files.” - Wikipedia “ISAM”
• Files contain records (originally fixed-length, later variable-length)
• Files stored on disks and applications directly access them (file-system locking)
• Later added networked access (client-server model), hierarchical records
• Still a simple API:
• Seek by key, Read, Write, Insert, Delete
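The access model described on this slide can be sketched in a few lines. This is a minimal in-memory illustration in Python, not real ISAM or VSAM: the class name `IsamFile` and its methods are hypothetical, and a sorted list of keys stands in for the on-disk index.

```python
import bisect


class IsamFile:
    """Toy keyed file: a sorted key index plus a record store (hypothetical API)."""

    def __init__(self):
        self._keys = []      # sorted index of keys
        self._records = {}   # key -> record payload
        self._pos = 0        # current position in the index

    def insert(self, key, record):
        if key in self._records:
            raise KeyError(f"duplicate key: {key}")
        bisect.insort(self._keys, key)
        self._records[key] = record

    def seek(self, key):
        """Position on the first record whose key is >= the given key."""
        self._pos = bisect.bisect_left(self._keys, key)

    def read(self):
        """Return (key, record) at the current position and advance; None at end of file."""
        if self._pos >= len(self._keys):
            return None
        key = self._keys[self._pos]
        self._pos += 1
        return key, self._records[key]

    def write(self, key, record):
        """Overwrite an existing record in place."""
        if key not in self._records:
            raise KeyError(f"no such key: {key}")
        self._records[key] = record

    def delete(self, key):
        del self._records[key]
        self._keys.remove(key)


parts = IsamFile()
parts.insert(30, "bolt")
parts.insert(10, "nut")
parts.insert(20, "washer")

# The application drives the index itself: seek, then read in key order.
parts.seek(15)
first = parts.read()
second = parts.read()
```

Note that the caller decides how to navigate the index and in which order records come back, which is the coupling the next slides criticize.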
Probably the only slide at Strata with COBOL on it
Source: http://www.mainframes360.com/2010/03/ksds-files-random-processing.html
Failings of ISAM/VSAM
A Relational Model of Data for Large Shared Data Banks (Codd, 1970)
• Applications and physical data layout are too tightly coupled
• e.g., a database of parts might originally be ordered by part number and later reordered
• An inventory app that inadvertently depends on that order breaks unexpectedly
• Hard to write general-purpose programs that run against ISAM/VSAM datasets
• Proposed a new model: relational databases
• All entities modeled by peer tables with relationships between them
• Programs use declarative access (DB decides physical operations necessary)
Origins of SQL (1974)
• Originally SEQUEL (Structured English QUEry Language)
• Renamed to SQL due to trademark issues
• Designed to be easy to write, read, and maintain
• “is intended for users who are more comfortable with an English-keyword
format than with the terse mathematical notation of SQUARE.”
• Solves the coupling issue:
• Application: specify what should be returned
• Database: figure out how to return it
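A small sketch of that division of labor, using Python's built-in `sqlite3` module purely as a stand-in relational engine. The `parts` table, its columns, and the index name are made up for illustration. The application states what it wants; adding an index later changes how the database may execute the query, but not the query text or its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no INTEGER, name TEXT, qty INTEGER)")
# Rows are inserted in no particular order; the application never relies on it.
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(30, "bolt", 4), (10, "nut", 120), (20, "washer", 7)],
)

# Application: specify what should be returned.
QUERY = "SELECT part_no, name FROM parts WHERE qty < 10 ORDER BY part_no"
before = conn.execute(QUERY).fetchall()

# Physical change: a new index. Database: figure out how to return it.
conn.execute("CREATE INDEX parts_by_qty ON parts (qty)")
after = conn.execute(QUERY).fetchall()
```

Both `before` and `after` hold the same rows, so the physical layout can evolve without touching application code.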
Explosion of SQL Popularity
• IBM, Oracle, Microsoft, Informix, and others joined the party
• ANSI standard in 1986
• Ecosystem growth:
• Business Intelligence tools
• Object-Relational Mappers
• Extract-Transform-Load tools (ETL)
• Open source SQL databases
• mSQL, MySQL, PostgreSQL, etc.
• LAMP stack
NoSQL search interest over time
What happened in Jan 2012???
NoSQL complaints
• Tool compatibility? BI? ETL? ORMs?
• Consistency
• denormalization is tough
• hard to program against weak semantics
• Access path sensitivity
• Have to tightly couple applications with physical data model
• No ad-hoc access
• Complex application code to perform simple aggregations
“Some of these critiques sound awfully familiar...” - 1970s Database People
Not-Only SQL
People wanted their SQL back, and NoSQL developers gave it!
• Cassandra - CQL (late 2011)
• HBase - Phoenix (Jan 2013)
• HDFS - Hive (2009), Impala (2012), Drill (2012), Spark SQL (2014), Presto (2013)
Meanwhile in RDBMS land
Original complaints still relevant?
• Most OLTP apps fit in 1TB of RAM and flash!
• Shared-nothing OLAP available and works well now
Maybe NoSQL and SQL have converged?
“It is perhaps fair to say that from the perspective of many
engineers working on the Google infrastructure, the SQL vs.
NoSQL dichotomy may no longer be relevant.”
Source: “Spanner: Becoming a SQL System”
What kind of application?
• OLTP? OLAP? HTAP (Hybrid Transactional/Analytic Processing)
• Next-gen data apps are all hybrid (streaming ingest, constant analytics)
• “Combining OLTP, OLAP, and full-text search capabilities in a single system
remains at the top of customer priorities.” - Spanner: Becoming a SQL System
HTAP Application Architecture
• Realtime ingest (high performance writes)
• Throughput and latency both important
• Concurrent SQL reads
• BI apps demand interactive performance
• Often a time-series component
• IoT, transaction data, click logs, etc.
• High Availability/Geo-redundancy
[Architecture diagram. Labels: browser tracing, web logs, Kafka, Kudu, Impala, JDBC access, Marketing Dept., Developers, Web-app]
Evaluating an HTAP Data Store
• SQL support
• Semantics (eventual vs strict consistency, transactional support, features)
• Performance (ingest with concurrent analytics)
• Availability (multi-datacenter)
• Deployment Model
• Cost
Narrowing the options
Store         | Original use case  | Deployment      | Semantics
HBase         | Web indexing       | Anywhere        | single-row ACID
Cassandra     | OLTP (web serving) | Anywhere        | eventual
Cloud Spanner | OLTP               | SaaS-only (GCE) | full ACID
HDFS          | OLAP               | Physical HW     | bulk access only
Kudu          | HTAP               | Anywhere        | single-row ACID
Slide callouts:
• Similar storage implementations (SSTable, Log-Structured-Merge)
• Let’s compare with Spanner since it’s shiny, new, and similar to Kudu!
• Only store originally designed for HTAP
Not-Only-SQL in Depth:
Comparing Cloud Spanner and Kudu+Impala
Apache Kudu: Scalable and fast tabular storage
Scalable
• Tested up to 275 nodes (~3PB cluster)
• Designed to scale to 1000s of nodes and tens of PBs
Fast
• Millions of read/write operations per second across cluster
• Multiple GB/second read throughput per node
Tabular
• Represents data in structured tables like a relational database
• Strict schema, finite column count, no BLOBs
• Individual record-level access to 100+ billion row tables
Kudu vs Spanner: Consistency and Availability
Feature | Kudu | Spanner | Winner?
Concurrency control | MVCC (with HybridTime) | MVCC (with TrueTime) | Spanner (but needs atomic clock hardware!)
Read-only (analytic) queries | Consistent Snapshot Isolation | Consistent Snapshot Isolation | Tie
Transactions | Single-row ACID | Multi-row ACID (small sets of rows only) | Spanner
Availability/Replication | Replicated log (Raft, 3 replicas) | Replicated log (Multi-Paxos, 3 replicas) | Tie
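To give a feel for how MVCC timestamps can be issued without atomic clock hardware, here is a minimal sketch of a hybrid clock: a physical reading paired with a logical counter that breaks ties and absorbs clock skew. It illustrates the general idea only and is not Kudu's HybridTime implementation; the class `HybridClock`, its method names, and the tuple timestamp format are assumptions made for this example.

```python
class HybridClock:
    """Toy hybrid clock issuing (physical, logical) timestamps (illustrative only)."""

    def __init__(self, physical_clock):
        self._physical_clock = physical_clock  # callable returning an integer time
        self._last = (0, 0)                    # last timestamp issued or observed

    def now(self):
        """Issue a timestamp strictly greater than any issued or observed before."""
        physical = self._physical_clock()
        last_physical, last_logical = self._last
        if physical > last_physical:
            self._last = (physical, 0)
        else:
            # Physical clock stalled or went backwards: bump the logical part.
            self._last = (last_physical, last_logical + 1)
        return self._last

    def update(self, remote):
        """Fold in a timestamp received from another node, then issue a later one."""
        if remote > self._last:
            self._last = remote
        return self.now()


# A fake physical clock that stalls and then steps backwards.
ticks = iter([100, 100, 90, 120])
clock = HybridClock(lambda: next(ticks))

t1 = clock.now()
t2 = clock.now()
t3 = clock.now()
t4 = clock.update((150, 7))  # message from a node whose clock is ahead
```

Timestamps stay strictly increasing (`t1 < t2 < t3 < t4`) even though the physical readings do not, which is the property a multi-version store needs to order writes and pick snapshots.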
Kudu vs Spanner: Data Access
Feature | Kudu | Spanner | Winner?
Programmatic APIs | Java, C++, Python | C#, Go, Java, Node, PHP, Python, Ruby | Spanner
Secondary Indexes | no | supported | Spanner
SQL | via Impala or Spark (SQL 2003 w/ Analytic extensions) | Built-in (simple ANSI99 queries only, no write support) | Kudu
Ecosystem Integrations | Spark, Impala, Flume, Apex, StreamSets, et al. | ?? (very limited) | Kudu
Kudu vs Spanner: operational factors
Factor | Kudu | Spanner | Winner?
Partitioning | Hash or range, explicit | Range only (automatic) | <it depends>
Load balancing | manual | automatic | Spanner
Deployment Environment | on-prem or cloud | SaaS only (lock-in) | Kudu
Ops model | operate yourself | SaaS (no ops) | Spanner
Licensing | Apache License | closed source | Kudu
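The difference between the two partitioning styles can be sketched as two small routing functions. This is a generic illustration, not either system's actual partitioning code; the function names, the CRC32 hash, and the example split points are all assumptions.

```python
import bisect
import zlib


def hash_partition(key, num_buckets):
    """Route a key to a bucket by hashing: spreads adjacent keys across buckets."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def range_partition(key, split_points):
    """Route a key by sorted split points: keeps adjacent keys in the same partition."""
    return bisect.bisect_right(split_points, key)


# Example: date-prefixed keys routed over quarterly split points (made-up values).
split_points = ["2017-04-01", "2017-07-01", "2017-10-01"]

early = range_partition("2017-01-15", split_points)
talk_day = range_partition("2017-09-26", split_points)
late = range_partition("2017-12-01", split_points)

bucket = hash_partition("2017-09-26", 4)
```

Range partitioning makes scans over a key interval touch few partitions but can concentrate sequential inserts on one of them; hashing spreads the write load at the cost of scattering range scans, which is why the slide's verdict is "it depends".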
Checkpoint so far
• The two systems are quite similar
• No accident - Kudu’s replication, partitioning, and data model inherit a lot from Spanner
• Current feature gaps
• Spanner ahead on transactional feature set (OLTP focus)
• Kudu ahead on analytic feature set (OLAP focus)
What about underlying storage and performance?
Spanner Storage - SSTable / Log-Structured Merge
• SSTable (sorted-string table)
• same storage format as BigTable (inherited code)
• row-oriented design
• Each row <cola, colb, colc, ...> stored on disk in that format
• Optimal for OLTP (read 1 row = 1 disk seek)
• Inefficient for OLAP (high CPU on scans)
• not schema-aware
• little opportunity for type-specific compression techniques, etc.
“SSTables have proven to be remarkably robust even when used for schematized
data consisting largely of small values, often traversed by column. But they are
ultimately a poor fit and leave a lot of performance on the table.”
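Why column-wise traversal favors a schema-aware columnar layout can be illustrated with a toy example. The data and column names (`c1` to `c4`) are made up, and run-length encoding stands in for the broader family of type-specific encodings; none of this reflects either system's real file format.

```python
from itertools import groupby

# Row-oriented: each record is stored together.
rows = [
    (1, "hi", 0.1, "N"),
    (2, "cat", 0.1, "N"),
    (3, "bye", 0.2, "N"),
    (4, "dog", 0.5, "Y"),
]

# Aggregating one field still walks every full row.
row_total = sum(row[2] for row in rows)

# Column-oriented: each column is stored contiguously.
columns = {
    name: list(values)
    for name, values in zip(("c1", "c2", "c3", "c4"), zip(*rows))
}

# The same aggregate reads only the one column it needs.
col_total = sum(columns["c3"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Knowing a column's type and contents lets the store pick a fitting encoding.
encoded_flags = run_length_encode(columns["c4"])
```

Here `encoded_flags` is `[("N", 3), ("Y", 1)]`: four values shrink to two pairs, an opportunity that is hard to exploit when a store treats rows as opaque strings.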
Kudu Storage - Columnar + Deltas
• Stores most of its data in an internal columnar format
• Each column stored, encoded, and compressed separately, in small chunks
• Similar to Parquet, with enhancements:
• Indexes allow fast seeking by key or by position (for low-latency read)
• Delta Stores allow tracking of updated and deleted rows
[Diagram: base columnar data (c1 c2 c3 c4) + deltas (recently changed rows: d1 d2), combined at read time. Sample rows shown:]
c1 | c2  | c3  | c4
1  | hi  | 0.1 | N
3  | bye | 0.2 | N
2  | cat | 0.1 | N
1  | dog | 0.5 | Y
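The base-plus-deltas idea can be sketched as follows. This is a deliberately simplified model, not Kudu's storage engine: it ignores timestamps, encoding, and compaction, and the names `read_row`, `scan`, and `DELETED` are hypothetical. Base data is immutable and columnar; changes are recorded separately and applied when a row is read.

```python
DELETED = object()  # marker for a delete recorded in the delta store


def read_row(base, deltas, position):
    """Materialize one row: take base column values, then apply its deltas in order."""
    row = {name: values[position] for name, values in base.items()}
    for delta_position, change in deltas:
        if delta_position != position:
            continue
        if change is DELETED:
            return None
        row.update(change)
    return row


def scan(base, deltas):
    """Return all live rows with deltas applied at read time."""
    num_rows = len(next(iter(base.values())))
    rows = (read_row(base, deltas, i) for i in range(num_rows))
    return [row for row in rows if row is not None]


# Immutable base data, one list per column.
base = {
    "c1": [1, 2, 3],
    "c2": ["hi", "cat", "bye"],
    "c3": [0.1, 0.1, 0.2],
}

# Recently changed rows: an update to row 1 and a delete of row 2.
deltas = [
    (1, {"c2": "dog", "c3": 0.5}),
    (2, DELETED),
]

result = scan(base, deltas)
```

The base columns are never rewritten by an update, so they stay compact and scan-friendly, while reads pay a small cost to merge in whatever deltas exist.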
So how much does it really matter?
Analytics benchmarks
Benchmark setup
Cloud Spanner
• 5 “nodes” (unknown specs)
• us-central1 region (multi-zone)
• Price: $0.90/node/hr * 5 nodes = $3240/month
Kudu on GCE
• 5 n1-standard-16 (16 vCPU, 60G RAM)
• us-central1 region (multi-zone)
• 500G Persistent SSD disk each
• Price: $0.54/node/hr * 5 + 500GB * $0.17/GB/mo * 5 = $2366.80/month
* drops to $1009 if preemptible is used!
* factoring in sustained-use discount
30% lower!
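The pricing arithmetic can be checked with a short calculation. It assumes a 720-hour month (30 days), which reproduces the Spanner figure exactly; that assumption is not stated on the slide. Recomputing the Kudu figure from the rounded rates shown gives about $2369, slightly above the quoted $2366.80, presumably because the quoted total uses an unrounded discounted hourly rate.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days * 24 hours

# Cloud Spanner: 5 nodes at $0.90 per node-hour.
spanner_monthly = 0.90 * 5 * HOURS_PER_MONTH

# Kudu on GCE: 5 VMs at $0.54 per hour, plus 5 disks of 500 GB at $0.17 per GB-month.
kudu_compute = 0.54 * 5 * HOURS_PER_MONTH
kudu_disk = 500 * 0.17 * 5
kudu_monthly_from_rates = kudu_compute + kudu_disk

# Relative saving, using the monthly total quoted on the slide.
kudu_monthly_quoted = 2366.80
savings = 1 - kudu_monthly_quoted / spanner_monthly

print(f"Spanner: ${spanner_monthly:.2f}/month")
print(f"Kudu (from rounded rates): ${kudu_monthly_from_rates:.2f}/month")
print(f"Saving vs Spanner: {savings:.1%}")
```

With these inputs the saving works out to roughly 27%, which the slide rounds to 30%.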
Test 1: TPC-H Data Loading
• Used a separate node to load the TPC-H “LINEITEM” table
• 600M rows, 75GB in CSV format
• Loaded with a multi-threaded Java program* that followed best practices
*Loader available at https://github.com/toddlipcon/spanner-kudu-comparison
Test 2: TPC-H Queries
• SELECT COUNT(*)
• TPC-H Q1, Q6: simple GROUP BY/SUM/COUNT queries that scan the whole table
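For readers unfamiliar with these queries, the sketch below runs a count and a query in the shape of TPC-H Q6 against a five-row toy `lineitem` table. It uses Python's built-in `sqlite3` only to make the example runnable; it is not the benchmark harness, the rows and parameter values are made up, and the date predicate is rewritten as a plain string comparison to suit SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE lineitem (
           l_quantity REAL,
           l_extendedprice REAL,
           l_discount REAL,
           l_shipdate TEXT
       )"""
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [
        (10, 1000.0, 0.06, "1994-03-15"),  # qualifies
        (30, 2000.0, 0.06, "1994-06-01"),  # quantity too high
        (5, 500.0, 0.06, "1995-02-01"),    # outside the date range
        (20, 3000.0, 0.02, "1994-09-09"),  # discount outside the band
        (12, 4000.0, 0.06, "1994-12-31"),  # qualifies
    ],
)

# Full-table count, as in the first benchmark query.
count = conn.execute("SELECT COUNT(*) FROM lineitem").fetchone()[0]

# Q6-style query: filter on three columns, aggregate a product of two.
Q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < '1995-01-01'
  AND l_discount BETWEEN 0.05 AND 0.07
  AND l_quantity < 24
"""
(revenue,) = conn.execute(Q6).fetchone()
```

The query reads only four columns of a much wider table and returns a single number, which is the access pattern where a columnar layout has the most to gain.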
Test 3: YCSB Loading
• Standard YCSB benchmark
• Configured as recommended in the cloudspanner/README file
• Experienced many errors, timeouts, and multi-minute stalls loading Spanner
• Eventually succeeded on the third try
• So take these results with a grain of salt!
YCSB Workload A (50/50 read/write mix)
Kudu is not optimized for high update-rate scenarios. See KUDU-749
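The shape of this workload can be sketched as a simple operation generator. This is not YCSB itself: the function name and key format are made up, and keys are drawn uniformly for brevity, whereas YCSB's stock workload A uses a skewed (zipfian) request distribution.

```python
import random


def workload_a_operations(num_ops, num_keys, seed=0):
    """Yield (operation, key) pairs with a 50/50 read/update mix (uniform keys)."""
    rng = random.Random(seed)
    for _ in range(num_ops):
        key = f"user{rng.randrange(num_keys)}"
        operation = "read" if rng.random() < 0.5 else "update"
        yield operation, key


ops = list(workload_a_operations(10_000, 1_000))
read_fraction = sum(1 for operation, _ in ops if operation == "read") / len(ops)
```

Half of the operations rewrite existing rows, so a store that tracks updates separately from its base data accumulates changes quickly under this mix.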
Benchmark summary
• Kudu ingests data at least 4x faster
• Stability issues with Cloud Spanner ingestion (cause unknown)
• Kudu performs simple analytic queries 10-100x faster
• Spanner wins on high-percentile tail latencies
• Kudu performance degrades significantly in 50/50 R/W mix workload
• Reminders:
• Kudu cluster has 30% lower cost, and can be run on any provider!
• Kudu doesn’t have the same rich OLTP feature set as Spanner (indexes, multi-row transactions, etc.)
Conclusions
• NoSQL and SQL are converging again
• We now get “best of both worlds” from both communities!
• Many different excellent choices are now available for building hybrid transactional/analytic applications
• Understand the trade-offs before settling on an architecture
• Seemingly small details can make orders-of-magnitude difference
• Consider non-functional differences as well (licensing, deployment, lock-in, etc.)
Acknowledgements
• Spanner team for publishing papers, especially SIGMOD 2017 (“Spanner: Becoming a SQL System”)
• Cloud Spanner team and developer advocates (Deepti Srivastava, Robert Kubis)
• Siamak Tazari (YCSB binding for Cloud Spanner)
• Cloudera (paying my GCE bill)