SlideShare une entreprise Scribd logo
1  sur  102
Télécharger pour lire hors ligne
Introduction to Cassandra 
DuyHai DOAN, Technical Advocate
@doanduyhai 
About me! 
2 
Duy Hai DOAN 
Cassandra technical advocate 
• talks, meetups, confs 
• open-source devs (Achilles, …) 
• OSS technical point of contact in Europe 
☞ duy_hai.doan@datastax.com 
• production troubleshooting
@doanduyhai 3 
Datastax! 
• Founded in April 2010 
• We drive Apache Cassandra™ 
• Home to Cassandra chair & most committers (≈80%) 
• Headquarter in San Francisco Bay area 
• EU headquarter in London, offices in France and Germany 
• Datastax Enterprise = OSS Cassandra + features
@doanduyhai 
Agenda! 
4 
• Architecture 
• Replication 
• Consistency 
• Read/write path 
• Data Model 
• Failure handling
What is Cassandra!
@doanduyhai 
Cassandra history! 
6 
NoSQL database 
• created at Facebook 
• open-sourced since 2008 
• current version = 2.1 
• column-oriented ☞ distributed table
Cassandra 5 key facts! 
@doanduyhai 
7 
Linear scalability 
Small & « huge » scale 
• 3 à1k+ nodes cluster 
• 3Gb à Pb+
Cassandra 5 key facts! 
@doanduyhai 
8 
Continuous availability (≈100% up-time) 
• resilient architecture (Dynamo) 
• rolling upgrades 
• data backward compatible n/n+1 versions
Cassandra 5 key facts! 
@doanduyhai 
9 
Multi-data centers 
• out-of-the-box (config only) 
• AWS conf for multi-region DCs 
• GCE/CloudStack support 
• resilience, work-load segregation 
• virtual data-centers
Cassandra 5 key facts! 
@doanduyhai 
10 
Operational simplicity 
• 1 node = 1 process + 1 config file 
• deployment automation 
• OpsCenter for monitoring
Cassandra 5 key facts! 
@doanduyhai 
11 
Analytics combo 
• Cassandra + Spark = awesome ! 
• realtime streaming
Cassandra architecture!
Cassandra architecture! 
@doanduyhai 
13 
API (CQL & RPC) 
CLUSTER (DYNAMODB) 
DATA STORE (BIG TABLES) 
DISKS 
Node1 
Client request 
API (CQL & RPC) 
CLUSTER (DYNAMODB) 
DATA STORE (BIG TABLES) 
DISKS 
Node2
@doanduyhai 
Data distribution! 
14 
Random: hash of #partition → token = hash(#p) 
Hash: ]-X, X] 
X = huge number (264/2) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8
@doanduyhai 
Token Ranges! 
15 
A: ]0, X/8] 
B: ] X/8, 2X/8] 
C: ] 2X/8, 3X/8] 
D: ] 3X/8, 4X/8] 
E: ] 4X/8, 5X/8] 
F: ] 5X/8, 6X/8] 
G: ] 6X/8, 7X/8] 
H: ] 7X/8, X] 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
A 
B 
C 
D 
E 
F 
G 
H
8 nodes 10 nodes 
@doanduyhai 
Linear scalability! 
16 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
n1 
n2 
n3 n4 
n5 
n6 
n7 
n9 n8 
n10
! " 
! 
Q & R
Replication & Consistency Model!
Failure tolerance by replication! 
@doanduyhai 
19 
Replication Factor (RF) = 3 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
{B, A, H} 
{C, B, A} 
{D, C, B} 
A 
B 
C 
D 
E 
F 
G 
H
@doanduyhai 
Coordinator node! 
Incoming requests (read/write) 
Coordinator node handles the request 
n1 
Every node can be coordinator àmasterless 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
Client request
@doanduyhai 
Consistency! 
21 
Tunable at runtime 
• ONE 
• QUORUM (strict majority w.r.t. RF) 
• ALL 
Apply both to read & write
Write consistency! 
Write ONE 
• write request to all replicas in // 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Write consistency! 
Write ONE 
• write request to all replicas in // 
• wait for ONE ack before returning to 
@doanduyhai 
client 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs
Write consistency! 
Write ONE 
• write request to all replicas in // 
• wait for ONE ack before returning to 
@doanduyhai 
client 
• other acks later, asynchronously 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Write consistency! 
Write QUORUM 
• write request to all replicas in // 
• wait for QUORUM acks before 
@doanduyhai 
returning to client 
• other acks later, asynchronously 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Read consistency! 
Read ONE 
• read from one node among all replicas 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read ONE 
• read from one node among all replicas 
• contact the fastest node (stats) 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
@doanduyhai 
Read consistency! 
Read QUORUM 
• read from one fastest node 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from one fastest node 
• AND request digest from other 
@doanduyhai 
replicas to reach QUORUM 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from one fastest node 
• AND request digest from other 
@doanduyhai 
replicas to reach QUORUM 
• return most up-to-date data to client 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from one fastest node 
• AND request digest from other 
@doanduyhai 
replicas to reach QUORUM 
• return most up-to-date data to client 
• repair if digest mismatch n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Consistency in action! 
@doanduyhai 
32 
RF = 3, Write ONE, Read ONE 
B A A 
B A A 
Read ONE: A 
data replication in progress … 
Write ONE: B
Consistency in action! 
data replication in progress … 
@doanduyhai 
33 
RF = 3, Write ONE, Read QUORUM 
B A A 
Write ONE: B 
Read QUORUM: A 
B A A
Consistency in action! 
data replication in progress … 
@doanduyhai 
34 
RF = 3, Write ONE, Read ALL 
B A A 
Read ALL: B 
B A A 
Write ONE: B
Consistency in action! 
data replication in progress … 
@doanduyhai 
35 
RF = 3, Write QUORUM, Read ONE 
B B A 
Write QUORUM: B 
Read ONE: A 
B B A
Consistency in action! 
data replication in progress … 
@doanduyhai 
36 
RF = 3, Write QUORUM, Read QUORUM 
B B A 
Read QUORUM: B 
B B A 
Write QUORUM: B
@doanduyhai 
Consistency level! 
37 
ONE 
Fast, may not read latest written value
@doanduyhai 
Consistency level! 
38 
QUORUM 
Strict majority w.r.t. Replication Factor 
Good balance
@doanduyhai 
Consistency level! 
39 
ALL 
Paranoid 
Slow, no high availability
Consistency summary! 
ONERead + ONEWrite 
☞ available for read/write even (N-1) replicas down 
QUORUMRead + QUORUMWrite 
☞ available for read/write even 1+ replica down 
@doanduyhai 40
! " 
! 
Q & R
Read & Write Path!
Cassandra Write Path! 
@doanduyhai 
43 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
Memory
Cassandra Write Path! 
@doanduyhai 
44 
Memory 
MemTable 
Table1 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
MemTable 
Table2 
MemTable 
TableN 
2 
. . .
Cassandra Write Path! 
Table2 Table3 
SStable2 SStable3 3 
@doanduyhai 
45 
Commit log1 
Commit log2 
Commit logn 
Table1 
SStable1 
Memory 
. . .
Cassandra Write Path! 
MemTable . . . Memory 
Table1 
@doanduyhai 
46 
Commit log1 
Commit log2 
Commit logn 
Table1 
SStable1 
Table2 Table3 
SStable2 SStable3 
MemTable 
Table2 
MemTable 
TableN 
. . .
Cassandra Write Path! 
SStable3 . . . 
@doanduyhai 
47 
Commit log1 
Commit log2 
Commit logn 
Table1 
SStable1 
Memory 
Table2 Table3 
SStable2 SStable3 
SStable1 
SStable2
Cassandra Read Path! 
@doanduyhai 
48 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
SStable1 SStable2 SStable3
Cassandra Read Path! 
Bloom Filter 
Bloom Filter 
? ? ? 
@doanduyhai 
49 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
2 Bloom Filter 
SStable1 SStable2 SStable3
Bloom filters in action! 
Write #partition = bar 
h3 
@doanduyhai 
50 
#partition = foo 
h1 h2 
1 0 0 1* 0 0 1 0 1 1 
Read #partition = qux
Partition Key Cache 
Cassandra Read Path! 
2 Bloom Filter 
Bloom Filter 
Bloom Filter 
× ✓ × 
@doanduyhai 
51 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
3 
SStable1 SStable2 SStable3
Partition Key Cache! 
Partition Key Cache SStable 
@doanduyhai 
52 
#partition001 
data 
data 
…. 
#partition002 
#partition350 
data 
data 
… 
#Partition Offset … 
#partition001 0x0 … 
#partition002 0x153 … 
#partition350 0x5464321 …
Cassandra Read Path! 
4 Key Index Sample 
2 Bloom Filter 
Bloom Filter 
@doanduyhai 
53 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
3 Partition Key Cache 
SStable2 
Bloom Filter
@doanduyhai 
Key Index Sample! 
54 
SStable 
#partition001 
data 
data 
…. 
#partition128 
#partition350 
data 
data 
… 
Key Index Sample 
Sample Offset 
#partition001 0x0 
#partition128 0x4500 
#partition256 0x851513 
#partition512 0x5464321
Cassandra Read Path! 
4 Key Index Sample 
2 Bloom Filter 
Bloom Filter 
@doanduyhai 
55 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
3 Partition Key Cache 
5 Compression Offset 
SStable2 
Bloom Filter
Compression Offset! 
@doanduyhai 
56 
Compression Offset 
Normal offset Compressed offset 
0x0 0x0 
0x4500 0x100 
0x851513 0x1353 
0x5464321 0x21245
Cassandra Read Path! 
4 Key Index Sample 
2 Bloom Filter 
Bloom Filter 
@doanduyhai 
57 
1 Row Cache 
Off Heap 
JVM Heap 
SELECT .. FROM … WHERE #partition =…; 
3 Partition Key Cache 
5 Compression Offset 
6 SStable2 
7 MemTable 
Σ 
Bloom Filter
Data model! 
Last Write Win! 
CQL basics! 
From SQL to CQL!
Last Write Win (LWW)! 
INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); 
@doanduyhai 
59 
jdoe 
age 
name 
33 John DOE 
#partition
Last Write Win (LWW)! 
@doanduyhai 
jdoe 
age (t1) name (t1) 
33 John DOE 
60 
INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); 
auto-generated timestamp (μs) 
.
Last Write Win (LWW)! 
SSTable1 SSTable2 
@doanduyhai 
61 
UPDATE users SET age = 34 WHERE login = jdoe; 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
SSTable1 SSTable2 SSTable3 
@doanduyhai 
62 
DELETE age FROM users WHERE login = jdoe; 
tombstone 
jdoe 
age (t3) 
ý 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
63 
SELECT age FROM users WHERE login = jdoe; 
? ? ? 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
ý 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
64 
SELECT age FROM users WHERE login = jdoe; 
✕ ✕ ✓ 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
ý 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
@doanduyhai 
Compaction! 
65 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
ý 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34 
New SSTable 
jdoe 
age (t3) name (t1) 
ý John DOE
@doanduyhai 
CRUD operations! 
66 
INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); 
UPDATE users SET age = 34 WHERE login = jdoe; 
DELETE age FROM users WHERE login = jdoe; 
SELECT age FROM users WHERE login = jdoe;
@doanduyhai 
DDL Statements! 
67 
Keyspace (tables container) 
• CREATE/ALTER/DROP 
Table 
• CREATE/ALTER/DROP 
Type (custom type) 
• CREATE/ALTER/DROP
User management & permissions! 
@doanduyhai 
68 
User 
• CREATE/ALTER/DROP 
Permissions 
• GRANT <xxx> PERMISSION ON <resource> TO <user_name> 
• REVOKE <xxx> PERMISSION ON <resource> FROM <user_name>
@doanduyhai 
Simple Table! 
69 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
… 
PRIMARY KEY(login)); 
partition key (#partition)
@doanduyhai 
Clustered table! 
70 
CREATE TABLE mailbox ( 
login text, 
message_id timeuuid, 
interlocutor text, 
message text, 
PRIMARY KEY((login), message_id)); 
partition key clustering column unicity
@doanduyhai 
Queries! 
71 
Get message by user and message_id (date) 
SELECT * FROM mailbox WHERE login = jdoe 
and message_id = ‘2014-09-25 16:00:00’; 
Get message by user and date interval 
SELECT * FROM mailbox WHERE login = jdoe 
and message_id <= ‘2014-09-25 16:00:00’ 
and message_id >= ‘2014-09-20 16:00:00’;
@doanduyhai 
Queries! 
72 
Get message by message_id only (#partition not provided) 
SELECT * FROM mailbox WHERE message_id = ‘2014-09-25 16:00:00’; 
Get message by date interval (#partition not provided) 
SELECT * FROM mailbox WHERE 
and message_id <= ‘2014-09-25 16:00:00’ 
and message_id >= ‘2014-09-20 16:00:00’;
Get message by user range (range query on #partition) 
Get message by user pattern (non exact match on #partition) 
@doanduyhai 
Queries! 
73 
SELECT * FROM mailbox WHERE login >= hsue and login <= jdoe; 
SELECT * FROM mailbox WHERE login like ‘%doe%‘;
WHERE clause restrictions! 
@doanduyhai 
74 
All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition 
Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed 
• ☞ full cluster scan 
On clustering columns, only range queries (<, ≤, >, ≥) and exact match 
WHERE clause only possible on columns defined in PRIMARY KEY
Collections & maps! 
@doanduyhai 
75 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
friends set<text>, 
hobbies list<text>, 
languages map<int, text>, 
… 
PRIMARY KEY(login));
Collections & maps! 
@doanduyhai 
76 
All elements loaded at each access 
• ☞ low cardinality only (≤ 1000 elements) 
Should not be used use for 1-N, N-N relations
User Defined Type (UDT)! 
@doanduyhai 
77 
Instead of 
CREATE TABLE users ( 
login text, 
… 
street_number int, 
street_name text, 
postcode int, 
country text, 
… 
PRIMARY KEY(login));
User Defined Type (UDT)! 
@doanduyhai 
78 
CREATE TYPE address ( 
street_number int, 
street_name text, 
postcode int, 
country text); 
CREATE TABLE users ( 
login text, 
… 
location frozen <address>, 
… 
PRIMARY KEY(login));
@doanduyhai 
UDT in action! 
79 
INSERT INTO users(login,name, location) VALUES ( 
‘jdoe’, 
’John DOE’, 
{ 
‘street_number’: 124, 
‘street_name’: ‘Congress Avenue’, 
‘postcode’: 95054, 
‘country’: ‘USA’ 
});
@doanduyhai 
From SQL to CQL! 
80 
Remember…
@doanduyhai 
From SQL to CQL! 
81 
Remember… 
CQL is not SQL
@doanduyhai 
From SQL to CQL! 
82 
Remember… 
there is no join 
(do you want to scale ?)
@doanduyhai 
From SQL to CQL! 
83 
Remember… 
there is no integrity constraint 
(do you want to read-before-write ?)
@doanduyhai 
From SQL to CQL! 
84 
1-1, N-1 : normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author_id uuid, // typical join id 
content text, 
PRIMARY KEY((article_id), comment_id));
@doanduyhai 
From SQL to CQL! 
85 
1-1, N-1 : de-normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author person, // person is UDT 
content text, 
PRIMARY KEY((article_id), comment_id));
@doanduyhai 
From SQL to CQL! 
86 
N-1, N-N : normalized 
n 
User 
n 
CREATE TABLE friends( 
login text, 
friend_login text, // typical join id 
PRIMARY KEY((login), friend_login));
@doanduyhai 
From SQL to CQL! 
87 
N-1, N-N : de-normalized 
n 
User 
n 
CREATE TABLE friends( 
login text, 
friend_login text, 
friend person, // person is UDT 
PRIMARY KEY((login), friend_login));
Data modeling best practices! 
@doanduyhai 
88 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT
Data modeling best practices! 
@doanduyhai 
89 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT 
Denormalize 
• wisely, only duplicate necessary & immutable data 
• functional/technical trade-off
Data modeling best practices! 
@doanduyhai 
90 
Person UDT 
- firstname/lastname 
- date of birth 
- sexe 
- mood 
- location
Data modeling best practices! 
@doanduyhai 
91 
John DOE, male 
birthdate: 21/02/1981 
subscribed since 03/06/2011 
☉ San Mateo, CA 
’’Impossible is not John DOE’’ 
Full detail read from 
User table on click
! " 
! 
Q & R
Failure Handling!
Failure Handling! 
Failure occurs when 
• node hardware failure (disk, …) 
• network issues 
• heavy load, node flapping 
• JVM long GCs 
@doanduyhai
@doanduyhai 
Hinted Handoff! 
When 1 replica down 
• coordinator stores Hints 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
× 
Hint
Hinted Handoff! 
When replica up 
• coordinator forward stored Hints 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
Hints
Consistent read! 
Read at QUORUM or more 
• read from one fastest node 
• AND request digest from other 
@doanduyhai 
replicas to reach QUORUM 
• return most up-to-date data to client 
• repair if digest mismatch n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read Repair! 
Read at CL < QUORUM 
• read from one least-loaded node 
• every x reads, request digests from 
@doanduyhai 
other replicas 
• compare digest & repair 
asynchronously 
• read_repair_chance (10%) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Manual repair! 
Should be part of cluster operations 
• scheduled 
• I/O intensive 
@doanduyhai 
Use incremental repair of 2.1
! " 
! 
Q & R
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
Compan@yd Coaonndfiudyehnatiia l 101
Thank You 
@doanduyhai 
duy_hai.doan@datastax.com 
https://academy.datastax.com/

Contenu connexe

Tendances

Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGDuyhai Doan
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016Duyhai Doan
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Базы данных. HBase
Базы данных. HBaseБазы данных. HBase
Базы данных. HBaseVadim Tsesko
 
Базы данных. HDFS
Базы данных. HDFSБазы данных. HDFS
Базы данных. HDFSVadim Tsesko
 
Apache cassandra in 2016
Apache cassandra in 2016Apache cassandra in 2016
Apache cassandra in 2016Duyhai Doan
 
Cassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorCassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorChristopher Batey
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonJoe Stein
 
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage systemAthens Big Data
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax
 
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query executionAthens Big Data
 
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-..."Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...hamidsamadi
 
Data Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backData Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backVictor_Cr
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...Amazon Web Services
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs HadoopFujio Turner
 
2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekingeProf. Wim Van Criekinge
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQLBrendan Tierney
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2Itamar Haber
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Lucidworks
 

Tendances (20)

Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUG
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Базы данных. HBase
Базы данных. HBaseБазы данных. HBase
Базы данных. HBase
 
Базы данных. HDFS
Базы данных. HDFSБазы данных. HDFS
Базы данных. HDFS
 
Apache cassandra in 2016
Apache cassandra in 2016Apache cassandra in 2016
Apache cassandra in 2016
 
Cassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorCassandra London - C* Spark Connector
Cassandra London - C* Spark Connector
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
 
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
 
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
 
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-..."Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
 
Data Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backData Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes back
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs Hadoop
 
2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQL
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
 

En vedette

Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsDuyhai Doan
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraDataStax Academy
 
Cassandra rapid prototyping with achilles
Cassandra rapid prototyping with achillesCassandra rapid prototyping with achilles
Cassandra rapid prototyping with achillesDuyhai Doan
 
Cassandra java libraries
Cassandra java librariesCassandra java libraries
Cassandra java librariesDuyhai Doan
 
Apache Cassandra, part 3 – machinery, work with Cassandra
Apache Cassandra, part 3 – machinery, work with CassandraApache Cassandra, part 3 – machinery, work with Cassandra
Apache Cassandra, part 3 – machinery, work with CassandraAndrey Lomakin
 
Achilles presentation
Achilles presentationAchilles presentation
Achilles presentationDuyhai Doan
 
Cassandra Drivers and Tools
Cassandra Drivers and ToolsCassandra Drivers and Tools
Cassandra Drivers and ToolsDuyhai Doan
 
Effective cassandra development with achilles
Effective cassandra development with achillesEffective cassandra development with achilles
Effective cassandra development with achillesDuyhai Doan
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisDuyhai Doan
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithmsDuyhai Doan
 
Cassandra for the ops dos and donts
Cassandra for the ops   dos and dontsCassandra for the ops   dos and donts
Cassandra for the ops dos and dontsDuyhai Doan
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data modelDuyhai Doan
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...DataStax
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsgrro
 
Cassandra techniques de modelisation avancee
Cassandra techniques de modelisation avanceeCassandra techniques de modelisation avancee
Cassandra techniques de modelisation avanceeDuyhai Doan
 

En vedette (20)

Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache Cassandra
 
Cassandra rapid prototyping with achilles
Cassandra rapid prototyping with achillesCassandra rapid prototyping with achilles
Cassandra rapid prototyping with achilles
 
Cassandra java libraries
Cassandra java librariesCassandra java libraries
Cassandra java libraries
 
Apache Cassandra, part 3 – machinery, work with Cassandra
Apache Cassandra, part 3 – machinery, work with CassandraApache Cassandra, part 3 – machinery, work with Cassandra
Apache Cassandra, part 3 – machinery, work with Cassandra
 
Achilles presentation
Achilles presentationAchilles presentation
Achilles presentation
 
Cassandra Drivers and Tools
Cassandra Drivers and ToolsCassandra Drivers and Tools
Cassandra Drivers and Tools
 
Effective cassandra development with achilles
Effective cassandra development with achillesEffective cassandra development with achilles
Effective cassandra development with achilles
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS Paris
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithms
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Cassandra for the ops dos and donts
Cassandra for the ops   dos and dontsCassandra for the ops   dos and donts
Cassandra for the ops dos and donts
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data model
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requests
 
Cassandra techniques de modelisation avancee
Cassandra techniques de modelisation avanceeCassandra techniques de modelisation avancee
Cassandra techniques de modelisation avancee
 

Similaire à Cassandra introduction apache con 2014 budapest

DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...NoSQLmatters
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesDuyhai Doan
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and librariesDuyhai Doan
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplDuyhai Doan
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
All Your Base
All Your BaseAll Your Base
All Your BaseAcunu
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Big data 101 for beginners riga dev days
Big data 101 for beginners riga dev daysBig data 101 for beginners riga dev days
Big data 101 for beginners riga dev daysDuyhai Doan
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleChristophe Grand
 
Big Data & NoSQL - EFS'11 (Pavlo Baron)
Big Data & NoSQL - EFS'11 (Pavlo Baron)Big Data & NoSQL - EFS'11 (Pavlo Baron)
Big Data & NoSQL - EFS'11 (Pavlo Baron)Pavlo Baron
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Viswanath J
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 

Similaire à Cassandra introduction apache con 2014 budapest (20)

DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and libraries
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxpl
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
All Your Base
All Your BaseAll Your Base
All Your Base
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Big data 101 for beginners riga dev days
Big data 101 for beginners riga dev daysBig data 101 for beginners riga dev days
Big data 101 for beginners riga dev days
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Hotsos 2012
Hotsos 2012Hotsos 2012
Hotsos 2012
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
Big Data & NoSQL - EFS'11 (Pavlo Baron)
Big Data & NoSQL - EFS'11 (Pavlo Baron)Big Data & NoSQL - EFS'11 (Pavlo Baron)
Big Data & NoSQL - EFS'11 (Pavlo Baron)
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 

Plus de Duyhai Doan

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Duyhai Doan
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandraDuyhai Doan
 
Datastax enterprise presentation
Datastax enterprise presentationDatastax enterprise presentation
Datastax enterprise presentationDuyhai Doan
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandraDuyhai Doan
 
Datastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basicsDatastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basicsDuyhai Doan
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronDuyhai Doan
 
Sasi, cassandra on full text search ride
Sasi, cassandra on full text search rideSasi, cassandra on full text search ride
Sasi, cassandra on full text search rideDuyhai Doan
 
Cassandra 3 new features @ Geecon Krakow 2016
Cassandra 3 new features  @ Geecon Krakow 2016Cassandra 3 new features  @ Geecon Krakow 2016
Cassandra 3 new features @ Geecon Krakow 2016Duyhai Doan
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Duyhai Doan
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Duyhai Doan
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016Duyhai Doan
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsDuyhai Doan
 
Data stax academy
Data stax academyData stax academy
Data stax academyDuyhai Doan
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemDuyhai Doan
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDuyhai Doan
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Duyhai Doan
 

Plus de Duyhai Doan (17)

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandra
 
Datastax enterprise presentation
Datastax enterprise presentationDatastax enterprise presentation
Datastax enterprise presentation
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandra
 
Datastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basicsDatastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basics
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 
Sasi, cassandra on full text search ride
Sasi, cassandra on full text search rideSasi, cassandra on full text search ride
Sasi, cassandra on full text search ride
 
Cassandra 3 new features @ Geecon Krakow 2016
Cassandra 3 new features  @ Geecon Krakow 2016Cassandra 3 new features  @ Geecon Krakow 2016
Cassandra 3 new features @ Geecon Krakow 2016
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystem
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized Views
 
Data stax academy
Data stax academyData stax academy
Data stax academy
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystem
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeCon
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015
 

Dernier

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 

Dernier (20)

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 

Cassandra introduction apache con 2014 budapest

  • 1. Introduction to Cassandra DuyHai DOAN, Technical Advocate
  • 2. @doanduyhai About me! 2 Duy Hai DOAN Cassandra technical advocate • talks, meetups, confs • open-source devs (Achilles, …) • OSS technical point of contact in Europe ☞ duy_hai.doan@datastax.com • production troubleshooting
  • 3. @doanduyhai 3 Datastax! • Founded in April 2010 • We drive Apache Cassandra™ • Home to Cassandra chair & most committers (≈80%) • Headquarter in San Francisco Bay area • EU headquarter in London, offices in France and Germany • Datastax Enterprise = OSS Cassandra + features
  • 4. @doanduyhai Agenda! 4 • Architecture • Replication • Consistency • Read/write path • Data Model • Failure handling
  • 6. @doanduyhai Cassandra history! 6 NoSQL database • created at Facebook • open-sourced since 2008 • current version = 2.1 • column-oriented ☞ distributed table
  • 7. Cassandra 5 key facts! @doanduyhai 7 Linear scalability Small & « huge » scale • 3 à1k+ nodes cluster • 3Gb à Pb+
  • 8. Cassandra 5 key facts! @doanduyhai 8 Continuous availability (≈100% up-time) • resilient architecture (Dynamo) • rolling upgrades • data backward compatible n/n+1 versions
  • 9. Cassandra 5 key facts! @doanduyhai 9 Multi-data centers • out-of-the-box (config only) • AWS conf for multi-region DCs • GCE/CloudStack support • resilience, work-load segregation • virtual data-centers
  • 10. Cassandra 5 key facts! @doanduyhai 10 Operational simplicity • 1 node = 1 process + 1 config file • deployment automation • OpsCenter for monitoring
  • 11. Cassandra 5 key facts! @doanduyhai 11 Analytics combo • Cassandra + Spark = awesome ! • realtime streaming
  • 13. Cassandra architecture! @doanduyhai 13 API (CQL & RPC) CLUSTER (DYNAMODB) DATA STORE (BIG TABLES) DISKS Node1 Client request API (CQL & RPC) CLUSTER (DYNAMODB) DATA STORE (BIG TABLES) DISKS Node2
  • 14. @doanduyhai Data distribution! 14 Random: hash of #partition → token = hash(#p) Hash: ]-X, X] X = huge number (264/2) n1 n2 n3 n4 n5 n6 n7 n8
  • 15. @doanduyhai Token Ranges! 15 A: ]0, X/8] B: ] X/8, 2X/8] C: ] 2X/8, 3X/8] D: ] 3X/8, 4X/8] E: ] 4X/8, 5X/8] F: ] 5X/8, 6X/8] G: ] 6X/8, 7X/8] H: ] 7X/8, X] n1 n2 n3 n4 n5 n6 n7 n8 A B C D E F G H
  • 16. 8 nodes 10 nodes @doanduyhai Linear scalability! 16 n1 n2 n3 n4 n5 n6 n7 n8 n1 n2 n3 n4 n5 n6 n7 n9 n8 n10
  • 17. ! " ! Q & R
  • 19. Failure tolerance by replication! @doanduyhai 19 Replication Factor (RF) = 3 n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 {B, A, H} {C, B, A} {D, C, B} A B C D E F G H
  • 20. @doanduyhai Coordinator node! Incoming requests (read/write) Coordinator node handles the request n1 Every node can be coordinator àmasterless n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator Client request
  • 21. @doanduyhai Consistency! 21 Tunable at runtime • ONE • QUORUM (strict majority w.r.t. RF) • ALL Apply both to read & write
  • 22. Write consistency! Write ONE • write request to all replicas in // @doanduyhai n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 23. Write consistency! Write ONE • write request to all replicas in // • wait for ONE ack before returning to @doanduyhai client n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator 5 μs
  • 24. Write consistency! Write ONE • write request to all replicas in // • wait for ONE ack before returning to @doanduyhai client • other acks later, asynchronously n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator 5 μs 10 μs 120 μs
  • 25. Write consistency! Write QUORUM • write request to all replicas in // • wait for QUORUM acks before @doanduyhai returning to client • other acks later, asynchronously n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator 5 μs 10 μs 120 μs
  • 26. Read consistency! Read ONE • read from one node among all replicas @doanduyhai n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 27. Read consistency! Read ONE • read from one node among all replicas • contact the fastest node (stats) @doanduyhai n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 28. @doanduyhai Read consistency! Read QUORUM • read from one fastest node n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 29. Read consistency! Read QUORUM • read from one fastest node • AND request digest from other @doanduyhai replicas to reach QUORUM n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 30. Read consistency! Read QUORUM • read from one fastest node • AND request digest from other @doanduyhai replicas to reach QUORUM • return most up-to-date data to client n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 31. Read consistency! Read QUORUM • read from one fastest node • AND request digest from other @doanduyhai replicas to reach QUORUM • return most up-to-date data to client • repair if digest mismatch n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 32. Consistency in action! @doanduyhai 32 RF = 3, Write ONE, Read ONE B A A B A A Read ONE: A data replication in progress … Write ONE: B
  • 33. Consistency in action! data replication in progress … @doanduyhai 33 RF = 3, Write ONE, Read QUORUM B A A Write ONE: B Read QUORUM: A B A A
  • 34. Consistency in action! data replication in progress … @doanduyhai 34 RF = 3, Write ONE, Read ALL B A A Read ALL: B B A A Write ONE: B
  • 35. Consistency in action! data replication in progress … @doanduyhai 35 RF = 3, Write QUORUM, Read ONE B B A Write QUORUM: B Read ONE: A B B A
  • 36. Consistency in action! data replication in progress … @doanduyhai 36 RF = 3, Write QUORUM, Read QUORUM B B A Read QUORUM: B B B A Write QUORUM: B
  • 37. @doanduyhai Consistency level! 37 ONE Fast, may not read latest written value
  • 38. @doanduyhai Consistency level! 38 QUORUM Strict majority w.r.t. Replication Factor Good balance
  • 39. @doanduyhai Consistency level! 39 ALL Paranoid Slow, no high availability
  • 40. Consistency summary! ONERead + ONEWrite ☞ available for read/write even (N-1) replicas down QUORUMRead + QUORUMWrite ☞ available for read/write even 1+ replica down @doanduyhai 40
  • 41. ! " ! Q & R
  • 42. Read & Write Path!
  • 43. Cassandra Write Path! @doanduyhai 43 Commit log1 . . . 1 Commit log2 Commit logn Memory
  • 44. Cassandra Write Path! @doanduyhai 44 Memory MemTable Table1 Commit log1 . . . 1 Commit log2 Commit logn MemTable Table2 MemTable TableN 2 . . .
  • 45. Cassandra Write Path! Table2 Table3 SStable2 SStable3 3 @doanduyhai 45 Commit log1 Commit log2 Commit logn Table1 SStable1 Memory . . .
  • 46. Cassandra Write Path! MemTable . . . Memory Table1 @doanduyhai 46 Commit log1 Commit log2 Commit logn Table1 SStable1 Table2 Table3 SStable2 SStable3 MemTable Table2 MemTable TableN . . .
  • 47. Cassandra Write Path! SStable3 . . . @doanduyhai 47 Commit log1 Commit log2 Commit logn Table1 SStable1 Memory Table2 Table3 SStable2 SStable3 SStable1 SStable2
  • 48. Cassandra Read Path! @doanduyhai 48 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; SStable1 SStable2 SStable3
  • 49. Cassandra Read Path! Bloom Filter Bloom Filter ? ? ? @doanduyhai 49 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; 2 Bloom Filter SStable1 SStable2 SStable3
  • 50. Bloom filters in action! Write #partition = bar h3 @doanduyhai 50 #partition = foo h1 h2 1 0 0 1* 0 0 1 0 1 1 Read #partition = qux
  • 51. Partition Key Cache Cassandra Read Path! 2 Bloom Filter Bloom Filter Bloom Filter × ✓ × @doanduyhai 51 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; 3 SStable1 SStable2 SStable3
  • 52. Partition Key Cache! Partition Key Cache SStable @doanduyhai 52 #partition001 data data …. #partition002 #partition350 data data … #Partition Offset … #partition001 0x0 … #partition002 0x153 … #partition350 0x5464321 …
  • 53. Cassandra Read Path! 4 Key Index Sample 2 Bloom Filter Bloom Filter @doanduyhai 53 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; 3 Partition Key Cache SStable2 Bloom Filter
  • 54. @doanduyhai Key Index Sample! 54 SStable #partition001 data data …. #partition128 #partition350 data data … Key Index Sample Sample Offset #partition001 0x0 #partition128 0x4500 #partition256 0x851513 #partition512 0x5464321
  • 55. Cassandra Read Path! 4 Key Index Sample 2 Bloom Filter Bloom Filter @doanduyhai 55 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; 3 Partition Key Cache 5 Compression Offset SStable2 Bloom Filter
  • 56. Compression Offset! @doanduyhai 56 Compression Offset Normal offset Compressed offset 0x0 0x0 0x4500 0x100 0x851513 0x1353 0x5464321 0x21245
  • 57. Cassandra Read Path! 4 Key Index Sample 2 Bloom Filter Bloom Filter @doanduyhai 57 1 Row Cache Off Heap JVM Heap SELECT .. FROM … WHERE #partition =…; 3 Partition Key Cache 5 Compression Offset 6 SStable2 7 MemTable Σ Bloom Filter
  • 58. Data model! Last Write Win! CQL basics! From SQL to CQL!
  • 59. Last Write Win (LWW)! INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); @doanduyhai 59 jdoe age name 33 John DOE #partition
  • 60. Last Write Win (LWW)! @doanduyhai jdoe age (t1) name (t1) 33 John DOE 60 INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); auto-generated timestamp (μs) .
  • 61. Last Write Win (LWW)! SSTable1 SSTable2 @doanduyhai 61 UPDATE users SET age = 34 WHERE login = jdoe; jdoe age (t1) name (t1) 33 John DOE jdoe age (t2) 34
  • 62. Last Write Win (LWW)! SSTable1 SSTable2 SSTable3 @doanduyhai 62 DELETE age FROM users WHERE login = jdoe; tombstone jdoe age (t3) ý jdoe age (t1) name (t1) 33 John DOE jdoe age (t2) 34
  • 63. Last Write Win (LWW)! @doanduyhai 63 SELECT age FROM users WHERE login = jdoe; ? ? ? SSTable1 SSTable2 SSTable3 jdoe age (t3) ý jdoe age (t1) name (t1) 33 John DOE jdoe age (t2) 34
  • 64. Last Write Win (LWW)! @doanduyhai 64 SELECT age FROM users WHERE login = jdoe; ✕ ✕ ✓ SSTable1 SSTable2 SSTable3 jdoe age (t3) ý jdoe age (t1) name (t1) 33 John DOE jdoe age (t2) 34
  • 65. @doanduyhai Compaction! 65 SSTable1 SSTable2 SSTable3 jdoe age (t3) ý jdoe age (t1) name (t1) 33 John DOE jdoe age (t2) 34 New SSTable jdoe age (t3) name (t1) ý John DOE
  • 66. @doanduyhai CRUD operations! 66 INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33); UPDATE users SET age = 34 WHERE login = jdoe; DELETE age FROM users WHERE login = jdoe; SELECT age FROM users WHERE login = jdoe;
  • 67. @doanduyhai DDL Statements! 67 Keyspace (tables container) • CREATE/ALTER/DROP Table • CREATE/ALTER/DROP Type (custom type) • CREATE/ALTER/DROP
  • 68. User management & permissions! @doanduyhai 68 User • CREATE/ALTER/DROP Permissions • GRANT <xxx> PERMISSION ON <resource> TO <user_name> • REVOKE <xxx> PERMISSION ON <resource> FROM <user_name>
  • 69. @doanduyhai Simple Table! 69 CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login)); partition key (#partition)
  • 70. @doanduyhai Clustered table! 70 CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id)); partition key clustering column unicity
  • 71. @doanduyhai Queries! 71 Get message by user and message_id (date) SELECT * FROM mailbox WHERE login = jdoe and message_id = ‘2014-09-25 16:00:00’; Get message by user and date interval SELECT * FROM mailbox WHERE login = jdoe and message_id <= ‘2014-09-25 16:00:00’ and message_id >= ‘2014-09-20 16:00:00’;
  • 72. @doanduyhai Queries! 72 Get message by message_id only (#partition not provided) SELECT * FROM mailbox WHERE message_id = ‘2014-09-25 16:00:00’; Get message by date interval (#partition not provided) SELECT * FROM mailbox WHERE and message_id <= ‘2014-09-25 16:00:00’ and message_id >= ‘2014-09-20 16:00:00’;
  • 73. Get message by user range (range query on #partition) Get message by user pattern (non exact match on #partition) @doanduyhai Queries! 73 SELECT * FROM mailbox WHERE login >= hsue and login <= jdoe; SELECT * FROM mailbox WHERE login like ‘%doe%‘;
  • 74. WHERE clause restrictions! @doanduyhai 74 All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed • ☞ full cluster scan On clustering columns, only range queries (<, ≤, >, ≥) and exact match WHERE clause only possible on columns defined in PRIMARY KEY
  • 75. Collections & maps! @doanduyhai 75 CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, … PRIMARY KEY(login));
  • 76. Collections & maps! @doanduyhai 76 All elements loaded at each access • ☞ low cardinality only (≤ 1000 elements) Should not be used use for 1-N, N-N relations
  • 77. User Defined Type (UDT)! @doanduyhai 77 Instead of CREATE TABLE users ( login text, … street_number int, street_name text, postcode int, country text, … PRIMARY KEY(login));
  • 78. User Defined Type (UDT)! @doanduyhai 78 CREATE TYPE address ( street_number int, street_name text, postcode int, country text); CREATE TABLE users ( login text, … location frozen <address>, … PRIMARY KEY(login));
  • 79. @doanduyhai UDT in action! 79 INSERT INTO users(login,name, location) VALUES ( ‘jdoe’, ’John DOE’, { ‘street_number’: 124, ‘street_name’: ‘Congress Avenue’, ‘postcode’: 95054, ‘country’: ‘USA’ });
  • 80. @doanduyhai From SQL to CQL! 80 Remember…
  • 81. @doanduyhai From SQL to CQL! 81 Remember… CQL is not SQL
  • 82. @doanduyhai From SQL to CQL! 82 Remember… there is no join (do you want to scale ?)
  • 83. @doanduyhai From SQL to CQL! 83 Remember… there is no integrity constraint (do you want to read-before-write ?)
  • 84. @doanduyhai From SQL to CQL! 84 1-1, N-1 : normalized User 1 n Comment CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_id uuid, // typical join id content text, PRIMARY KEY((article_id), comment_id));
  • 85. @doanduyhai From SQL to CQL! 85 1-1, N-1 : de-normalized User 1 n Comment CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author person, // person is UDT content text, PRIMARY KEY((article_id), comment_id));
  • 86. @doanduyhai From SQL to CQL! 86 N-1, N-N : normalized n User n CREATE TABLE friends( login text, friend_login text, // typical join id PRIMARY KEY((login), friend_login));
  • 87. @doanduyhai From SQL to CQL! 87 N-1, N-N : de-normalized n User n CREATE TABLE friends( login text, friend_login text, friend person, // person is UDT PRIMARY KEY((login), friend_login));
  • 88. Data modeling best practices! @doanduyhai 88 Start by queries • identify core functional read paths • 1 read scenario ≈ 1 SELECT
  • 89. Data modeling best practices! @doanduyhai 89 Start by queries • identify core functional read paths • 1 read scenario ≈ 1 SELECT Denormalize • wisely, only duplicate necessary & immutable data • functional/technical trade-off
  • 90. Data modeling best practices! @doanduyhai 90 Person UDT - firstname/lastname - date of birth - sexe - mood - location
  • 91. Data modeling best practices! @doanduyhai 91 John DOE, male birthdate: 21/02/1981 subscribed since 03/06/2011 ☉ San Mateo, CA ’’Impossible is not John DOE’’ Full detail read from User table on click
  • 92. ! " ! Q & R
  • 94. Failure Handling! Failure occurs when • node hardware failure (disk, …) • network issues • heavy load, node flapping • JVM long GCs @doanduyhai
  • 95. @doanduyhai Hinted Handoff! When 1 replica down • coordinator stores Hints n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator × Hint
  • 96. Hinted Handoff! When replica up • coordinator forward stored Hints @doanduyhai n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator Hints
  • 97. Consistent read! Read at QUORUM or more • read from one fastest node • AND request digest from other @doanduyhai replicas to reach QUORUM • return most up-to-date data to client • repair if digest mismatch n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 98. Read Repair! Read at CL < QUORUM • read from one least-loaded node • every x reads, request digests from @doanduyhai other replicas • compare digest & repair asynchronously • read_repair_chance (10%) n1 n2 n3 n4 n5 n6 n7 n8 1 2 3 coordinator
  • 99. Manual repair! Should be part of cluster operations • scheduled • I/O intensive @doanduyhai Use incremental repair of 2.1
  • 100. ! " ! Q & R
  • 101. Training Day | December 3rd Beginner Track • Introduction to Cassandra • Introduction to Spark, Shark, Scala and Cassandra Advanced Track • Data Modeling • Performance Tuning Conference Day | December 4th Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra. http://bit.ly/cassandrasummit2014 Compan@yd Coaonndfiudyehnatiia l 101
  • 102. Thank You @doanduyhai duy_hai.doan@datastax.com https://academy.datastax.com/