Cassandra
CQL & DataModel
http://zqhxuyuan.github.io
2016-09 @ Tongdun Technology (同盾科技)
Cassandra is the cursed ORACLE
Thrift to CQL
http://www.datastax.com/dev/blog/thrift-to-cql3
create column family user_profiles
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata = [
{column_name: first_name, validation_class: UTF8Type},
{column_name: last_name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: year_of_birth, validation_class: IntegerType}
]
CREATE TABLE user_profiles (
user_id text PRIMARY KEY,
first_name text,
last_name text,
email text,
year_of_birth int
) WITH COMPACT STORAGE
Dynamic Column Family
Data Model & Storage Layout
* theoretically up to 2 billion columns (cells) per row
Timestamp is used for conflict resolution (Last Write Wins)
Column → Row → Column Family (Table)
A Row is composed of a Row Key and multiple Columns
A Column Family (Table) is composed of multiple Rows
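Last Write Wins is easy to see with CQL's USING TIMESTAMP clause; a minimal sketch (the clause is standard CQL, the values are hypothetical and reuse the books table defined below):
-- Two writes to the same cell; the larger timestamp wins, regardless of arrival order.
INSERT INTO books (title, author) VALUES ('Patriot Games', 'OLD') USING TIMESTAMP 1000;
INSERT INTO books (title, author) VALUES ('Patriot Games', 'NEW') USING TIMESTAMP 2000;
-- SELECT author FROM books WHERE title = 'Patriot Games'; returns 'NEW'.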
CQL(Cassandra Query Language)
Single Partition Key
CREATE TABLE books (
title text,
author text,
year int,
PRIMARY KEY (title)
);
INSERT INTO books (title, author, year)
VALUES ('Patriot Games', 'Tom Clancy', 1987);
INSERT INTO books (title, author, year)
VALUES ('Without Remorse', 'Tom Clancy', 1993);
select * from books;
title | author | year
----------------+------------+------
Without Remorse| Tom Clancy | 1993
Patriot Games| Tom Clancy | 1987
RowKey: Without Remorse
=> (name=, value=, timestamp=1393102991499)
=> (name=author, value=Tom Clancy, timestamp=1393102991499)
=> (name=year, value=1993, timestamp=1393102991499)
RowKey: Patriot Games
=> (name=, value=, timestamp=1393102991499100)
=> (name=author, value=Tom Clancy, timestamp=1393102991499100)
=> (name=year, value=1987, timestamp=1393102991499100)
CREATE TABLE users (
id timeuuid,
lastname varchar,
firstname varchar,
dateOfBirth timestamp,
PRIMARY KEY(id)
);
With a single-field PRIMARY KEY this is
no different from a traditional database.
Because the PK is a timeuuid, every insert
carries a different timeuuid,
so every insert creates a brand-new record.
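This is easy to reproduce with the built-in now() function; a small sketch (values hypothetical):
-- now() returns a fresh timeuuid on every call, so these two
-- otherwise-identical INSERTs create two separate rows.
INSERT INTO users (id, lastname, firstname, dateOfBirth)
VALUES (now(), 'Clancy', 'Tom', '1947-04-12');
INSERT INTO users (id, lastname, firstname, dateOfBirth)
VALUES (now(), 'Clancy', 'Tom', '1947-04-12');
-- SELECT count(*) FROM users; now reports 2.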
CREATE TABLE users2 (
id int,
lastname varchar,
firstname varchar,
dateOfBirth timestamp,
PRIMARY KEY(id)
);
The primary key is not a timeuuid type, so
inserting with an id that already exists
amounts to an update. The four operations
on the right all target id=1, and a query
returns only one record, even for INSERTs.
For the same Partition Key, INSERT and UPDATE both operate on the same record:
if the record does not exist, INSERT creates it (first insert);
if the record already exists, INSERT and UPDATE both mean an update.
So if you want the combined effect of INSERT and UPDATE (upsert), always use INSERT!
Unlike updating in a traditional database (query first: insert if absent, update if present),
there is no need to read the data back here; just INSERT, no questions asked.
(RDBMS flow: query → exists? → insert / update ❌ vs. Cassandra: just INSERT ✔️)
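A minimal upsert sketch against users2 (values hypothetical): four statements, one resulting row, with last-write-wins applied per column.
INSERT INTO users2 (id, lastname) VALUES (1, 'Doe');
INSERT INTO users2 (id, firstname) VALUES (1, 'John');
UPDATE users2 SET dateOfBirth = '1980-01-01' WHERE id = 1;
INSERT INTO users2 (id, lastname) VALUES (1, 'Smith');
-- SELECT * FROM users2 WHERE id = 1; returns a single row:
-- id=1, lastname='Smith', firstname='John', dateOfBirth=1980-01-01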
http://www.sestevez.com/sestevez/CassandraDataModeler/
CREATE TABLE velocity_app (
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code, app_name, type), sequence_id)
)
Column name = <ClusterKeyColumn>:"NormColumnName"; the <> cell denotes the Column's value, <NormColumnValue>
Like users2: for the same Partition Key (here made of several fields rather than one), inserting the other ordinary fields is equivalent to an update.
As long as the Partition Key and Clustering Key stay the same, INSERTing the same record any number of times only updates that one record; the latest insert wins.
Partition Key | Clustering Key | Ordinary Column
If the Partition Key is the same but the Clustering Key differs, CQL shows multiple rows, while the storage layer keeps them as extra columns.
A column name is <ClusteringKeyValue>:<ordinary column name>; the velocity_app table has two ordinary columns, event and timestamp,
so each additional Clustering Key value actually adds two columns, <#sequence_id>:event and <#sequence_id>:timestamp.
Likewise, a different Partition Key adds three columns; note that the Partition Key's own empty marker column also counts as one.
~Think in Columns, Not Rows~
SAME Partition Key, Different Clustering Key
Different Partition Key
1473427497-1:event | 1473427497-1:timestamp | 1473427497-2:event | 1473427497-2:timestamp
{accountLogin:zqhxuyuan..} | 1473427497 | {accountLogin:zqhxuyuan..} | 1473427497
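A sketch of inserts matching that layout (all values hypothetical): both statements share one partition key, so the second merely appends two more cells to the same partition.
INSERT INTO velocity_app (attribute, partner_code, app_name, type, sequence_id, event, "timestamp")
VALUES ('accountLogin', 'p1', 'app1', 'login', '1473427497-1', '{accountLogin:zqhxuyuan..}', 1473427497);
INSERT INTO velocity_app (attribute, partner_code, app_name, type, sequence_id, event, "timestamp")
VALUES ('accountLogin', 'p1', 'app1', 'login', '1473427497-2', '{accountLogin:zqhxuyuan..}', 1473427497);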
Compound Keys
CREATE TABLE authors (
name text,
year int,
title text,
isbn text,
publisher text,
PRIMARY KEY (name, year, title)
);
CREATE TABLE authors (
name text,
year int,
title text,
isbn text,
publisher text,
PRIMARY KEY (name, year, title)
) WITH CLUSTERING ORDER BY (year DESC);
name | year | title | isbn | publisher
------------+------+-----------------+---------------+-----------
Tom Clancy | 1987 | Patriot Games | 0-399-13241-4 | Putnam
Tom Clancy | 1993 | Without Remorse | 0-399-13825-0 | Putnam
RowKey: Tom Clancy
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4)
=> (name=1987:Patriot Games:publisher, value=Putnam)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0)
=> (name=1993:Without Remorse:publisher, value=Putnam)
name | year | title | isbn | publisher
------------+------+-----------------+---------------+-----------
Tom Clancy | 1993 | Without Remorse | 0-399-13825-0 | Putnam
Tom Clancy | 1987 | Patriot Games | 0-399-13241-4 | Putnam
RowKey: Tom Clancy
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0)
=> (name=1993:Without Remorse:publisher, value=Putnam)
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4)
=> (name=1987:Patriot Games:publisher, value=Putnam)
insert into authors(name,year,title,isbn,publisher) values ('Tom Clancy',1987,'Patriot Games','0-399-13241-4','Putnam');
insert into authors(name,year,title,isbn,publisher) values ('Tom Clancy',1993,'Without Remorse','0-399-13825-0','Putnam');
RowKey |1987:Patriot Games:ISBN|1987:Patriot Games:publisher|1993:Without Remorse:ISBN|1993:Without Remorse:publisher
----------+-----------------------+----------------------------+-------------------------+------------------------------
Tom Clancy| 0-399-13241-4 | Putnam | 0-399-13825-0 | Putnam
<year>:<title>:ISBN <year>:<title>:publisher
Clustering Keys
Ordinary Columns
One Row
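One practical payoff of clustering keys: range queries within a partition. A small sketch against the rows above (standard CQL; equality on the partition key, a range on the first clustering column):
SELECT * FROM authors
WHERE name = 'Tom Clancy' AND year >= 1990;
-- returns only Without Remorse (1993), read sequentially from one partition.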
Composite Partition Keys
CREATE TABLE authors (
name text,
year int,
title text,
isbn text,
publisher text,
PRIMARY KEY ((name, year), title)
);
name | year | title | isbn | publisher
------------+------+-----------------+---------------+-----------
Tom Clancy | 1987 | Patriot Games | 0-399-13241-4 | Putnam
Tom Clancy | 1993 | Without Remorse | 0-399-13825-0 | Putnam
RowKey: Tom Clancy:1987
=> (name=Patriot Games:isbn, value=0-399-13241-4)
=> (name=Patriot Games:publisher, value=Putnam)
---------------------------------------------------
RowKey: Tom Clancy:1993
=> (name=Without Remorse:isbn, value=0-399-13825-0)
=> (name=Without Remorse:publisher, value=Putnam)
RowKey |Patriot Games:ISBN|Patriot Games:publisher
---------------+------------------+------------------------
Tom Clancy:1987| 0-399-13241-4 | Putnam
RowKey |Without Remorse:ISBN|Without Remorse:publisher
---------------+--------------------+-------------------------
Tom Clancy:1993| 0-399-13825-0 | Putnam
RowKey |1987:Patriot Games:ISBN|1987:Patriot Games:publisher|1993:Without Remorse:ISBN|1993:Without Remorse:publisher
----------+-----------------------+----------------------------+-------------------------+------------------------------
Tom Clancy| 0-399-13241-4 | Putnam | 0-399-13825-0 | Putnam
#Row 1
#Row 2
Two Rows (based on Partition Key)
CREATE TABLE authors (
name text,
year int,
title text,
isbn text,
publisher text,
PRIMARY KEY (name, year, title)
);
RowKey: john → age=37, role=dev
RowKey: eric → age=38, role=ceo
name | age | role
-----+-----+-----
john | 37 | dev
eric | 38 | ceo
CREATE TABLE employees (
name text PRIMARY KEY,
age int,
role text
);
CREATE TABLE employees (
company text,
name text,
age int,
role text,
PRIMARY KEY (company,name)
);
company | name | age | role
--------+------+-----+-----
OSC | eric | 38 | ceo
OSC | john | 37 | dev
RKG | anya | 29 | lead
RKG | ben | 27 | dev
RKG | chad | 35 | ops
    | eric:age | eric:role | john:age | john:role
OSC |    38    |    ceo    |    37    |    dev

    | anya:age | anya:role | ben:age | ben:role | chad:age | chad:role
RKG |    29    |   lead    |   27    |   dev    |    35    |    ops
CREATE TABLE example (
A text,
B text,
C text,
D text,
E text,
F text,
PRIMARY KEY ((A,B),C,D)
);
A | B | C | D | E | F
--+---+---+---+---+---
a | b | c | d | e | f
a | b | c | g | h | i
a | b | j | k | l | m
a | n | o | p | q | r
s | t | u | v | w | x
    | c:d:E | c:d:F | c:g:E | c:g:F | j:k:E | j:k:F
a:b |   e   |   f   |   h   |   i   |   l   |   m

    | o:p:E | o:p:F
a:n |   q   |   r

    | u:v:E | u:v:F
s:t |   w   |   x
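A sketch of valid reads against this layout (standard CQL rules: the full partition key must be bound; clustering columns may then be narrowed left to right):
SELECT E, F FROM example WHERE A = 'a' AND B = 'b';
SELECT E, F FROM example WHERE A = 'a' AND B = 'b' AND C = 'c';
-- WHERE A = 'a' alone is rejected: B, the rest of the partition key, is unbound.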
More Examples(1:Basic)
SETS
CREATE TABLE mytable(
X text,
Y text,
myset set<int>,
PRIMARY KEY (X,Y)
);
X | Y | myset
---+---+------------
a | b | {1,2}
a | c | {3,4,5}
  | b:myset:1 | b:myset:2 | c:myset:3 | c:myset:4 | c:myset:5
a |           |           |           |           |
X | Y | mylist
---+---+------------
a | b | [1,2]
  | b:mylist:f7e5450039..8d | b:mylist:f7e5450139..8d
a |            1            |            2
LISTS
CREATE TABLE mytable(
X text,
Y text,
mylist list<int>,
PRIMARY KEY (X,Y)
);
MAPS
CREATE TABLE mytable(
X text,
Y text,
mymap map<text,int>,
PRIMARY KEY (X,Y)
);
X | Y | mymap
---+---+------------
a | b | {m:1,n:2}
a | c |{n:3,p:4,q:5}
  | b:mymap:m | b:mymap:n | c:mymap:n | c:mymap:p | c:mymap:q
a |     1     |     2     |     3     |     4     |     5
More Examples(2:Collection)
Collection storage layout (labels from the diagram):
map:  ColumnName = <ClusterKey>:mapName:key, cell value = map value
list: ColumnName = <ClusterKey>:listName:timeuuid, cell value = list element
set:  ColumnName = <ClusterKey>:setName:element, cell value is empty
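Collection mutations only touch the affected cells, which the column-per-element layout above makes cheap. A sketch (treating the three mytable variants as one table for brevity; the operators are standard CQL):
UPDATE mytable SET myset  = myset  + {6}  WHERE X = 'a' AND Y = 'b';  -- adds cell b:myset:6
UPDATE mytable SET mylist = mylist + [3]  WHERE X = 'a' AND Y = 'b';  -- appends a timeuuid-named cell
UPDATE mytable SET mymap['r'] = 7         WHERE X = 'a' AND Y = 'b';  -- adds cell b:mymap:r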
name | year | title | publisher
------------------+------+-----------------+--------------
Tom Clancy | 1987 | Patriot Games | Putnam
Dean Koontz | 1991 | Cold Fire | Headline
Anne Rice | 1998 | Pandora | Random House
Charles Dickens | 1838 | Oliver Twist | Random House
Secondary indexes are the only type of index that Cassandra will manage for you, so the terms “index” and “secondary index”
actually refer to the same mechanism. The purpose of an index is to allow query-by-value functionality (not by primary key).
Index (Secondary Index)
At the storage layer, a secondary index is simply another
column family, where the key is the value of the indexed column
and the columns contain the row keys of the indexed table.
A secondary-index query resembles an IN query, since it must fetch all Row Keys listed under the index entries.
But an IN query knows exactly which nodes to query, while a secondary-index query hits every node:
the index is local, not global, so even a node holding no matching index value must still be queried
(e.g. the node with the empty index still receives the query).
Local: each node builds its own index and knows nothing about the indexes on other nodes.
(diagram: the indexed query fans out to every node, here Node1 and Node2)
publisher | names
---------------+----------------------------
Putnam | [Tom Clancy]
Headline | [Dean Koontz]
Random House | [Anne Rice, Charles Dickens]
RowKey: Putnam
=> (name=Tom Clancy, value=)
RowKey: Headline
=> (name=Dean Koontz, value=)
RowKey: Random House
=> (name=Anne Rice, value=)
=> (name=Charles Dickens, value=)
Index Column → Row Keys
CREATE INDEX author_publisher ON authors (publisher);
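With the index in place the table can be filtered by publisher; a small sketch (without the index Cassandra rejects this WHERE clause):
SELECT name, title FROM authors WHERE publisher = 'Putnam';
-- served by looking up RowKey 'Putnam' in the hidden index column family,
-- then fetching each listed row key, on every node.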
Tombstone
Column, Row, Row Range, TTL
Tracing on (distributed cluster)
Client → Coordinator (127.0.0.1) → Replica (127.0.0.2)
0. The client sends the request to the coordinator node
1. (Coordinator) choose which replica nodes to write to
2. Send the message to the chosen replica nodes
3. (Replica) receive the message (data) from the coordinator
4. Apply the Mutation on the local node
5. Append to the commit log, add to the memtable
6. Prepare the response for the coordinator and put it on the outbound queue
7. Pop the response off the queue and send it to the coordinator
8. (Coordinator) receive the replica's response message
9. Process the response
10. Return the result to the client
》insert, flush
[
{"key": "16","columns": [["","",1417814256390], ["col2","26",1417814256390], ["col3","36",1417814256390], ["id","id16",1417814256390]]},
{"key": "15","columns": [["","",1417814244766], ["col2","25",1417814244766], ["col3","35",1417814244766], ["id","id15",1417814244766]]},
{"key": "14","columns": [["","",1417814230711], ["col2","24",1417814230711], ["col3","34",1417814230711], ["id","id14",1417814230711]]},
{"key": "13","columns": [["","",1417814218246], ["col2","23",1417814218246], ["col3","33",1417814218246], ["id","id13",1417814218246]]},
{"key": "12","columns": [["","",1417814207910], ["col2","22",1417814207910], ["col3","32",1417814207910], ["id","id12",1417814207910]]},
{"key": "11","columns": [["","",1417814197094], ["col2","21",1417814197094], ["col3","31",1417814197094], ["id","id11",1417814197094]]},
{"key": "1","columns": [["","",1417814185270], ["col2","2",1417814185270], ["col3","3",1417814185270], ["id","id1",1417814185270]]}
]
》delete ROW, delete COLUMN, flush
delete from ts1 WHERE col1 = '1';
delete id from ts1 WHERE col1 = '11';
delete col2 from ts1 WHERE col1 = '12';
[
{"key": "1","metadata": {"deletionInfo": {"markedForDeleteAt":1417814302304,"localDeletionTime":1417814302}},"columns": []},
{"key": "11","columns": [["id","54822130",1417814320400,"d"]]},
{"key": "12","columns": [["col2","5482220b",1417814539434,"d"]]}
]
》compact
[
{"key": "16","columns": [["","",1417814256390], ["col2","26",1417814256390], ["col3","36",1417814256390], ["id","id16",1417814256390]]},
{"key": "13","columns": [["","",1417814218246], ["col2","23",1417814218246], ["col3","33",1417814218246], ["id","id13",1417814218246]]},
{"key": "15","columns": [["","",1417814244766], ["col2","25",1417814244766], ["col3","35",1417814244766], ["id","id15",1417814244766]]},
{"key": "14","columns": [["","",1417814230711], ["col2","24",1417814230711], ["col3","34",1417814230711], ["id","id14",1417814230711]]},
{"key": "12","columns": [["","",1417814207910], ["col2","5482220b",1417814539434,"d"], ["col3","32",1417814207910], ["id","id12",1417814207910]]},
{"key": "11","columns": [["","",1417814197094], ["col2","21",1417814197094], ["col3","31",1417814197094], ["id","54822130",1417814320400,"d"]]},
{"key": "1","metadata": {"deletionInfo": {"markedForDeleteAt":1417814302304,"localDeletionTime":1417814302}},"columns": []}
]
》gc_grace_seconds, compact, delete/clean-up Tombstone
[
{"key": "16","columns": [["","",1417814256390], ["col2","26",1417814256390], ["col3","36",1417814256390], ["id","id16",1417814256390]]},
{"key": "13","columns": [["","",1417814218246], ["col2","23",1417814218246], ["col3","33",1417814218246], ["id","id13",1417814218246]]},
{"key": "15","columns": [["","",1417814244766], ["col2","25",1417814244766], ["col3","35",1417814244766], ["id","id15",1417814244766]]},
{"key": "14","columns": [["","",1417814230711], ["col2","24",1417814230711], ["col3","34",1417814230711], ["id","id14",1417814230711]]},
{"key": "12","columns": [["","",1417814207910], ["col3","32",1417814207910], ["id","id12",1417814207910]]},
{"key": "11","columns": [["","",1417814197094], ["col2","21",1417814197094], ["col3","31",1417814197094]]}
]
a deleted column or row only really disappears after gc_grace_seconds;
until then it still sits in the sstable, marked with a tombstone (d);
a DELETE command is recorded as a tombstone once flushed to the sstable;
after flushing to an sstable, you can use sstablejson to check the data:
row-key=1 deleted
row-key=11, column=id deleted
row-key=12, column=col2 deleted
use sstablejson to see the tombstones
http://stackoverflow.com/questions/27776337/what-types-of-tombstones-does-cassandra-support?rq=1
CREATE TABLE ts1 (
col1 text,
col2 text,
col3 text,
id text,
PRIMARY KEY ((col1))
)
e - expired TTL
d - deleted value (tombstone)
t - deleted range of values (range tombstone)
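A hedged sketch of statements that produce each marker (values hypothetical; ts1 has no clustering columns, so the range-tombstone example reuses the authors table from earlier, where deleting with only part of the clustering key covers a whole range):
INSERT INTO ts1 (col1, id) VALUES ('20', 'id20') USING TTL 60;  -- e: TTL cell, expired 60s later
DELETE id FROM ts1 WHERE col1 = '21';                           -- d: deleted cell (tombstone)
DELETE FROM authors WHERE name = 'Tom Clancy' AND year = 1987;  -- t: range tombstone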
C* INSERT Flow: Coordinator(replicas)→CommitLog→Memtable
How to Check Data…
1.insert data
2.nodetool flush
3.sstabledump Data.db
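A minimal run-through of those steps (keyspace and file path hypothetical; nodetool and sstabledump are shell commands, shown here as comments):
INSERT INTO ts1 (col1, col2, col3, id) VALUES ('17', '27', '37', 'id17');
-- shell: nodetool flush mykeyspace ts1
-- shell: sstabledump /var/lib/cassandra/data/mykeyspace/ts1-<id>/mc-1-big-Data.db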
Demo Time…
DELETE FLOW SAME AS INSERT FLOW…(coord,log,mem)
Single Partition Query: [sstable+memtable→merge]
Row Tombstone: shouldn't there be 3 tombstone cells? Why 0??
1.Determining replicas for mutation
2.Appending to commitlog
3.Adding to #table memtable
On deleting Collection types: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
Delete a Non-Existent Row:
the Query Returns Empty,
But a Tombstone Is Produced
name | year | title | isbn | publisher
------------+------+-----------------+---------------+-----------
Tom Clancy | 1987 | Patriot Games | 0-399-13241-4 | Putnam
Tom Clancy | 1993 | Without Remorse | 0-399-13825-0 | Putnam
❌
This Row Key record does not exist at all,
yet even for a non-existent Row Key,
the Delete still produces a Tombstone
Delete a Column:
the Query Returns null for the Column,
and a Column Tombstone Is Produced
Inserting a null Column, or deleting a Column,
both create a Column-level Tombstone
insert into authors(name,year,title,isbn,publisher) values ('Tom Clancy',1987,'Patriot Games','0-399-13241-4','Putnam');
insert into authors(name,year,title,isbn,publisher) values ('Tom Clancy',1993,'Without Remorse','0-399-13825-0','Putnam');
DELETE FROM authors WHERE name = 'Tom Clancy' AND year = 1987 AND title = 'Patriot Gamess';
SELECT * FROM authors WHERE name = 'Tom Clancy' AND year = 1987 AND title = 'Patriot Games';
DELETE isbn FROM authors WHERE name = 'Tom Clancy' AND year = 1983 AND title = 'Without Remorse';
DELETE FROM authors WHERE name = 'Tom Clancy' AND year = 1983 AND title = 'Without Remorse';
SELECT * FROM authors WHERE name = 'Tom Clancy' AND year = 1983 AND title = 'Without Remorse';
DELETE isbn FROM authors WHERE name = 'Tom Clancy' AND year = 1993 AND title = 'Without Remorse';
DELETE FROM authors WHERE name = 'Tom Clancy' AND year = 1993 AND title = 'Without Remorse';
SELECT * FROM authors WHERE name = 'Tom Clancy' AND year = 1993 AND title = 'Without Remorse';
Deleting a non-existent record still produces a Tombstone.
Delete the Column first, then Delete the Row: in the end only the Row Tombstone remains (the wider shadows the narrower).
Row Tombstone
sstabledump(tombstone)
Continuing the experiment on top of the three Row Tombstones above…
insert data(1987,P.G.), flush, dump, compact
[Delete]1987:P.G.,tombstone,ts=2016-9-10 10:53:08
[Insert]1987:P.G.,cells:….……,ts=2016-9-10 12:19:33 LastWriteWins(based on timestamp)
A live Row is marked by liveness_info carrying its creation timestamp, and usually has cells;
if there is only deletion_info, the Row or Column is a Tombstone
> alter table authors with gc_grace_seconds = 0;
nodetool compact(one sstable), sstabledump…
❌
❌
✅
flush produces a new file.
compact merges all existing files into one new file and deletes all the old ones.
New file numbers always increase.
With gc_grace_seconds set to 0, Tombstones are cleaned up whenever a compaction happens.
Where does the Cell/Column's timestamp live?
liveness_info seems to be the Row's timestamp…
>INSERT INTO authors
(name,year,title,isbn,publisher)
VALUES ('Tom Clancy',1993,'Without Remorse',
'0-399-13825-0','Putnam')
USING TTL 300;
>SELECT * FROM authors
WHERE name = 'Tom Clancy'
AND year = 1993
AND title = 'Without Remorse';
→ flush produces sstable#6
→ dump inspects sstable#6
USING TTL (insert or update)
2016-9-11T00:57:57 + 300 s (5 minutes: 58, 59, 00, 01, 02) = 2016-9-11T01:02:57
CQL time trails local time. FLUSH TIME: 2016-9-11T08:59:XX (local)
Eight-hour timezone difference. INSERT TIME: 2016-9-11T00:57:57 (UTC)
creation time | expiration time | expired?
1. Insert with TTL, flush, inspect the file with dump: expired=false
2. After the TTL elapses, the record can no longer be queried; no flush needed: expired=true
3. Manual compaction: the Tombstone created by the TTL gets cleaned up
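The countdown can be watched from cqlsh with the built-in TTL() and WRITETIME() functions; a small sketch against the row inserted above:
SELECT isbn, TTL(isbn), WRITETIME(isbn) FROM authors
WHERE name = 'Tom Clancy' AND year = 1993 AND title = 'Without Remorse';
-- TTL(isbn) is the remaining lifetime in seconds; it returns null once expired.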
insert
compact: merge sstable#5 and sstable#6 into sstable#7
5 minutes later, query again… Empty Row!
sstable#007
Partition Index found..
Key Cache hit……………
Read 0 live and 0 tombstone cells
Actually no flush is needed, because there were no DML operations at all (including DELETE).
After the TTL elapses, expired is automatically flipped to true, but no
new file is generated! So even after a flush, the sstable's timestamp stays the same.
Running a manual compaction deletes sstable#7
and produces a new file, sstable#8,
with the TTL-expired Row removed.
Remember the earlier experiment set gc_grace_seconds=0...
❌
✅
11:25:46 insert the data
11:26:XX flush to a file → sstable#9
sstable#8 holds the previous experiment's data; sstable#9 holds the newly inserted data.
A manual compaction merges the two sstable files into one file, sstable#10,
whose content is the merge of sstable#8 and sstable#9; sstable#9 carries the
TTL expiry, and liveness_info records the creation time, expiration time, and expired flag...
11:26:XY merge the two files → sstable#10
Deep into TTL
Recall that the insert time was 2016-9-11 11:25:46;
minus the 8-hour timezone it is 2016-9-11 03:25:46.
The expiration time is 2016-9-11 03:30:46,
plus eight hours: 2016-9-11 11:30:46.
Query again at 2016-9-11 11:30:37: TTL = expiration time - current time,
[2016-9-11 11:30:46] - [2016-9-11 11:30:37] = 9 s….
Already flushed to an sstable file, so reads come from the sstable,
with the Key Cache speeding up locating the key's index position….
expiration time - current query time = ttl (remaining time to live)
1. Before any flush happens,
the just-inserted data is in memory,
so the trace shows Merging memtable
and nothing is read from sstable files.
2. After a flush to an sstable file,
reads come from the sstable, and
the Key Cache speeds them up.
1. Not yet flushed: read by Merging Memtable
2. After flush: read the sstable plus the key cache
1. Querying at the exact moment of expiry still finds the 2 tombstones
2. Past the expiration time, the Tombstones can no longer be queried…
Even after the TTL expires, no new file is generated and the file's timestamp looks
unchanged, yet the dump command shows the Data.db data file has changed:
before the TTL expires, expired=false; after it expires, expired=true.
If the file content changed (expired flipped from false to true), why doesn't the timestamp change?
flush changes nothing here, because there were no new DML operations;
TTL auto-expiry is not a DML operation, so no new file is generated.
A manual compact, however, removes the Tombstone: the TTL'd
record already has expired=true, and an expired record
is effectively a kind of Tombstone. Since gc_grace_seconds
was set to 0 in the earlier experiment, any compaction
cleans the Tombstones up.
Tombstone(Read) can’t use Bloom Filter
The BF tells whether a RowKey might exist in an SSTable:
if the BF says True, the key may or may not exist;
if the BF says False, the key definitely is not there!
But a key that carries only Tombstones still passes the filter as True,
so the Tombstones in the SSTable file must still be read!
A query for a key in an sstable that has only tombstones associated
with it will still pass through the bloom filter, because the system must
reconcile(协调) tombstones with other replicas. Since the bloom filter
is designed to prevent unnecessary reads for missing data, this means
Cassandra will perform extra reads after data has been deleted.
BloomFilter False → Read(“row1”) skipped ❌
BloomFilter passes (True) → Read(“row1”) must still read the file, because a Tombstone exists
tombstone
When reading Tombstones, the BloomFilter cannot work as usual!
Read(“row1”)
Since the BloomFilter cannot fulfill its duty here, it might as well not be used!
❎
✅
✅
Last Write Wins (LWW)
Partition Key (#partition)
UDT (User Defined Type)
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
)
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
addresses map<text, address>
)
SELECT id, name, addresses.city, addresses.phones FROM users;
id | name | addresses.city | addresses.phones
--------------------+----------------+--------------------------
63bf691f | jbellis | Austin | {'512-4567', '512-9999'}
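A hedged insert sketch (note: on released Cassandra versions the collection value type must be frozen, i.e. map<text, frozen<address>>; the uuid() function call and all values here are illustrative):
INSERT INTO users (id, name, addresses)
VALUES (uuid(), 'jbellis', {
  'home': {
    street: '123 Main St',
    city: 'Austin',
    zip_code: 78701,
    phones: {'512-4567', '512-9999'}
  }
});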