Apache Cassandra, part 1 – principles, data model
I. RDBMS Pros and Cons
Pros: Good balance between functionality and usability. Powerful tool support. SQL has a feature-rich syntax. A set of widely accepted standards. Consistency.
Scalability: RDBMSs were mainstream for decades, until scalability requirements increased dramatically. The complexity of the data structures being processed also increased dramatically.
Scaling: Two ways to achieve scalability: vertical scaling and horizontal scaling.
CAP Theorem
Cons: Cost of distributed transactions. No availability support: two DBs with 99.9% availability each give 100% - 2 * (100% - DB availability) = 99.8% (about 86 min of downtime per month, versus about 43 min for a single DB). Additional synchronization overhead. As slow as the slowest DB node plus network latency. 2PC is a blocking protocol, so resources can stay locked forever.
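The availability arithmetic on this slide can be checked with a short sketch. It assumes a 30-day month and independent failures, and it multiplies the availabilities instead of using the slide's linear approximation; the function names are illustrative only.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def combined_availability(availabilities):
    """Availability of an operation that needs every listed database to be up.

    Assumes the databases fail independently of each other.
    """
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def downtime_minutes(availability, period_minutes=MINUTES_PER_MONTH):
    """Expected downtime over the period for a given availability."""
    return (1.0 - availability) * period_minutes


single = 0.999
both = combined_availability([single, single])
print(f"one DB:  {single:.3%} available, {downtime_minutes(single):.1f} min/month down")
print(f"two DBs: {both:.3%} available, {downtime_minutes(both):.1f} min/month down")
```

For two databases at 99.9% the product is very close to the slide's 99.8%, and the expected downtime roughly doubles compared with a single database.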
Cons: Use of master-slave replication. It makes the write side (master) a performance bottleneck and requires additional CPU/IO resources. There is no partition tolerance.
Sharding: Feature sharding. Hash code sharding. Lookup table: the node that holds the lookup table is a performance bottleneck and a single point of failure.
Feature sharding: DB instances are divided by DB function.
Hash code sharding: Data is divided among DB instances by hash code ranges.
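As a rough illustration of hash code sharding, the sketch below splits a 128-bit hash space into equal ranges, one per DB instance. The choice of MD5 and the name `shard_for` are assumptions made for the example, not something the slides specify.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for(key: str, num_shards: int) -> int:
    """Return the index of the DB instance whose hash range contains the key."""
    hash_code = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    range_size = HASH_SPACE // num_shards
    # The last range absorbs the remainder when the space does not divide evenly.
    return min(hash_code // range_size, num_shards - 1)


for user in ("alice", "bob", "carol"):
    print(user, "-> shard", shard_for(user, 4))
```

The same key always maps to the same instance, and keys spread across instances without any lookup table.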
Sharding consistency: For sharding to be efficient, data should be eventually consistent.
Feature vs. hash code sharding: Feature sharding allows consistency tuning at the granularity of domain logic, but the load may not be well balanced. Hash code sharding gives good load balancing but does not allow consistency tuning at the domain logic level.
Cassandra sharding: Cassandra uses hash code load balancing. Cassandra is a better fit for reporting than for business logic processing. Cassandra + Hadoop == OLAP server with high performance and availability.
II. Apache Cassandra. Overview
Cassandra: Amazon Dynamo (architecture): DHT; eventual consistency; tunable trade-offs, consistency. Google BigTable (data model): column families and columns.
Distributed and decentralized: No master/slave nodes (server symmetry). No single point of failure.
DHT: A distributed hash table (DHT) is a class of decentralized distributed systems that provides a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key.
DHT: Keyspace. Keyspace partitioning. Overlay network.
Keyspace: An abstract keyspace, such as the set of 128- or 160-bit strings. A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
Keyspace partitioning: Keyspace distance function δ(k1, k2). A node with ID i_x owns all the keys k_m for which i_x is the closest ID, measured according to δ(k_m, i_x).
Keyspace partitioning: Imagine mapping the range from 0 to 2^128 onto a circle so that the values wrap around.
Keyspace partitioning: Consider what happens if node C is removed.
Keyspace partitioning: Consider what happens if node D is added.
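The effect of removing or adding a node can be tried out with a minimal ring sketch. It assumes that a key belongs to the first node met when moving clockwise from the key's position (one possible choice of δ), and it derives node positions from an MD5 hash of the node name; the `Ring` class and its methods are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position on the ring, in the range 0 .. 2^128 - 1."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes=()):
        self._tokens = []  # sorted node positions
        self._owner = {}   # node position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        position = token(node)
        bisect.insort(self._tokens, position)
        self._owner[position] = node

    def remove(self, node: str) -> None:
        position = token(node)
        self._tokens.remove(position)
        del self._owner[position]

    def owner(self, key: str) -> str:
        # First node position at or after the key; wrap around past the end.
        index = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[index % len(self._tokens)]]


keys = [f"key-{i}" for i in range(1000)]
ring = Ring(["A", "B", "C"])
before = {k: ring.owner(k) for k in keys}

ring.remove("C")
after_removal = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after_removal[k]]
print("keys moved after removing C:", len(moved))

ring.add("D")
after_add = {k: ring.owner(k) for k in keys}
taken = [k for k in keys if after_removal[k] != after_add[k]]
print("keys moved after adding D:", len(taken))
```

In this sketch only the keys that belonged to the removed node change owner, and a new node takes keys from a single neighbor; all other keys stay where they were.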
Overlay network: For any key k, each node either has a node ID that owns k or has a link to a node whose node ID is closer to k. Greedy algorithm (not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k.
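A minimal sketch of the greedy forwarding rule, under simplifying assumptions: a tiny 6-bit ring, circular distance as δ, and each node linked only to its two ring neighbors (real overlays keep extra long-range links to shorten the path). All names are illustrative.

```python
RING_SIZE = 2 ** 6  # assumption: a tiny 6-bit keyspace to keep the example readable


def distance(a: int, b: int) -> int:
    """Circular distance between two positions on the ring."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def build_links(node_ids):
    """Link every node to its predecessor and successor on the ring."""
    ordered = sorted(node_ids)
    return {
        node: {ordered[i - 1], ordered[(i + 1) % len(ordered)]}
        for i, node in enumerate(ordered)
    }


def route(links, start: int, key: int):
    """Forward greedily until no neighbor is strictly closer to the key."""
    path = [start]
    current = start
    while True:
        best = min(links[current], key=lambda n: distance(n, key))
        if distance(best, key) >= distance(current, key):
            return path  # the current node is the closest one it knows of
        current = best
        path.append(current)


nodes = [3, 12, 25, 40, 51, 60]
links = build_links(nodes)
print(route(links, start=60, key=27))
```

Each hop strictly reduces the distance to the key, so the loop terminates; with only ring-neighbor links the path can take a number of hops proportional to the number of nodes.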
Elastic scalability: Adding or removing a node doesn't require reconfiguring Cassandra, changing application queries, or restarting the system.
High availability and fault tolerance: Cassandra picks A and P from CAP. Eventual consistency.
Tunable consistency: Replication factor (number of copies of each piece of data). Consistency level (number of replicas to access on every read/write operation).
Quorum consistency level: R = N/2 + 1, W = N/2 + 1 (integer division), so R + W > N.
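The reason R + W > N matters is that every read set then shares at least one replica with every write set. The sketch below checks this by brute force for small N; `quorum` and `always_overlap` are names made up for the example.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Quorum size for n replicas: N/2 + 1 with integer division."""
    return n // 2 + 1


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every set of r replicas intersects every set of w replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n in (3, 4, 5):
    q = quorum(n)
    print(f"N={n}: quorum={q}, R+W>N: {q + q > n}, overlap: {always_overlap(n, q, q)}")
```

With R = W = 1 and N = 3 the sets can miss each other, which is why such a setting can return stale data.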
Hybrid orientation: Column orientation: columns aren't fixed; columns can be sorted; columns can be queried for a certain range. Row orientation: each row is uniquely identifiable by key; rows group columns and super columns.
Schema-free: You don't have to define columns when you create the data model. You think of the queries you will use and then organize the data around them.
High performance: Reading and writing 50 GB. [...]: write: 0.12 ms, read: 15 ms. [...]: write: 300 ms, read: 350 ms.
III. Data Model
Relational data model: Database containing Table1 and Table2.
Cassandra data model: Keyspace > Column Family > rows. RowKey1: Column1 = Value1, Column2 = Value2, Column3 = Value3. RowKey2: Column1 = Value1, Column4 = Value4.
Keyspace: A keyspace is close to a relational database. Basic attributes: replication factor; replica placement strategy; column families (tables in the relational model). It is possible to create several keyspaces per application (for example, if you need a different replica placement strategy or replication factor).
Column family: Container for a collection of rows. A column family is close to a table in the relational data model. Example row in a column family: RowKey: Column1 = Value1, Column2 = Value2, Column3 = Value3.
Column family vs. table: The store represents a four-dimensional hash map: [Keyspace][ColumnFamily][Key][Column]. The columns are not strictly defined in a column family, and you can freely add any column to any row at any time. A column family can hold columns or super columns (collections of subcolumns).
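The four-dimensional hash map can be pictured with nested dictionaries. This is only a mental model, not how Cassandra stores data; the keyspace, column family, row keys, and column names below are invented for the example.

```python
from collections import defaultdict


def make_store():
    """store[keyspace][column_family][row_key][column_name] -> value"""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


store = make_store()

# Rows in the same column family do not need to share the same columns.
store["Blog"]["Users"]["alice"]["email"] = "alice@example.com"
store["Blog"]["Users"]["alice"]["city"] = "Kyiv"
store["Blog"]["Users"]["bob"]["email"] = "bob@example.com"

# A new column can be added to any row at any time.
store["Blog"]["Users"]["bob"]["twitter"] = "@bob"

print(sorted(store["Blog"]["Users"]["alice"]))
print(sorted(store["Blog"]["Users"]["bob"]))
```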
Column family vs. table: A column family has a comparator attribute that indicates how columns will be sorted in query results (as long, byte, UTF8, etc.). Each column family is stored in a separate file on disk, so it's useful to keep related columns in the same column family.
Column: Basic unit of the data structure. Column: name: byte[]; value: byte[]; clock: long.
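The column triple can be sketched as a small immutable record. The `reconcile` helper assumes the clock is a write timestamp and that the copy with the higher clock wins when two replicas disagree; that rule and the helper's name are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    clock: int  # assumption: a write timestamp


def reconcile(a: Column, b: Column) -> Column:
    """Pick the newer of two versions of the same column (higher clock wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.clock >= b.clock else b


old = Column(name=b"email", value=b"alice@old.example", clock=100)
new = Column(name=b"email", value=b"alice@new.example", clock=250)
print(reconcile(old, new).value)
```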
Skinny and wide rows: Wide rows: a huge number of columns and only a few rows (used to store lists of things). Skinny rows: a small number of columns and many different rows (close to the relational model).
Disadvantages of wide rows: They work poorly with the row cache. If you have many rows and many columns, you end up with larger indexes (~40 GB of data and a 10 GB index).
Column sorting: Column sorting is typically important only with the wide model. Comparator: an attribute of a column family that specifies how column names are compared for sort order.
Comparator types: Cassandra has the following predefined types: AsciiType, BytesType, LexicalUUIDType, IntegerType, LongType, TimeUUIDType, UTF8Type.
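To see why the comparator matters, the sketch below sorts the same column names under three comparators. The sort keys are simplified stand-ins for the named Cassandra types (raw byte order, decoded text, big-endian signed 64-bit integer), and `sorted_names` is a made-up helper.

```python
import struct

# Simplified stand-ins for the comparator types named on the slide.
COMPARATORS = {
    "BytesType": lambda name: name,                           # raw byte order
    "UTF8Type": lambda name: name.decode("utf-8"),            # text order
    "LongType": lambda name: struct.unpack(">q", name)[0],    # numeric order
}


def sorted_names(columns: dict, comparator: str) -> list:
    """Return the column names of a row in comparator order."""
    return sorted(columns, key=COMPARATORS[comparator])


# Column names are 8-byte big-endian longs; the values are irrelevant here.
row = {struct.pack(">q", n): b"" for n in (10, -1, 2)}


def as_longs(names):
    return [struct.unpack(">q", name)[0] for name in names]


print("LongType: ", as_longs(sorted_names(row, "LongType")))
print("BytesType:", as_longs(sorted_names(row, "BytesType")))
```

Under the numeric comparator the negative name sorts first, while under raw byte order it sorts last because its leading byte is 0xFF.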
Super column: Stores a map of subcolumns. Super column: name: byte[]; cols: Map<byte[], Column>.
Five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
Super column: Necessity of more than one level of depth. Performance issues.
Note that: There are no joins in Cassandra, so you can either join data on the client side or create a denormalized second column family.
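The two options can be contrasted with plain dictionaries standing in for column families. Everything here (the `users` and `posts` families, the `posts_by_author` family, and both helper functions) is hypothetical and only meant to show the shape of each approach.

```python
# Two hypothetical column families, modelled as dicts keyed by row key.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
posts = {
    "p1": {"author": "u1", "title": "Hello"},
    "p2": {"author": "u2", "title": "Rings"},
}


def posts_with_author_client_join(posts, users):
    """Option 1: read both column families and join them in application code."""
    return {
        post_key: {**post, "author_name": users[post["author"]]["name"]}
        for post_key, post in posts.items()
    }


def write_post_denormalized(posts_by_author, users, post_key, post):
    """Option 2: copy the author's name into a second column family on write."""
    author = post["author"]
    row = posts_by_author.setdefault(author, {})
    row[post_key] = f'{users[author]["name"]}: {post["title"]}'


joined = posts_with_author_client_join(posts, users)
print(joined["p1"]["author_name"])

posts_by_author = {}
for key, post in posts.items():
    write_post_denormalized(posts_by_author, users, key, post)
print(posts_by_author["u1"])
```

The client-side join costs extra reads on every query, while the denormalized family costs extra writes and duplicated data but answers the query with a single row read.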
IV. Advanced column types
TTL column type: A TTL column is a column whose value expires after a given period of time. Useful for storing session tokens.
Counter column: In an eventually consistent environment, old versions of column values are overwritten by new ones, but counters should be cumulative. Counter columns are intended to support increment/decrement operations in an eventually consistent environment without losing any of them.
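The expiry behaviour can be sketched with a column that carries its write time and a time-to-live. The field names and the `read` helper are assumptions for the example; the current time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TtlColumn:
    name: bytes
    value: bytes
    written_at: float                    # seconds since some epoch
    ttl_seconds: Optional[float] = None  # None means the column never expires

    def is_live(self, now: float) -> bool:
        return self.ttl_seconds is None or now < self.written_at + self.ttl_seconds


def read(column: TtlColumn, now: float) -> Optional[bytes]:
    """Return the value, or None once the column has expired."""
    return column.value if column.is_live(now) else None


session = TtlColumn(b"session_token", b"abc123", written_at=1000.0, ttl_seconds=1800.0)
print(read(session, now=1500.0))  # still within the 30-minute window
print(read(session, now=3000.0))  # past the expiry time
```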
CounterColumn internals: CounterColumn structure:
name ... [
    (replicaId1, counter1, logical clock1),
    (replicaId2, counter2, logical clock2),
    ...
    (replicaId3, counter3, logical clock3)
]
CounterColumn write - before:
UPDATE CounterCF SET count_me = count_me + 2 WHERE key = 'counter1'
[
    (A, 10, 2),
    (B, 3, 4),
    (C, 6, 7)
]
CounterColumn write - after (A is leader):
[
    (A, 10 + 2, 2 + 1),
    (B, 3, 4),
    (C, 6, 7)
]
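The before/after states can be reproduced with a small sketch in which each replica owns one (counter, logical clock) pair. The rules used here are assumptions made for illustration: the leader adds the delta to its own pair and bumps its clock, the counter value is the sum over all replicas, and merging two copies keeps the pair with the higher clock for each replica.

```python
def increment(shards: dict, leader: str, delta: int) -> dict:
    """Return a new state where the leader's pair absorbed the delta."""
    count, clock = shards.get(leader, (0, 0))
    updated = dict(shards)
    updated[leader] = (count + delta, clock + 1)
    return updated


def total(shards: dict) -> int:
    """The counter's value is the sum of the per-replica counts."""
    return sum(count for count, _clock in shards.values())


def merge(a: dict, b: dict) -> dict:
    """Combine two copies: for each replica keep the pair with the higher clock."""
    merged = dict(a)
    for replica, (count, clock) in b.items():
        if replica not in merged or clock > merged[replica][1]:
            merged[replica] = (count, clock)
    return merged


before = {"A": (10, 2), "B": (3, 4), "C": (6, 7)}
after = increment(before, leader="A", delta=2)

print(after, "total:", total(after))
print(merge(before, after) == after)
```

Because each replica only ever updates its own pair, concurrent increments led by different replicas land in different pairs and neither is lost when the copies are merged.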

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Apache Cassandra, part 1 – principles, data model

  • 1. Apache Cassandra, part 1 – principles, data model
  • 2. I. RDBMS Pros and Cons
  • 3. Pros: Good balance between functionality and usability. Powerful tool support. SQL has a feature-rich syntax. A set of widely accepted standards. Consistency.
  • 4. Scalability: RDBMSs were mainstream for decades, until scalability requirements increased dramatically. The complexity of the data structures being processed also increased dramatically.
  • 5. Scaling: There are two ways to achieve scalability: vertical scaling and horizontal scaling.
  • 7. Cons: Cost of distributed transactions. Availability suffers: a transaction that needs two DBs, each 99.9% available, has an availability of about 100% - 2 * (100% - DB availability) = 99.8% (roughly 86 min. of downtime per 30-day month, versus roughly 43 min. for a single DB). Additional synchronization overhead. As slow as the slowest DB node plus network latency. 2PC is a blocking protocol, so it is possible to lock resources forever.
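
The availability arithmetic on this slide can be checked with a short sketch. It assumes that the two databases fail independently, that a transaction needs both of them, and that a month has 30 days; the function names are illustrative only. Under those assumptions a single 99.9% database is down about 43 minutes per month, and a pair that must both be up is down about 86 minutes.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def combined_availability(per_db: float, db_count: int) -> float:
    """Availability when all db_count independent databases must be up."""
    return per_db ** db_count


def downtime_minutes(availability: float) -> float:
    """Expected downtime per month for the given availability."""
    return (1.0 - availability) * MINUTES_PER_MONTH


single = 0.999
both = combined_availability(single, 2)

print(f"one DB : {single:.3%}, {downtime_minutes(single):.1f} min/month")
print(f"two DBs: {both:.3%}, {downtime_minutes(both):.1f} min/month")
```

The exact product 0.999 * 0.999 is slightly higher than the slide's linear approximation 100% - 2 * 0.1%, but the difference is negligible at this scale.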
  • 8. Cons: Use of master-slave replication. It makes the write side (master) a performance bottleneck and requires additional CPU/IO resources. There is no partition tolerance.
  • 9. Sharding: Feature sharding. Hash code sharding. Lookup table: the node that holds the lookup table is a performance bottleneck and a single point of failure.
  • 10. Feature sharding: DB instances are divided by DB function.
  • 11. Hash code sharding: Data is divided among DB instances by hash code ranges.
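
Hash code sharding can be sketched as follows. This is a minimal illustration, not how any particular database implements it: the instance names, the 32-bit hash space, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical DB instance names
HASH_SPACE = 2 ** 32                   # assumption: 32-bit hash codes

# Exclusive upper bound of the hash range owned by each shard.
BOUNDS = [HASH_SPACE * (i + 1) // len(SHARDS) for i in range(len(SHARDS))]


def hash_code(key: str) -> int:
    """Map a key to a stable 32-bit hash code."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def shard_for_hash(code: int) -> str:
    """Return the shard whose hash range contains the given code."""
    return SHARDS[bisect_right(BOUNDS, code)]


def shard_for(key: str) -> str:
    """Return the shard that stores the given key."""
    return shard_for_hash(hash_code(key))


print(shard_for("user:42"))
```

Because the shard is derived from the hash alone, no lookup-table node is needed, which avoids the bottleneck mentioned for the lookup table approach.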
  • 12. Sharding consistency: For sharding to be efficient, data should be eventually consistent.
  • 13. Feature vs. hash code sharding: Feature sharding allows consistency to be tuned at the granularity of the domain logic, but the load may not be well balanced. Hash code sharding provides good load balancing but does not allow consistency to be controlled at the domain logic level.
  • 14. Cassandra sharding: Cassandra uses hash code load balancing. Cassandra is a better fit for reporting than for business logic processing. Cassandra + Hadoop == an OLAP server with high performance and availability.
  • 18. Distributed and decentralized: No master/slave nodes (server symmetry). No single point of failure.
  • 19. DHT: A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.
  • 20. DHT components: keyspace, keyspace partitioning, overlay network.
  • 21. Keyspace: An abstract keyspace, such as the set of 128-bit or 160-bit strings. A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
  • 22. Keyspace partitioning: A keyspace distance function δ(k1, k2) is defined. A node with ID i_x owns all the keys k_m for which i_x is the closest ID, measured according to δ(k_m, i_x).
  • 23. Keyspace partitioning: Imagine mapping the range from 0 to 2^128 onto a circle, so that the values wrap around.
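  • 24. Keyspace partitioning: Consider what happens if node C is removed. Only the keys that C owned are reassigned to a neighboring node; all other keys stay where they are.
  • 25. Keyspace partitioning: Consider what happens if node D is added. Only the keys in the range that D takes over move, from a neighboring node to D.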
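
The effect of removing or adding a node can be sketched with a small ring. This is an illustrative model, not Cassandra's code: it assumes a clockwise distance function, so that a key is owned by the first node at or after it on the circle, and the node positions are made up for the example.

```python
from typing import Dict, Iterable, List

RING_SIZE = 2 ** 128


def distance(key: int, node_id: int) -> int:
    """Clockwise distance from a key to a node ID on the ring."""
    return (node_id - key) % RING_SIZE


def owner(key: int, nodes: Dict[str, int]) -> str:
    """Name of the node whose ID is closest to the key."""
    return min(nodes, key=lambda name: distance(key, nodes[name]))


def moved_keys(keys: Iterable[int], before: Dict[str, int],
               after: Dict[str, int]) -> List[int]:
    """Keys whose owner differs between two ring layouts."""
    return [k for k in keys if owner(k, before) != owner(k, after)]


base = {"A": 100, "B": 200, "C": 300}      # hypothetical node positions
keys = list(range(0, 400, 10))

without_c = {name: pos for name, pos in base.items() if name != "C"}
with_d = dict(base, D=250)

print("moved when C is removed:", moved_keys(keys, base, without_c))
print("moved when D is added:  ", moved_keys(keys, base, with_d))
```

In both cases only the keys in one arc of the circle change owner, which is what makes this partitioning scheme attractive for clusters that grow and shrink.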
  • 26. Overlay network: For any key k, each node either has a node ID that owns k or has a link to a node whose node ID is closer to k. Routing uses a greedy algorithm (which is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k.
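
Greedy routing over such an overlay can be sketched as below. The ring size, the symmetric distance function, and the link layout (ring neighbors plus a couple of longer hops) are assumptions chosen to keep the example small; real DHTs use their own link structures.

```python
from typing import Dict, List

RING_SIZE = 2 ** 16  # assumption: a small ring for readability


def distance(a: int, b: int) -> int:
    """Symmetric distance between two points on the ring."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def build_links(node_ids: List[int]) -> Dict[int, List[int]]:
    """Link every node to its ring neighbors plus a few longer hops."""
    ring = sorted(node_ids)
    n = len(ring)
    links = {}
    for i, node in enumerate(ring):
        linked = {ring[(i + offset) % n] for offset in (-1, 1, 2, 4)}
        links[node] = sorted(linked - {node})
    return links


def route(start: int, key: int, links: Dict[int, List[int]]) -> List[int]:
    """Greedy routing: return the node IDs visited on the way to the key."""
    path = [start]
    current = start
    while True:
        best = min(links[current], key=lambda n: distance(n, key),
                   default=current)
        if distance(best, key) >= distance(current, key):
            return path  # no neighbor is closer, so current owns the key
        current = best
        path.append(current)


node_ids = [1000, 9000, 20000, 33000, 41000, 52000, 60000]
links = build_links(node_ids)
print(route(1000, 34000, links))
```

Each hop strictly reduces the distance to the key, so the loop always terminates. Because every node keeps links to both ring neighbors, the node where it stops is the one closest to the key.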
  • 27. Elastic scalability: Adding or removing a node does not require reconfiguring Cassandra, changing application queries, or restarting the system.
  • 28. High availability and fault tolerance: Cassandra picks A and P from CAP. Eventual consistency.
  • 29. Tunable consistency: Replication factor (the number of copies of each piece of data). Consistency level (the number of replicas to access on every read/write operation).
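  • 30. Quorum consistency level: R = N/2 + 1 and W = N/2 + 1 (integer division), so R + W > N, where N is the replication factor, R is the number of replicas read, and W is the number of replicas written. Every read therefore overlaps the latest write on at least one replica.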
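
The quorum arithmetic can be checked with a few lines. This is only a sketch of the formulas on the slide; the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: N/2 + 1 with integer division."""
    return replication_factor // 2 + 1


def read_sees_latest_write(r: int, w: int, n: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return r + w > n


for n in (1, 2, 3, 4, 5):
    q = quorum(n)
    print(f"N={n}: R=W={q}, R+W>N is {read_sees_latest_write(q, q, n)}")
```

With N = 3, for example, quorum reads and writes each touch 2 replicas, whereas R = W = 1 gives R + W = 2, which is not greater than N, so a read may miss the latest write.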
  • 31. Hybrid orientation: Column orientation: columns are not fixed, columns can be sorted, and columns can be queried for a certain range. Row orientation: each row is uniquely identifiable by its key, and rows group columns and super columns.
  • 32. Schema-free: You do not have to define columns when you create a data model. You think of the queries you will use and then organize the data around them.
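  • 35. Relational data model (diagram): a Database contains Table1 and Table2.
  • 36. Cassandra data model (diagram): a Keyspace contains a Column Family. Row RowKey1 holds Column1, Column2, Column3 with values Value1, Value2, Value3. Row RowKey2 holds Column1 and Column4 with values Value1 and Value4.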
  • 37. Keyspace: A keyspace is close to a relational database. Basic attributes: replication factor, replica placement strategy, and column families (tables in the relational model). It is possible to create several keyspaces per application (for example, if you need a different replica placement strategy or replication factor).
  • 38. Column family: A container for a collection of rows. A column family is close to a table in the relational data model. (Diagram: a column family row with key RowKey and columns Column1, Column2, Column3 holding Value1, Value2, Value3.)
  • 39. Column family vs. table: The store represents a four-dimensional hash map: [Keyspace][ColumnFamily][Key][Column]. The columns are not strictly defined in a column family, and you can freely add any column to any row at any time. A column family can hold columns or super columns (collections of subcolumns).
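
The four-dimensional map can be pictured with nested dictionaries. This is a mental model only, not Cassandra's storage format; the keyspace, column family, row keys, and column names below are made up for the example.

```python
from collections import defaultdict

# store[keyspace][column_family][row_key][column_name] = value
store = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

users = store["Blog"]["Users"]  # hypothetical keyspace and column family

# Rows in the same column family do not need the same columns.
users["alice"]["email"] = b"alice@example.com"
users["alice"]["city"] = b"Kyiv"
users["bob"]["email"] = b"bob@example.com"
users["bob"]["twitter"] = b"@bob"

for row_key, columns in users.items():
    print(row_key, sorted(columns))
```

Adding the `twitter` column to one row required no schema change, which is the point the slide makes about columns not being strictly defined.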
  • 40. Column family vs. table: A column family has a comparator attribute that indicates how columns will be sorted in query results (as long, byte, UTF8, etc.). Each column family is stored in a separate file on disk, so it is useful to keep related columns in the same column family.
  • 41. Column: The basic unit of the data structure. A column consists of name: byte[], value: byte[], clock: long.
  • 42. Skinny and wide rows: Wide rows have a huge number of columns and only a few rows (they are used to store lists of things). Skinny rows have a small number of columns and many different rows (close to the relational model).
  • 43. Disadvantages of wide rows: They work poorly with the row cache. If you have many rows and many columns, you end up with larger indexes (~40 GB of data and a 10 GB index).
  • 44. Column sorting: Column sorting is typically important only with the wide model. The comparator is an attribute of a column family that specifies how column names will be compared for sort order.
  • 45. Comparator types: Cassandra has the following predefined types: AsciiType, BytesType, LexicalUUIDType, IntegerType, LongType, TimeUUIDType, UTF8Type.
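
The effect of the comparator on column order can be illustrated with a simplified model. The key functions below only imitate what UTF8Type and LongType ordering mean for column names; they are not Cassandra's implementation, and column names are kept as Python strings for readability.

```python
from typing import Callable, Dict, List

# Simplified stand-ins for two comparator types (an assumption of this sketch).
COMPARATORS: Dict[str, Callable[[str], object]] = {
    "UTF8Type": lambda name: name,      # compare names as text
    "LongType": lambda name: int(name),  # compare names as numbers
}


def sorted_column_names(columns: Dict[str, str], comparator: str) -> List[str]:
    """Return the column names in the order the comparator would give."""
    return sorted(columns, key=COMPARATORS[comparator])


row = {"10": "a", "9": "b", "100": "c"}  # hypothetical wide-row columns

print(sorted_column_names(row, "UTF8Type"))
print(sorted_column_names(row, "LongType"))
```

The same three columns come back in a different order depending on the comparator, which matters for range queries over wide rows.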
  • 49. The need for more than one level of depth
  • 51. Note that there are no joins in Cassandra, so you can either join data on the client side or create a denormalized second column family.
  • 53. TTL column type: A TTL column is a column whose value expires after a given period of time. It is useful for storing session tokens.
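
The expiry behavior of a TTL column can be sketched as follows. The class and field names are hypothetical and the clock is passed in explicitly so the example stays deterministic; this models the idea, not Cassandra's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtlColumn:
    """A column whose value is visible only until its time to live elapses."""
    name: str
    value: bytes
    written_at: float   # seconds, on whatever clock the caller uses
    ttl_seconds: float

    def read(self, now: float) -> Optional[bytes]:
        """Return the value, or None once the column has expired."""
        if now < self.written_at + self.ttl_seconds:
            return self.value
        return None


token = TtlColumn("session_token", b"abc123", written_at=100.0, ttl_seconds=30.0)
print(token.read(now=120.0))  # still valid
print(token.read(now=130.0))  # expired
```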
  • 54. Counter column: In an eventually consistent environment, old versions of column values are overwritten by new ones, but counters should be cumulative. Counter columns are intended to support increment/decrement operations in an eventually consistent environment without losing any of them.
  • 55. CounterColumn internals: CounterColumn structure: name ... [ (replicaId1, counter1, logical clock1), (replicaId2, counter2, logical clock2), ..., (replicaId3, counter3, logical clock3) ]
  • 56. CounterColumn write, before: UPDATE CounterCF SET count_me = count_me + 2 WHERE key = 'counter1' is applied to [ (A, 10, 2), (B, 3, 4), (C, 6, 7) ]
  • 57. CounterColumn write, after (A is the leader): [ (A, 10 + 2, 2 + 1), (B, 3, 4), (C, 6, 7) ]
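
The before/after write example can be reproduced with a small sketch. It assumes that the leader adds the delta to its own tuple and advances its logical clock by one, as the slide shows; the function name and the handling of a leader that has no tuple yet are assumptions of this example.

```python
from typing import List, Tuple

Shard = Tuple[str, int, int]  # (replica_id, counter, logical_clock)


def apply_increment(shards: List[Shard], leader: str, delta: int) -> List[Shard]:
    """Apply a counter update on the leader replica's tuple only."""
    updated = []
    found = False
    for replica_id, counter, clock in shards:
        if replica_id == leader:
            updated.append((replica_id, counter + delta, clock + 1))
            found = True
        else:
            updated.append((replica_id, counter, clock))
    if not found:
        # Assumption: a leader without a tuple starts a new one.
        updated.append((leader, delta, 1))
    return updated


before = [("A", 10, 2), ("B", 3, 4), ("C", 6, 7)]
print(apply_increment(before, leader="A", delta=2))
```

Tuples of the other replicas are left untouched, so concurrent increments led by different replicas do not overwrite each other.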
  • 58. CounterColumn read: All Memtables and SSTables are read using the following algorithm: all tuples with the local replicaId are summed, and for each foreign replica the tuple with the maximum logical clock value is chosen. Counters of foreign replicas are updated during read repair, during the replicate-on-write procedure, or by AES (the anti-entropy service).
  • 59. CounterColumn read, example (A is the local replica): Memtable: (A, 12, 4) (B, 3, 5) (C, 10, 3). SSTable1: (A, 5, 3) (B, 1, 6) (C, 5, 4). SSTable2: (A, 2, 2) (B, 2, 4) (C, 6, 2). Result: (A, 19, 9) + (B, 1, 6) + (C, 5, 4) = 19 + 1 + 5 = 25
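
The read algorithm can be sketched in a few lines and checked against the example above. The sketch assumes that A is the local replica, which is what the example's result implies; the function and type names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Shard = Tuple[str, int, int]  # (replica_id, counter, logical_clock)


def read_counter(local_replica: str, sources: Iterable[List[Shard]]) -> int:
    """Merge counter tuples from the Memtable and SSTables into one value."""
    local_total = 0
    newest_foreign: Dict[str, Tuple[int, int]] = {}  # replica -> (clock, counter)
    for shards in sources:
        for replica_id, counter, clock in shards:
            if replica_id == local_replica:
                # Local tuples are partial increments, so they are summed.
                local_total += counter
            elif (replica_id not in newest_foreign
                  or clock > newest_foreign[replica_id][0]):
                # Foreign tuples are snapshots, so the newest one wins.
                newest_foreign[replica_id] = (clock, counter)
    return local_total + sum(counter for _, counter in newest_foreign.values())


memtable = [("A", 12, 4), ("B", 3, 5), ("C", 10, 3)]
sstable1 = [("A", 5, 3), ("B", 1, 6), ("C", 5, 4)]
sstable2 = [("A", 2, 2), ("B", 2, 4), ("C", 6, 2)]

print(read_counter("A", [memtable, sstable1, sstable2]))
```

For the slide's data this gives 19 from the local tuples, plus 1 from B (clock 6) and 5 from C (clock 4), for a total of 25.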
  • 60. Resources: Home of the Apache Cassandra project: http://cassandra.apache.org/. Apache Cassandra Wiki: http://wiki.apache.org/cassandra/. Documentation provided by DataStax: http://www.datastax.com/docs/0.8/. A good explanation of creating secondary indexes: http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html. Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010, ISBN: 978-1-449-39041-9.
  • 61. Authors: Lev Sivashov (lsivashov@gmail.com). Andrey Lomakin (lomakin.andrey@gmail.com, twitter: @Andrey_Lomakin, LinkedIn: http://www.linkedin.com/in/andreylomakin). Artem Orobets (enisher@gmail.com, twitter: @Dr_EniSh). Anton Veretennik (tennik@gmail.com).