Apache Cassandra
What is Apache Cassandra?
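Apache Cassandra is an open-source, non-relational (NoSQL) distributed
database that manages large amounts of data across commodity
servers.
It is a wide-column store, often described as column-oriented.
It was initially released in July 2008.
In terms of the CAP theorem, it is usually classified as an AP system
(favoring Availability and Partition tolerance), with consistency tunable per query.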
Why was Apache Cassandra developed?
Avinash Lakshman and Prashant Malik initially
developed Apache Cassandra at Facebook to power the
Facebook inbox search feature.
Components of Apache Cassandra
• Node: A node is a single Cassandra instance (server) where data is stored.
• Data center: A data center is a collection of related nodes.
• Cluster: A cluster contains one or more data centers.
• Commit log: In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is
written to the commit log.
• Memtable: A memtable is a memory-resident data structure. After a write is recorded in the commit log, it is
written to the memtable. A single column family (table) can have more than one memtable at a time, for example one accepting writes while another is being flushed.
• SSTable: An SSTable (sorted string table) is an immutable disk file to which data is flushed from the memtable when the memtable reaches a
threshold size.
• Bloom filter: A Bloom filter is a fast, probabilistic data structure for testing whether an element
may be a member of a set: it can return false positives but never false negatives. Cassandra keeps one Bloom filter per SSTable and consults it during reads to skip SSTables that cannot contain the requested partition.
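The Bloom filter idea can be sketched in a few lines of Python. This is a minimal toy, not Cassandra's implementation: the bit-array size, the number of hash functions and the use of salted SHA-256 hashes are illustrative assumptions (Cassandra uses Murmur3-based hashing and sizes each filter from a configurable false-positive chance).

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive k positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    bf.add(partition_key)

print(bf.might_contain("user:2"))    # True: added keys are never reported missing
print(bf.might_contain("user:999"))  # usually False, but a false positive is possible
```

A "False" answer lets a reader skip an SSTable without touching disk; a "True" answer only means the SSTable has to be checked further.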
Apache Cassandra Architecture:
Write Operations:
i. When a client sends a write request, Cassandra appends the write to
the commit log (disk) and also stores it in the memtable (RAM).
Because the commit log is on disk, writes that have not yet been
flushed can be replayed if the node crashes or loses power.
ii. Data in the memtable (RAM) is flushed to SSTables (disk), and a
partition index that points to the location of the data on disk is
created along with them. A flush is triggered when the memtable
reaches its configurable size threshold or when the commit log
exceeds commitlog_total_space_in_mb.
iii. SSTables are immutable: each flush creates a new SSTable file
instead of overwriting an existing one. As a result, a single partition
can be spread across multiple SSTables; its pieces are merged at read
time and periodically consolidated by compaction.
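The three write steps above can be modeled with a toy in-process "node". This is a sketch under simplifying assumptions: Python lists and dicts stand in for the on-disk commit log, the memtable and the SSTable files, the flush threshold is a hypothetical entry count (Cassandra uses memory-size thresholds), and the commit log is simply cleared after a flush.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of the write path: commit log -> memtable -> immutable SSTables."""

    flush_threshold: int = 3  # hypothetical: flush after this many memtable entries
    commit_log: list = field(default_factory=list)  # stands in for the on-disk commit log
    memtable: dict = field(default_factory=dict)    # stands in for the in-RAM memtable
    sstables: list = field(default_factory=list)    # each flush appends a new, frozen SSTable

    def write(self, key: str, value: str, timestamp: int) -> None:
        # 1. Append to the commit log so the write could be replayed after a crash.
        self.commit_log.append((key, value, timestamp))
        # 2. Apply to the memtable (the newest timestamp per key wins).
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)
        # 3. Flush when the memtable crosses its threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # An SSTable is written once, sorted by key, and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed data no longer needs the log


node = ToyNode()
node.write("user:1", "alice", 1)
node.write("user:2", "bob", 2)
node.write("user:3", "carol", 3)   # triggers a flush
node.write("user:1", "alicia", 4)  # newer version goes to the memtable; the SSTable is untouched
print(len(node.sstables), node.memtable)
# 1 {'user:1': ('alicia', 4)}
```

After the last write, "user:1" exists both in the flushed SSTable and in the memtable, which is why reads have to merge versions.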
Read Operations:
i. The client sends a read request.
ii. Cassandra checks the memtable (RAM). If the requested data is
present, it is read from the memtable and merged with data from the
SSTables (disk) before the final result is returned to the client.
iii. If the row cache is enabled, it is checked for the requested data.
iv. Cassandra checks the Bloom filter of each SSTable to find the
SSTables that may contain the requested partition. The Bloom filters
are held in memory (off-heap in modern versions). Because Bloom
filters are probabilistic, they can return false positives but never
false negatives, so any SSTable they rule out is safely skipped. For
each remaining SSTable, Cassandra next checks the partition key cache.
v. The partition key cache stores partition index entries in heap
memory. If the partition key is found in the partition key cache,
Cassandra goes directly to the compression offset map to locate the
data on disk. If it is not found, Cassandra searches the partition
summary, a sampled subset of the partition index, to find which part
of the partition index to read.
vi. The partition index maps each partition key to the position of its
data in the SSTable; that position is then looked up in the
compression offset map to find the exact location on disk.
vii. The compression offset map holds the on-disk locations of the
compressed data chunks. Once it identifies the chunk that stores the
data, Cassandra fetches the data from disk, merges it with any other
versions, and returns the result to the client.
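Steps ii and iv to vii can be sketched with a toy SSTable. Several assumptions keep it short: a Python set stands in for the Bloom filter (so it never gives false positives), the row cache is omitted, rows are stored as hypothetical "key|value|timestamp" text lines, and the tiny CHUNK_SIZE is chosen only so that rows span compressed chunks. It illustrates the lookup order, not Cassandra's on-disk format.

```python
import bisect
import zlib

CHUNK_SIZE = 16  # hypothetical, tiny uncompressed chunk size so rows span chunks


class ToySSTable:
    """Toy SSTable holding the structures the read path consults (illustrative only)."""

    def __init__(self, rows: dict, summary_interval: int = 2) -> None:
        self.filter = set(rows)  # exact stand-in for the SSTable's Bloom filter
        self.summary_interval = summary_interval
        data = b""
        self.index_entries = []  # partition index: sorted (key, uncompressed offset)
        for key in sorted(rows):
            value, ts = rows[key]
            self.index_entries.append((key, len(data)))
            data += f"{key}|{value}|{ts}\n".encode()
        # Partition summary: every summary_interval-th key of the partition index.
        self.summary_keys = [k for k, _ in self.index_entries[::summary_interval]]
        # Compression offset map: where each compressed chunk starts, plus an end sentinel.
        self.blob, self.chunk_offsets = b"", []
        for start in range(0, len(data), CHUNK_SIZE):
            self.chunk_offsets.append(len(self.blob))
            self.blob += zlib.compress(data[start:start + CHUNK_SIZE])
        self.chunk_offsets.append(len(self.blob))

    def index_lookup(self, key: str):
        """Use the summary to pick one block of the partition index, then scan it."""
        pos = bisect.bisect_right(self.summary_keys, key) - 1
        if pos < 0:
            return None
        start = pos * self.summary_interval
        for k, offset in self.index_entries[start:start + self.summary_interval]:
            if k == key:
                return offset
        return None

    def read_at(self, offset: int) -> str:
        """Map an uncompressed offset to its chunk and decompress until the row ends."""
        first_chunk = offset // CHUNK_SIZE
        local = offset - first_chunk * CHUNK_SIZE
        buf = b""
        for i in range(first_chunk, len(self.chunk_offsets) - 1):
            lo, hi = self.chunk_offsets[i], self.chunk_offsets[i + 1]
            buf += zlib.decompress(self.blob[lo:hi])
            end = buf.find(b"\n", local)
            if end != -1:
                return buf[local:end].decode()
        raise ValueError(f"no row at offset {offset}")


def read(key: str, memtable: dict, sstables: list, key_cache: dict):
    versions = []
    if key in memtable:  # memtable (row cache omitted from this sketch)
        versions.append(memtable[key])
    for sst_id, sst in enumerate(sstables):
        if key not in sst.filter:  # filter says "definitely absent": skip this SSTable
            continue
        offset = key_cache.get((sst_id, key))  # partition key cache
        if offset is None:
            offset = sst.index_lookup(key)  # partition summary -> partition index
            if offset is None:
                continue  # with a real Bloom filter, a false positive ends here
            key_cache[(sst_id, key)] = offset
        _, value, ts = sst.read_at(offset).split("|")  # compression offset map -> data
        versions.append((value, int(ts)))
    # Merge: the version with the newest write timestamp wins.
    return max(versions, key=lambda v: v[1])[0] if versions else None


old = ToySSTable({"user:1": ("alice", 1), "user:2": ("bob", 2), "user:3": ("carol", 3)})
new = ToySSTable({"user:1": ("alicia", 4)})
memtable = {"user:2": ("robert", 5)}
cache = {}
print(read("user:1", memtable, [old, new], cache))  # alicia (newer SSTable wins)
print(read("user:2", memtable, [old, new], cache))  # robert (memtable version is newest)
print(read("user:9", memtable, [old, new], cache))  # None
```

On a second read of the same key, the key cache entry lets the sketch skip the summary and index scan, mirroring step v.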
Features of Apache Cassandra:
Distributed
Scalability
Fault Tolerance
Query Language (CQL)
Virtual Nodes:
A virtual node (vnode) is a small range of tokens on the token ring.
Instead of owning one large contiguous range, each physical node owns
many vnodes. The number per node is set by num_tokens: the default
was 256 before Cassandra 4.0 and is 16 from 4.0 onward. Because
ownership is split into many small ranges, vnodes give the system
greater flexibility: a new node can take over ranges from many
existing nodes at once, so it is easier to add nodes to the cluster
when we need them. Vnodes also help balance load, and machines with
more capacity can be assigned more vnodes (a higher num_tokens) so
that they hold a proportionally larger share of the data.
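Token ownership with vnodes can be sketched as a sorted ring of (token, node) pairs. Assumptions: tokens are assigned randomly (newer Cassandra versions can use a token allocation algorithm instead), SHA-256 stands in for the default Murmur3 partitioner's hash, and the node names and token counts are hypothetical. Each token marks the end of a range, so a key belongs to the first token at or after its own token, wrapping around the ring.

```python
import bisect
import hashlib
import random


def build_ring(num_tokens_per_node: dict, seed: int = 42) -> list:
    """Give each node num_tokens random tokens (a simplification of vnode allocation)."""
    rng = random.Random(seed)
    ring = []
    for node, num_tokens in num_tokens_per_node.items():
        for _ in range(num_tokens):
            # Same token range as Murmur3Partitioner: -2**63 .. 2**63 - 1.
            ring.append((rng.randrange(-2**63, 2**63), node))
    ring.sort()
    return ring


def token_for(key: str) -> int:
    # Stand-in hash: Cassandra's default partitioner uses Murmur3, not SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big", signed=True)


def owner(ring: list, key: str) -> str:
    # The key is owned by the first vnode token >= its token (wrapping to the start).
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token_for(key))
    return ring[i % len(ring)][1]


# node-c is a hypothetical larger machine, so it is given twice as many vnodes.
ring = build_ring({"node-a": 16, "node-b": 16, "node-c": 32})
counts = {}
for i in range(10_000):
    n = owner(ring, f"user:{i}")
    counts[n] = counts.get(n, 0) + 1
print(counts)
```

With more vnodes, node-c's many small ranges tend to add up to a larger share of the keys; with random placement the exact split varies from ring to ring.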
Advantages of Apache Cassandra:
Open source
Peer-to-Peer Architecture
Scalable
High Efficiency
Tunable Consistency
Flexible Schema
Easy to Learn and Use
Distributed and Decentralized
Ability to Analyse
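Tunable consistency is commonly explained with replica arithmetic: with replication factor RF, if writes are acknowledged by W replicas and reads consult R replicas, the two sets overlap whenever R + W > RF. The sketch below assumes a single data center and treats consistency levels as plain replica counts, which is a simplification.

```python
def quorum(replication_factor: int) -> int:
    # QUORUM is a majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    # If R + W > RF, every set of read replicas overlaps every set of write replicas.
    return read_replicas + write_replicas > rf


RF = 3
print(quorum(RF))                                          # 2
print(read_sees_latest_write(RF, quorum(RF), quorum(RF)))  # True  (QUORUM writes, QUORUM reads)
print(read_sees_latest_write(RF, 1, 1))                    # False (ONE writes, ONE reads)
```

Choosing ONE for both trades consistency for latency and availability; QUORUM for both keeps reads and writes overlapping while still tolerating one unavailable replica at RF = 3.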
Disadvantages of Apache Cassandra:
It does not support full ACID transactions or relational features.
Because it handles large amounts of data and many requests,
operations can slow down under heavy load, leading to latency issues.
Data is modeled around queries rather than around its structure, so
the same information is often stored multiple times.
Because Cassandra runs on the JVM and handles vast amounts of data,
users may experience JVM memory management (garbage collection) issues.
It offers no join or subquery support.
Aggregate support is limited: built-in functions such as COUNT, SUM and
AVG exist since Cassandra 2.2, but aggregating across many partitions
is expensive.
Cassandra was optimized from the start for fast writes, and reads got
the short end of the stick: a read may have to merge data from the
memtable and several SSTables, so it tends to be slower.
Finally, official documentation from the Apache project has historically
been limited, so users often rely on documentation from third-party
companies.
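Query-driven modeling, and the duplication it causes, can be illustrated with plain Python dicts standing in for tables. The table names, columns and video example are hypothetical; the point is that each query gets its own "table" keyed by the partition key it filters on, so one logical record is written several times instead of being joined at read time.

```python
from collections import defaultdict

# Hypothetical "tables", one per query, each keyed by the partition key that query filters on.
videos_by_user = defaultdict(list)  # answers: "videos uploaded by user X"
videos_by_tag = defaultdict(list)   # answers: "videos with tag T"


def insert_video(video_id: str, user: str, title: str, tags: list) -> None:
    # The same video is written once per table (and once per tag), so each
    # query reads a single partition instead of performing a join.
    videos_by_user[user].append((video_id, title))
    for tag in tags:
        videos_by_tag[tag].append((video_id, title, user))


insert_video("v1", "alice", "Intro to Cassandra", ["databases", "nosql"])
print(videos_by_user["alice"])  # [('v1', 'Intro to Cassandra')]
print(videos_by_tag["nosql"])   # [('v1', 'Intro to Cassandra', 'alice')]
```

Here the title is stored three times; updating it means rewriting every copy, which is the price paid for single-partition reads.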
