Cassandra Basics
Indexing

Benjamin Black, b@b3k.us
Relational stores are
SCHEMA ORIENTED
Start from your SCHEMA &
WORK FORWARDS
Column stores are
QUERY ORIENTED
Start from your QUERIES &
WORK BACKWARDS
AT SCALE
Denormalization is
THE NORM
AT SCALE
Everything depends on
THE INDICES
Cassandra is an
INDEX CONSTRUCTION KIT
Column Family
Two-level Map

key: {
  column: value,
  column: value,
  ...
 }
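The two-level map can be modeled with nested hashes. This is a minimal sketch in plain Ruby, not the Cassandra client API; the constant `SAMPLE_CF` and the helper `get_column` are hypothetical names introduced only for illustration.

```ruby
# Illustrative model only: a nested Hash stands in for a column family.
# SAMPLE_CF and get_column are hypothetical names, not Cassandra API.
SAMPLE_CF = {
  "row1" => { "colA" => "1", "colB" => "2" },
  "row2" => { "colA" => "3" }
}

# Level one selects the row by key, level two selects the column by name.
def get_column(column_family, key, column)
  row = column_family.fetch(key, {})
  row[column]
end

puts get_column(SAMPLE_CF, "row1", "colB")  # prints 2
```

A missing row or column simply yields `nil` in this model.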
Super Column Family
Three-level Map
key: {
  supercolumn: {
    column: value,
    column: value
  },
  supercolumn: {
    ...
  }
}
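The three-level map adds one more layer of nesting. A minimal sketch, again using plain Ruby hashes rather than the Cassandra client; `SAMPLE_SCF`, `get_super_column`, and `get_subcolumn` are hypothetical names.

```ruby
# Illustrative model only: three nested Hashes stand in for a super column family.
# All names here are hypothetical, not Cassandra API.
SAMPLE_SCF = {
  "row1" => {
    "superA" => { "col1" => "x", "col2" => "y" },
    "superB" => { "col1" => "z" }
  }
}

# Row key -> super column: returns the whole map of subcolumns.
def get_super_column(scf, key, super_column)
  scf.dig(key, super_column) || {}
end

# Row key -> super column -> column: returns a single value.
def get_subcolumn(scf, key, super_column, column)
  scf.dig(key, super_column, column)
end

puts get_subcolumn(SAMPLE_SCF, "row1", "superA", "col2")  # prints y
```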
column sorting defined by
CompareWith / CompareSubcolumnsWith
TimeUUIDType
UTF8Type
ASCIIType
LongType
LexicalUUIDType
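The choice of comparator changes the order in which columns come back within a row. A rough sketch of the difference between a string comparison and a numeric one, assuming decimal strings stand in for column names (real LongType names are 8-byte values, so this is only an analogy); the constant and method names are hypothetical.

```ruby
# Illustrative only: the same column names ordered by two different comparators.
# Decimal strings stand in for column names; this is not how Cassandra encodes them.
COLUMN_NAMES = ["10", "9", "2"]

# String comparison, similar in spirit to UTF8Type.
def sort_as_utf8(names)
  names.sort
end

# Numeric comparison, similar in spirit to LongType.
def sort_as_long(names)
  names.sort_by { |n| Integer(n) }
end

p sort_as_utf8(COLUMN_NAMES)  # ["10", "2", "9"]
p sort_as_long(COLUMN_NAMES)  # ["2", "9", "10"]
```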
row placement determined by
Partitioner
RandomPartitioner
Place based on MD5 of key

OrderPreservingPartitioner
Place based on actual key
Rows are sorted by key on each node
Regardless of partitioner
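The two placement strategies can be contrasted with a small sketch. It assumes the MD5 digest read as one large integer is a fair stand-in for a RandomPartitioner token, which simplifies what Cassandra actually does; `USER_KEYS`, `md5_token`, `ring_order_random`, and `ring_order_preserving` are hypothetical names.

```ruby
require "digest/md5"

# Illustrative only: how the two partitioners would order the same row keys.
USER_KEYS = ["b", "jason", "zack", "jen1982", "albert"]

# Assumption: treat the 128-bit MD5 digest as an integer token.
def md5_token(key)
  Digest::MD5.hexdigest(key).to_i(16)
end

# RandomPartitioner-style order: by hash of the key, so neighbors are unrelated.
def ring_order_random(keys)
  keys.sort_by { |k| md5_token(k) }
end

# OrderPreservingPartitioner-style order: by the key itself.
def ring_order_preserving(keys)
  keys.sort
end

p ring_order_preserving(USER_KEYS)  # ["albert", "b", "jason", "jen1982", "zack"]
p ring_order_random(USER_KEYS)      # order depends on the MD5 values
```

Only the order-preserving placement keeps adjacent keys next to each other, which is what makes range scans over row keys meaningful.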
One example in
TWO ACTS
Prelude
A USER DATABASE
<ColumnFamily Name="Users"
       CompareWith="UTF8Type" />
"b":       {"name": "Ben", "street": "1234 Oak St.",
            "city": "Seattle", "state": "WA"}
"jason":   {"name": "Jason", "street": "456 First Ave.",
            "city": "Bellingham", "state": "WA"}
"zack":    {"name": "Zack", "street": "4321 Pine St.",
            "city": "Seattle", "state": "WA"}
"jen1982": {"name": "Jennifer", "street": "1120 Foo Lane",
            "city": "San Francisco", "state": "CA"}
"albert":  {"name": "Albert", "street": "2364 South St.",
            "city": "Boston", "state": "MA"}
SELECT name FROM Users
WHERE state="WA"

How is the WHERE clause formed?
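For contrast, here is what answering that query looks like with no index at all. This is a sketch over a plain Ruby hash holding the Users rows from the earlier slide (street columns omitted for brevity); `USERS_NO_INDEX` and `names_in_state_by_scan` are hypothetical names, and the point is only that every row has to be visited.

```ruby
# Illustrative only: the Users rows as a nested Hash (street columns omitted).
USERS_NO_INDEX = {
  "b"       => { "name" => "Ben",      "city" => "Seattle",       "state" => "WA" },
  "jason"   => { "name" => "Jason",    "city" => "Bellingham",    "state" => "WA" },
  "zack"    => { "name" => "Zack",     "city" => "Seattle",       "state" => "WA" },
  "jen1982" => { "name" => "Jennifer", "city" => "San Francisco", "state" => "CA" },
  "albert"  => { "name" => "Albert",   "city" => "Boston",        "state" => "MA" }
}

# Full scan: inspect every row and keep the ones whose state matches.
def names_in_state_by_scan(users, state)
  users.values
       .select { |columns| columns["state"] == state }
       .map { |columns| columns["name"] }
end

p names_in_state_by_scan(USERS_NO_INDEX, "WA")  # ["Ben", "Jason", "Zack"]
```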
Act One
Supercolumn Indexing
<ColumnFamily Name="LocationUserIndexSCF"
       CompareWith="UTF8Type"
       CompareSubcolumnsWith="UTF8Type"
       ColumnType="Super" />
[state]: {
  [city1]: {[name1]:[user1], [name2]:[user2], ... },
  [city2]: {[name3]:[user3], [name4]:[user4], ... },
  ...
  [cityX]: {[name5]:[user5], [name6]:[user6], ... }
}
Row Key: { Super Column: { Column: Value } }

"CA": {
 "San Francisco": {"Jennifer": "jen1982"}
}
"MA": {
 "Boston": {"Albert": "albert"}
}
"WA": {
 "Bellingham": {"Jason": "jason"},
 "Seattle": {"Ben": "b", "Zack": "zack"}
}
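Because the index is denormalized data, the application has to write it alongside the Users rows. A minimal sketch of deriving the state / city / name layout from the Users rows, using plain Ruby hashes instead of client calls; `USER_ROWS` and `build_location_index` are hypothetical names.

```ruby
# Illustrative only: derive the super column index from the Users rows.
USER_ROWS = {
  "b"       => { "name" => "Ben",      "city" => "Seattle",       "state" => "WA" },
  "jason"   => { "name" => "Jason",    "city" => "Bellingham",    "state" => "WA" },
  "zack"    => { "name" => "Zack",     "city" => "Seattle",       "state" => "WA" },
  "jen1982" => { "name" => "Jennifer", "city" => "San Francisco", "state" => "CA" },
  "albert"  => { "name" => "Albert",   "city" => "Boston",        "state" => "MA" }
}

# Row key = state, super column = city, column = name, value = user key.
def build_location_index(users)
  index = {}
  users.each do |user_key, columns|
    cities = (index[columns["state"]] ||= {})
    names  = (cities[columns["city"]] ||= {})
    names[columns["name"]] = user_key
  end
  index
end

p build_location_index(USER_ROWS)["WA"]["Seattle"]  # {"Ben"=>"b", "Zack"=>"zack"}
```

In a real system each user write would also need a matching index write; this sketch rebuilds the whole index only to show the mapping.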
Show me
EVERYONE IN WASHINGTON
get(:LocationUserIndexSCF, 'WA')
{
   "Bellingham": {"Jason": "jason"},
   "Seattle": {"Ben": "b", "Zack": "zack"}
}
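The row that comes back is still grouped by city, so turning it into a flat list of names takes one more step on the client. A small sketch over the result shown above, held in a plain Ruby hash; `WA_ROW`, `names_from_row`, and `user_keys_in_city` are hypothetical helpers.

```ruby
# Illustrative only: the 'WA' row as returned above, held in a plain Hash.
WA_ROW = {
  "Bellingham" => { "Jason" => "jason" },
  "Seattle"    => { "Ben" => "b", "Zack" => "zack" }
}

# Collect every column name (user name) across all super columns (cities).
def names_from_row(row)
  row.values.flat_map(&:keys)
end

# Narrow to one city by picking a single super column.
def user_keys_in_city(row, city)
  row.fetch(city, {}).values
end

p names_from_row(WA_ROW)                # ["Jason", "Ben", "Zack"]
p user_keys_in_city(WA_ROW, "Seattle")  # ["b", "zack"]
```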
Act Two
Composite Key Indexing
Order Preserving Partitioner
+
Range Queries
<ColumnFamily Name="LocationUserIndexCF"
       CompareWith="UTF8Type" />
[state1]/[city1]:   {[name1]:[user1], [name2]:[user2], ... }
[state1]/[city2]:   {[name3]:[user3], [name4]:[user4], ... }
[state2]/[city1]:   {[name5]:[user5], [name6]:[user6], ... }
...
[stateX]/[cityY]:   {[name7]:[user7], [name8]:[user8], ... }
"CA/San Francisco": {"Jennifer": "jen1982"}
"MA/Boston": {"Albert": "albert"}
"WA/Bellingham": {"Jason": "jason"}
"WA/Seattle": {"Ben": "b", "Zack": "zack"}
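The composite keys can be derived from the Users rows by joining state and city with a slash. A minimal sketch with plain Ruby hashes, not client calls; `USERS_FOR_COMPOSITE` and `build_composite_index` are hypothetical names.

```ruby
# Illustrative only: derive "state/city" row keys from the Users rows.
USERS_FOR_COMPOSITE = {
  "b"       => { "name" => "Ben",      "city" => "Seattle",       "state" => "WA" },
  "jason"   => { "name" => "Jason",    "city" => "Bellingham",    "state" => "WA" },
  "zack"    => { "name" => "Zack",     "city" => "Seattle",       "state" => "WA" },
  "jen1982" => { "name" => "Jennifer", "city" => "San Francisco", "state" => "CA" },
  "albert"  => { "name" => "Albert",   "city" => "Boston",        "state" => "MA" }
}

# Row key = "state/city", column = name, value = user key.
def build_composite_index(users)
  index = {}
  users.each do |user_key, columns|
    row_key = "#{columns["state"]}/#{columns["city"]}"
    (index[row_key] ||= {})[columns["name"]] = user_key
  end
  index
end

p build_composite_index(USERS_FOR_COMPOSITE).keys.sort
# ["CA/San Francisco", "MA/Boston", "WA/Bellingham", "WA/Seattle"]
```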
Show me
EVERYONE IN WASHINGTON
get_range(:LocationUserIndexCF, {:start => 'WA',
                                 :finish => 'WB'})
{
    "WA/Bellingham": {"Jason": "jason"},
    "WA/Seattle": {"Ben": "b", "Zack": "zack"}
}
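The range query works because every key beginning with "WA/" sorts after "WA" and before "WB". A sketch of that selection over sorted keys, using a plain Ruby hash rather than the client; `COMPOSITE_INDEX` and `get_range_sketch` are hypothetical names, and the inclusive finish bound is an assumption of this model.

```ruby
# Illustrative only: the composite key index as a plain Hash.
COMPOSITE_INDEX = {
  "WA/Seattle"       => { "Ben" => "b", "Zack" => "zack" },
  "CA/San Francisco" => { "Jennifer" => "jen1982" },
  "WA/Bellingham"    => { "Jason" => "jason" },
  "MA/Boston"        => { "Albert" => "albert" }
}

# Walk the keys in sorted order and keep those within [start, finish].
def get_range_sketch(index, start, finish)
  index.keys.sort
       .select { |key| key >= start && key <= finish }
       .map { |key| [key, index[key]] }
       .to_h
end

p get_range_sketch(COMPOSITE_INDEX, "WA", "WB").keys
# ["WA/Bellingham", "WA/Seattle"]
```

The same helper narrows to a single city by tightening the bounds, for example from "WA/Seattle" to "WA/Seattle".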
Finale
BUILD SOMETHING AWESOME
(This part is up to you)
Appendix
EXAMPLE KEYSPACE
<Keyspace Name="UserDb">
  <ColumnFamily Name="Users"
         CompareWith="UTF8Type" />

  <ColumnFamily Name="LocationUserIndexSCF"
         CompareWith="UTF8Type"
         CompareSubcolumnsWith="UTF8Type"
         ColumnType="Super" />

  <ColumnFamily Name="LocationUserIndexCF"
         CompareWith="UTF8Type" />

  <ReplicaPlacementStrategy>
      org.apache.cassandra.locator.RackUnawareStrategy
  </ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
