SlideShare une entreprise Scribd logo
1  sur  42
Télécharger pour lire hors ligne
An overview of
                       Neo4j Internals


                                   tobias@neotechnology.com
 Tobias Lindaaker                  twitter: @thobe, #neo4j (@neo4j)
                                   web: neo4j.org neotechnology.com
 Hacker @ Neo Technology           my web: thobe.org




Monday, May 21, 2012
Outline
       This is a rough structure of
       how the pieces of Neo4j fit
       together.

       This talk will not cover
       how disks/fs works, we
       just assume it does.           Traversals      Core API       Cypher
       Nor will it cover the “Core
       API”, you are assumed to
       know it.
                                      Node/Relationship
                                                            Thread local diffs
                                        Object cache

                                          FS Cache                         HA


                                        Record files          Transaction log


                                                      Disk(s)



                                                                                 2

Monday, May 21, 2012
Outline

                             Traversals      Core API       Cypher


                             Node/Relationship
                                                   Thread local diffs
                               Object cache

                                 FS Cache                         HA

      Let’s start at the
      bottom: the on disk      Record files          Transaction log
      storage file layout.



                                             Disk(s)



                                                                        3

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                                linked lists of fixed size records on disk.

       Your graph on disk                       Properties are stored as a linked list of
                                                property records, each holding key+value.
                                                Each node/relationship references its first
                                                property record.
                                                The Nodes also reference the first node in its
                                                relationship chain.
                                                Each Relationship references its start and
                   Name: Alistair               end node.
                   Age: 34             KNOWS    It also references the prev/next relationship
                                                record for the start/end node respectively


                                                                   Name: Tobias
                                                                   Age: 27
                                                                   Nationality: Swedish



            KNOWS
                                        KNOWS


                                                                      KNOWS
              Name: Ian
              Age: 42


                                                           Name: Jim
                                                           Age: 37
                                    KNOWS                  Stuff: good

                                                                                                 4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                         linked lists of fixed size records on disk.

       Your graph on disk                Properties are stored as a linked list of
                                         property records, each holding key+value.
                                         Each node/relationship references its first
        Name                             property record.
                                         The Nodes also reference the first node in its
       Alistair                          relationship chain.
                                         Each Relationship references its start and
                                         end node.                                        Name
                                KNOWS    It also references the prev/next relationship
                                         record for the start/end node respectively       Tobias
      Age
       34
                                                                                                        Age
                                                                                                            27


                                                                                          Nationality

             KNOWS                                                                        Swedish
                                 KNOWS


                                                               KNOWS

                                                                               Name
                                                                                 Jim
                                                                                                            Age
     Name                                                                                                   37
                                                                                 Stuff
       Ian                   KNOWS
                       Age                                                      good

                       42                                                                               4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                         linked lists of fixed size records on disk.

       Your graph on disk                Properties are stored as a linked list of
                                         property records, each holding key+value.
                                         Each node/relationship references its first
        Name                             property record.
                                         The Nodes also reference the first node in its
       Alistair                          relationship chain.
                                         Each Relationship references its start and
                                         end node.                                        Name
                                KNOWS    It also references the prev/next relationship
                                         record for the start/end node respectively       Tobias
      Age
       34
                                                                                                        Age
                                                                                                            27


                                                                                          Nationality

             KNOWS                                                                        Swedish
                                 KNOWS


                                                               KNOWS

                                                                               Name
                                                                                 Jim
                                                                                                            Age
     Name                                                                                                   37
                                                                                 Stuff
       Ian                   KNOWS
                       Age                                                      good

                       42                                                                               4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                                   linked lists of fixed size records on disk.

       Your graph on disk                          Properties are stored as a linked list of
                                                   property records, each holding key+value.
                                                   Each node/relationship references its first
        Name
                                       SP    EP    property record.
                                                   The Nodes also reference the first node in its
       Alistair                                    relationship chain.
                                       SN    EN    Each Relationship references its start and
                                                   end node.                                        Name
                                       KNOWS       It also references the prev/next relationship
                                                   record for the start/end node respectively       Tobias
      Age
       34
                                                                                                                  Age
                                                                                                                      27
             SP        EP
                                        SP    EP
             SN        EN                                                                           Nationality
                                        SN    EN
             KNOWS                                                                                  Swedish
                                        KNOWS                            SP       EP
                                                                         SN       EN
                                                                         KNOWS

                                                                                         Name
                                  SP   EP                                                  Jim
                                                                                                                      Age
                                  SN   EN
     Name                                                                                                             37
                                                                                           Stuff
       Ian                        KNOWS
                            Age                                                           good

                            42                                                                                    4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                                   linked lists of fixed size records on disk.

       Your graph on disk                          Properties are stored as a linked list of
                                                   property records, each holding key+value.
                                                   Each node/relationship references its first
        Name
                                       SP    EP    property record.
                                                   The Nodes also reference the first node in its
       Alistair                                    relationship chain.
                                       SN    EN    Each Relationship references its start and
                                                   end node.                                        Name
                                       KNOWS       It also references the prev/next relationship
                                                   record for the start/end node respectively       Tobias
      Age
       34
                                                                                                                  Age
                                                                                                                      27
             SP        EP
                                        SP    EP
             SN        EN                                                                           Nationality
                                        SN    EN
             KNOWS                                                                                  Swedish
                                        KNOWS                            SP       EP
                                                                         SN       EN
                                                                         KNOWS

                                                                                         Name
                                  SP   EP                                                  Jim
                                                                                                                      Age
                                  SN   EN
     Name                                                                                                             37
                                                                                           Stuff
       Ian                        KNOWS
                            Age                                                           good

                            42                                                                                    4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                                   linked lists of fixed size records on disk.

       Your graph on disk                          Properties are stored as a linked list of
                                                   property records, each holding key+value.
                                                   Each node/relationship references its first
        Name
                                       SP    EP    property record.
                                                   The Nodes also reference the first node in its
       Alistair                                    relationship chain.
                                       SN    EN    Each Relationship references its start and
                                                   end node.                                        Name
                                       KNOWS       It also references the prev/next relationship
                                                   record for the start/end node respectively       Tobias
      Age
       34
                                                                                                                  Age
                                                                                                                      27
             SP        EP
                                        SP    EP
             SN        EN                                                                           Nationality
                                        SN    EN
             KNOWS                                                                                  Swedish
                                        KNOWS                            SP       EP
                                                                         SN       EN
                                                                         KNOWS

                                                                                         Name
                                  SP   EP                                                  Jim
                                                                                                                      Age
                                  SN   EN
     Name                                                                                                             37
                                                                                           Stuff
       Ian                        KNOWS
                            Age                                                           good

                            42                                                                                    4

Monday, May 21, 2012
Simple sample graph. It all boils down to
                                                   linked lists of fixed size records on disk.

       Your graph on disk                          Properties are stored as a linked list of
                                                   property records, each holding key+value.
                                                   Each node/relationship references its first
        Name
                                       SP    EP    property record.
                                                   The Nodes also reference the first node in its
       Alistair                                    relationship chain.
                                       SN    EN    Each Relationship references its start and
                                                   end node.                                        Name
                                       KNOWS       It also references the prev/next relationship
                                                   record for the start/end node respectively       Tobias
      Age
       34
                                                                                                                  Age
                                                                                                                      27
             SP        EP
                                        SP    EP
             SN        EN                                                                           Nationality
                                        SN    EN
             KNOWS                                                                                  Swedish
                                        KNOWS                            SP       EP
                                                                         SN       EN
                                                                         KNOWS

                                                                                         Name
                                  SP   EP                                                  Jim
                                                                                                                      Age
                                  SN   EN
     Name                                                                                                             37
                                                                                           Stuff
       Ian                        KNOWS
                            Age                                                           good

                            42                                                                                    4

Monday, May 21, 2012
Store files
                       ๏Node store
                       ๏Relationship store
                         •   Relationship type store
                       ๏Property store
                         •   Property key store

                         • (long) String store    Short string and
                                                  array values are
                                                  inlined in the




                         •
                                                  property store, long
                                                  values are stored in

                           (long) Array store     separate store files.


                                                                          5

Monday, May 21, 2012
Neo4j Storage Record Layout
Node (9 bytes)
inUse nextRelId                          nextPropId


     1                               5                9



Relationship (33 bytes)
inUse firstNode                          secondNode       relationshipType    firstPrevRelId    firstNextRelId    secondPrevRelId    secondNextRelId    nextPropId


     1                               5                9                      13                17                21                 25                 29            33



Relationship Type (5 bytes)
inUse typeBlockId


     1                               5



Property (33 bytes)
inUse type              keyIndexId       propBlock                                                                                                      nextPropId


     1              3                5                                                                                                                 29            33



Property Index (9 bytes)
inUse propCount                          keyBlockId


     1                               5                9



Dynamic Store (125 bytes)
inUse next                               data


     1                               5



NeoStore (5 bytes)
inUse datum


     1                               5

Monday, May 21, 2012
Outline

                                      Traversals      Core API       Cypher


     Next: The t wo levels of cache   Node/Relationship
     in Neo4j.                                              Thread local diffs
     The low level FS Cache for the     Object cache
     record files.
     And the high level Object
     cache storing a structure
     more optimized for traversal.
                                          FS Cache                         HA


                                        Record files          Transaction log


                                                      Disk(s)



                                                                                 7

Monday, May 21, 2012
The caches
        ๏ Filesystem cache:
                • Caches regions store file intofiles sized regions)
                    (divides each
                                   of the store
                                                  equally

                • The cache holds a fixed number of regions for each file
                • Regions are evicted based on ahit in non-cached region)
                    (hit count vs. miss count, i.e.
                                                    LFU-like policy


                • Default implementation of regions uses OS mmap
        ๏ Node/Relationship cache
                • Cache a version more optimized for traversals
                                                                            8

Monday, May 21, 2012
What we put in cache
                       ID             Relationship ID refs
                                      in:    R1       R2    ...   Rn               The structure of the elements in the high level
                       type 1                                                      object cache.
                                      out    R1       R2    ...   Rn
                                                                                   On disk most of the information is contained
                                      in:    R1       R2    R3     ...    Rn       in the relationship records, with the nodes just
                       type 2                                                      referencing their first relationship. In the
                                      out    R1       ...   Rn
              Node                                                                 cache this is turned around: the nodes hold
                                                                                   references to all its relationships. The
                        ...            (grouped by type)                           relationships are simple, only holding its
                                                                                   properties.

                                                                                   The relationships for each node is grouped by
                                                                                   RelationshipType to allow fast traversal of a
                                                                                   specific type.
                       Key 1                Key 2           ...          Key n
                                                                                   All references (dotted arrows) are by ID, and
                                                                                   traversals do indirect lookup through the cache.
                              Val 1



                                              Val 2




                       ID         start               end         type     Val n

        Relationship
                       Key 1                Key 2           ...          Key n
                              Val 1



                                              Val 2




                                                                           Val n




                                                                                                                       9

Monday, May 21, 2012
Outline

     So how do traversals work...
                                    Traversals      Core API       Cypher


                                    Node/Relationship
                                                          Thread local diffs
                                      Object cache

                                        FS Cache                         HA


                                      Record files          Transaction log


                                                    Disk(s)



                                                                               10

Monday, May 21, 2012
Traversals - how do they work?
       ๏ RelationshipExpanders: given (a path to) a node, returns           The surface
                                                                            layer, the you
                 Relationships to continue traversing from that node        interact with.


       ๏ Evaluators: given (a path to) a node, returns whether to:
                • Continue traversing on that branch (i.e. expand) or not
                • Include (the path to) the node in the result set or not
       ๏ Then a projection to Path, Node, or Relationship applied to
                 each Path in the result set.
            ... but also:
       ๏ Uniqueness level: policy for when it is ok to revisit a node
                 that has already been visited
       ๏ Implemented on top of the Core API
                                                                            11

Monday, May 21, 2012
More on Traversals
        ๏ Fetch node data from cache - non-blocking access                  This is what happens


                •
                                                                            under the hood.
                       If not in cache, retrieve from storage, into cache
                       ‣If region is in FS cache: blocking but short duration access
                       ‣If region is outside FS cache: blocking slower access
        ๏ Get relationships from cached node
                • If not fetched, retrieve from storage, by following chains
        ๏ Expand relationship(s) to end up on next node(s)
                • The relationship knows the node, no need to fetch it yet
        ๏ Evaluate
                •      possibly emitting a Path into the result set
        ๏ Repeat                                                                 12

Monday, May 21, 2012
Outline

                                                                  How is Cypher different?
                       Traversals      Core API       Cypher      and how dowes it work?



                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)



                                                                              13

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
What about gremlin?
        ๏ gremlin is a third party language, built by Marko Rodriguez of
                   Tinkerpop (a group of people who like to hack on graphs)
        ๏ Originally based on the idea of using xpath to describe traversals:
                   ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED
                   but bastardized to distinguish between nodes and relationships:
                   ./outE[label=HAS_CART]/inV                     Traversals are close
                    /outE[label=CONTAINS_ITEM]/inV to xpath, which is
                                                                  why xpath like
                    /inE[label=PURCHASED]/outV                    descriptions of
                                                                  traversals seemed
                    /outE[label=PURCHASED]/inV                    like a good idea.

        ๏ xpath is not complete enough to express full algorithms, it needs a
                   host language, gremlin originally defined its own.
                   This changed Groovy as a more complete host language and
                   abandoned xpath in favor of method chaining
                   [ replace ‘/’ with ‘.’ ]                              15

Monday, May 21, 2012
Gremlin compared to Cypher
        ๏start me=node:people(name={myname})
              match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item
              item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
              return recommendation

        ๏ Cypher is declarative, describes what data to get - its shape
        ๏ Gremlin is imperative, prescribes how to get the data
        ๏ Cypher has more opportunities for optimization by the engine
        ๏ Gremlin can implement pagerank, Cypher can’t (yet?)



                                                                     16

Monday, May 21, 2012
Outline

                       Traversals      Core API       Cypher      Transactions involve t wo
                                                                  parts:
                                                                  The (thread local) changes
                                                                  being done by an active
                                                                  transaction,
                       Node/Relationship                          and the transaction replay log
                                             Thread local diffs
                         Object cache                             for recovery.



                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)



                                                                               17

Monday, May 21, 2012
Transaction Isolation
        ๏ Mutating operations are not written when performed
        ๏ They are stored in a thread confined transaction state object
        ๏ This prevents other threads from seeing uncommitted changes
                   from the transactions of other threads
        ๏ When Transaction.finish() is invoked the transaction is either
                   committed or rolled back
        ๏ Rollback is simple: discard the transaction state object



                                                                          18

Monday, May 21, 2012
Transactions & Durability
        ๏ Commit is:
                • Changes made in the transaction are collected as commands
                • Commands are sorted to get predictable update order
                       ‣This prevents concurrent readers from seeing inconsistent
                        data when the changes are applied to the store
                • Write changes (in sorted order) to the transaction log
                • Mark the transaction as committed in the log
                • Apply the changes (in sorted order) to the store files
                                                                              19

Monday, May 21, 2012
Recovery
        ๏ Transaction commands dictate state, they don’t modify state
                • i.e. SET property "count" to 5
                • rather than ADD 1 to property "count"
        ๏ Thus: Applying the same command twice yields the same state
        ๏ Recovery simply replays all transactions since the last safe point
        ๏ If tx A mutates node1.name, then tx B also mutates
                   node1.name that doesn’t matter, because the database is not
                   recovered until all transactions have been replayed


                                                                         20

Monday, May 21, 2012
Outline

                       Traversals      Core API       Cypher


                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log     High Availability in
                                                                  Neo4j builds on top of
                                                                  the transaction replay


                                       Disk(s)



                                                                              21

Monday, May 21, 2012
Outline                       The transaction logs are
                                                                          shared bet ween all instances
                                                                          in an High Availability setup,
                                                                          all other parts operate on the
                                                                          local data just like in the
                                                                          standalone case.

                               Traversals      Core API       Cypher


                       Local   Node/Relationship
                                                     Thread local diffs
                                 Object cache

                                   FS Cache                         HA


                                 Record files          Transaction log     Shared

                                               Disk(s)



                                                                                       22

Monday, May 21, 2012
• HA - the parts to it:
        ๏ Based on streaming transactions between servers
        ๏ All transactions are committed through the master
                • Then (eventually) applied to the slaves
                • Eventuality synchronizationupdate intervalby interaction
                    or when
                              defined by the
                                              is mandated
        ๏ When writing to a slave:
                • Locks coordinated through the master
                • Transaction data buffered on theget a txid
                    applied first on the master to
                                                   slave

                       then applied with the same txid on the slave
                                                                             23

Monday, May 21, 2012
Creating new Nodes and Relationships
        ๏ New Nodes/Relationships don’t need locks, so they don’t need a
                   transaction synced with master until the transaction commit
        ๏ They do need an ID that is unique and equal among all instances
        ๏ Each instance allocates IDs in blocks from the master, then assigns
                   them to new Nodes/Relationships locally

                • This batch allocation can be seen in (Enterprise) 1000 as
                    Node/Relationship counts jumping in steps of
                                                                    WebAdmin




                                                                           24

Monday, May 21, 2012
HA synchronization points
        ๏ Transactions are uniquely identified by monotonically increasing ID
        ๏ All Requests from slave send the current latest txid on that slave
        ๏ Responses from master send back a stream of transactions that
                   have occurred since then, along with the actual result
        ๏ Transaction is replayed just like when committed / recovered
        ๏ Nodes/Relationships touched during this application are evicted
                   from cache to avoid cache staleness
        ๏ Transaction commands are only sorted when created, before
                   stored/transmitted, thus consistency is preserved during all
                   application phases
                                                                              25

Monday, May 21, 2012
Locking semantics of HA
        ๏ To be granted a lock the slave must have the latest version of the
                   Node/Relationship it is trying to lock

                • This ensures consistency
                • The implementation of “Latest version ofentireNode/
                    Relationship” is “Latest version of the
                                                            the
                                                                 graph”

                • The slave must thus sync transactions from the master


                                                                          26

Monday, May 21, 2012
Master election
        ๏ Each instance communicates/coordinates:
              • its latest transaction id (including the master id for that tx)
              • the id forclock value for when the txid was written
                            that instance
              • (logical)
        ๏    Election chooses:
           1. The instance with highest txid
           2. IF multiple: The instance that was master for that tx
           3. IF unavailable: The instance with the lowest clock value
           4. IF multiple: The instance with the lowest id
        ๏ Election happens when the current master cannot be reached
                •
              Any instance can choose to re-elect
                •
              Each instance runs the election protocol individually
                •
              Notify others when election chooses new master
        ๏ When elected, the new master broadcasts to all instances, 27
             forcing them to bind to the new master
Monday, May 21, 2012
Thank you for listening!

                                    tobias@neotechnology.com
 Tobias Lindaaker                   twitter: @thobe, #neo4j (@neo4j)
                                    web: neo4j.org neotechnology.com
 Hacker @ Neo Technology            my web: thobe.org




Monday, May 21, 2012

Contenu connexe

Tendances

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesDataStax
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Workshop - Neo4j Graph Data Science
Workshop - Neo4j Graph Data ScienceWorkshop - Neo4j Graph Data Science
Workshop - Neo4j Graph Data ScienceNeo4j
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to CypherNeo4j
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewNeo4j
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture ForumChristopher Spring
 
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Daniel Hochman
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 

Tendances (20)

ClickHouse Intro
ClickHouse IntroClickHouse Intro
ClickHouse Intro
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Workshop - Neo4j Graph Data Science
Workshop - Neo4j Graph Data ScienceWorkshop - Neo4j Graph Data Science
Workshop - Neo4j Graph Data Science
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
CockroachDB
CockroachDBCockroachDB
CockroachDB
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Map reduce vs spark
Map reduce vs sparkMap reduce vs spark
Map reduce vs spark
 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j Overview
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 

Plus de Tobias Lindaaker

Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL databaseTobias Lindaaker
 
[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent ProgrammingTobias Lindaaker
 
Django and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assDjango and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assTobias Lindaaker
 
Persistent graphs in Python with Neo4j
Persistent graphs in Python with Neo4jPersistent graphs in Python with Neo4j
Persistent graphs in Python with Neo4jTobias Lindaaker
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVMTobias Lindaaker
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVMTobias Lindaaker
 
Exploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic LanguagesExploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic LanguagesTobias Lindaaker
 

Plus de Tobias Lindaaker (9)

NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
JDK Power Tools
JDK Power ToolsJDK Power Tools
JDK Power Tools
 
Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL database
 
[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming
 
Django and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assDjango and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks ass
 
Persistent graphs in Python with Neo4j
Persistent graphs in Python with Neo4jPersistent graphs in Python with Neo4j
Persistent graphs in Python with Neo4j
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVM
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVM
 
Exploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic LanguagesExploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic Languages
 

Dernier

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Neo4j Internals Overview

  • 1. An overview of Neo4j Internals tobias@neotechnology.com Tobias Lindaaker twitter: @thobe, #neo4j (@neo4j) web: neo4j.org neotechnology.com Hacker @ Neo Technology my web: thobe.org Monday, May 21, 2012
  • 2. Outline This is a rough structure of how the pieces of Neo4j fit together. This talk will not cover how disks/fs works, we just assume it does. Traversals Core API Cypher Nor will it cover the “Core API”, you are assumed to know it. Node/Relationship Thread local diffs Object cache FS Cache HA Record files Transaction log Disk(s) 2 Monday, May 21, 2012
  • 3. Outline Traversals Core API Cypher Node/Relationship Thread local diffs Object cache FS Cache HA Let’s start at the bottom: the on disk Record files Transaction log storage file layout. Disk(s) 3 Monday, May 21, 2012
  • 4. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first property record. The Nodes also reference the first node in its relationship chain. Each Relationship references its start and Name: Alistair end node. Age: 34 KNOWS It also references the prev/next relationship record for the start/end node respectively Name: Tobias Age: 27 Nationality: Swedish KNOWS KNOWS KNOWS Name: Ian Age: 42 Name: Jim Age: 37 KNOWS Stuff: good 4 Monday, May 21, 2012
  • 5. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name property record. The Nodes also reference the first node in its Alistair relationship chain. Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 Nationality KNOWS Swedish KNOWS KNOWS Name Jim Age Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 6. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name property record. The Nodes also reference the first node in its Alistair relationship chain. Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 Nationality KNOWS Swedish KNOWS KNOWS Name Jim Age Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 7. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name SP EP property record. The Nodes also reference the first node in its Alistair relationship chain. SN EN Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 SP EP SP EP SN EN Nationality SN EN KNOWS Swedish KNOWS SP EP SN EN KNOWS Name SP EP Jim Age SN EN Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 8. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name SP EP property record. The Nodes also reference the first node in its Alistair relationship chain. SN EN Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 SP EP SP EP SN EN Nationality SN EN KNOWS Swedish KNOWS SP EP SN EN KNOWS Name SP EP Jim Age SN EN Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 9. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name SP EP property record. The Nodes also reference the first node in its Alistair relationship chain. SN EN Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 SP EP SP EP SN EN Nationality SN EN KNOWS Swedish KNOWS SP EP SN EN KNOWS Name SP EP Jim Age SN EN Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 10. Simple sample graph. It all boils down to linked lists of fixed size records on disk. Your graph on disk Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first Name SP EP property record. The Nodes also reference the first node in its Alistair relationship chain. SN EN Each Relationship references its start and end node. Name KNOWS It also references the prev/next relationship record for the start/end node respectively Tobias Age 34 Age 27 SP EP SP EP SN EN Nationality SN EN KNOWS Swedish KNOWS SP EP SN EN KNOWS Name SP EP Jim Age SN EN Name 37 Stuff Ian KNOWS Age good 42 4 Monday, May 21, 2012
  • 11. Store files ๏Node store ๏Relationship store • Relationship type store ๏Property store • Property key store • (long) String store Short string and array values are inlined in the • property store, long values are stored in (long) Array store separate store files. 5 Monday, May 21, 2012
  • 12. Neo4j Storage Record Layout Node (9 bytes) inUse nextRelId nextPropId 1 5 9 Relationship (33 bytes) inUse firstNode secondNode relationshipType firstPrevRelId firstNextRelId secondPrevRelId secondNextRelId nextPropId 1 5 9 13 17 21 25 29 33 Relationship Type (5 bytes) inUse typeBlockId 1 5 Property (33 bytes) inUse type keyIndexId propBlock nextPropId 1 3 5 29 33 Property Index (9 bytes) inUse propCount keyBlockId 1 5 9 Dynamic Store (125 bytes) inUse next data 1 5 NeoStore (5 bytes) inUse datum 1 5 Monday, May 21, 2012
  • 13. Outline Traversals Core API Cypher Next: The t wo levels of cache Node/Relationship in Neo4j. Thread local diffs The low level FS Cache for the Object cache record files. And the high level Object cache storing a structure more optimized for traversal. FS Cache HA Record files Transaction log Disk(s) 7 Monday, May 21, 2012
  • 14. The caches ๏ Filesystem cache: • Caches regions store file intofiles sized regions) (divides each of the store equally • The cache holds a fixed number of regions for each file • Regions are evicted based on ahit in non-cached region) (hit count vs. miss count, i.e. LFU-like policy • Default implementation of regions uses OS mmap ๏ Node/Relationship cache • Cache a version more optimized for traversals 8 Monday, May 21, 2012
  • 15. What we put in cache ID Relationship ID refs in: R1 R2 ... Rn The structure of the elements in the high level type 1 object cache. out R1 R2 ... Rn On disk most of the information is contained in: R1 R2 R3 ... Rn in the relationship records, with the nodes just type 2 referencing their first relationship. In the out R1 ... Rn Node cache this is turned around: the nodes hold references to all its relationships. The ... (grouped by type) relationships are simple, only holding its properties. The relationships for each node is grouped by RelationshipType to allow fast traversal of a specific type. Key 1 Key 2 ... Key n All references (dotted arrows) are by ID, and traversals do indirect lookup through the cache. Val 1 Val 2 ID start end type Val n Relationship Key 1 Key 2 ... Key n Val 1 Val 2 Val n 9 Monday, May 21, 2012
  • 16. Outline So how do traversals work... Traversals Core API Cypher Node/Relationship Thread local diffs Object cache FS Cache HA Record files Transaction log Disk(s) 10 Monday, May 21, 2012
  • 17. Traversals - how do they work? ๏ RelationshipExpanders: given (a path to) a node, returns The surface layer, the you Relationships to continue traversing from that node interact with. ๏ Evaluators: given (a path to) a node, returns whether to: • Continue traversing on that branch (i.e. expand) or not • Include (the path to) the node in the result set or not ๏ Then a projection to Path, Node, or Relationship applied to each Path in the result set. ... but also: ๏ Uniqueness level: policy for when it is ok to revisit a node that has already been visited ๏ Implemented on top of the Core API 11 Monday, May 21, 2012
  • 18. More on Traversals ๏ Fetch node data from cache - non-blocking access This is what happens • under the hood. If not in cache, retrieve from storage, into cache ‣If region is in FS cache: blocking but short duration access ‣If region is outside FS cache: blocking slower access ๏ Get relationships from cached node • If not fetched, retrieve from storage, by following chains ๏ Expand relationship(s) to end up on next node(s) • The relationship knows the node, no need to fetch it yet ๏ Evaluate • possibly emitting a Path into the result set ๏ Repeat 12 Monday, May 21, 2012
  • 19. Outline How is Cypher different? Traversals Core API Cypher and how dowes it work? Node/Relationship Thread local diffs Object cache FS Cache HA Record files Transaction log Disk(s) 13 Monday, May 21, 2012
  • 20. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 21. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 22. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 23. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 24. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 25. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 26. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 27. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 28. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b START x=... matching with x-->z, y-->z, Red: pattern graph Blue: actual graph Green: start node Purple: matches 14 Monday, May 21, 2012
  • 29. What about gremlin? ๏ gremlin is a third party language, built by Marko Rodriguez of Tinkerpop (a group of people who like to hack on graphs) ๏ Originally based on the idea of using xpath to describe traversals: ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED but bastardized to distinguish between nodes and relationships: ./outE[label=HAS_CART]/inV Traversals are close /outE[label=CONTAINS_ITEM]/inV to xpath, which is why xpath like /inE[label=PURCHASED]/outV descriptions of traversals seemed /outE[label=PURCHASED]/inV like a good idea. ๏ xpath is not complete enough to express full algorithms, it needs a host language, gremlin originally defined its own. This changed Groovy as a more complete host language and abandoned xpath in favor of method chaining [ replace ‘/’ with ‘.’ ] 15 Monday, May 21, 2012
  • 30. Gremlin compared to Cypher ๏start me=node:people(name={myname}) match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item item<-[:PURCHASED]-user-[:PURCHASED]->recommendation return recommendation ๏ Cypher is declarative, describes what data to get - its shape ๏ Gremlin is imperative, prescribes how to get the data ๏ Cypher has more opportunities for optimization by the engine ๏ Gremlin can implement pagerank, Cypher can’t (yet?) 16 Monday, May 21, 2012
  • 31. Outline Traversals Core API Cypher Transactions involve t wo parts: The (thread local) changes being done by an active transaction, Node/Relationship and the transaction replay log Thread local diffs Object cache for recovery. FS Cache HA Record files Transaction log Disk(s) 17 Monday, May 21, 2012
  • 32. Transaction Isolation ๏ Mutating operations are not written when performed ๏ They are stored in a thread confined transaction state object ๏ This prevents other threads from seeing uncommitted changes from the transactions of other threads ๏ When Transaction.finish() is invoked the transaction is either committed or rolled back ๏ Rollback is simple: discard the transaction state object 18 Monday, May 21, 2012
  • 33. Transactions & Durability ๏ Commit is: • Changes made in the transaction are collected as commands • Commands are sorted to get predictable update order ‣This prevents concurrent readers from seeing inconsistent data when the changes are applied to the store • Write changes (in sorted order) to the transaction log • Mark the transaction as committed in the log • Apply the changes (in sorted order) to the store files 19 Monday, May 21, 2012
  • 34. Recovery ๏ Transaction commands dictate state, they don’t modify state • i.e. SET property "count" to 5 • rather than ADD 1 to property "count" ๏ Thus: Applying the same command twice yields the same state ๏ Recovery simply replays all transactions since the last safe point ๏ If tx A mutates node1.name, then tx B also mutates node1.name that doesn’t matter, because the database is not recovered until all transactions have been replayed 20 Monday, May 21, 2012
  • 35. Outline Traversals Core API Cypher Node/Relationship Thread local diffs Object cache FS Cache HA Record files Transaction log High Availability in Neo4j builds on top of the transaction replay Disk(s) 21 Monday, May 21, 2012
  • 36. Outline The transaction logs are shared bet ween all instances in an High Availability setup, all other parts operate on the local data just like in the standalone case. Traversals Core API Cypher Local Node/Relationship Thread local diffs Object cache FS Cache HA Record files Transaction log Shared Disk(s) 22 Monday, May 21, 2012
  • 37. • HA - the parts to it: ๏ Based on streaming transactions between servers ๏ All transactions are committed through the master • Then (eventually) applied to the slaves • Eventuality synchronizationupdate intervalby interaction or when defined by the is mandated ๏ When writing to a slave: • Locks coordinated through the master • Transaction data buffered on theget a txid applied first on the master to slave then applied with the same txid on the slave 23 Monday, May 21, 2012
  • 38. Creating new Nodes and Relationships ๏ New Nodes/Relationships don’t need locks, so they don’t need a transaction synced with master until the transaction commit ๏ They do need an ID that is unique and equal among all instances ๏ Each instance allocates IDs in blocks from the master, then assigns them to new Nodes/Relationships locally • This batch allocation can be seen in (Enterprise) 1000 as Node/Relationship counts jumping in steps of WebAdmin 24 Monday, May 21, 2012
  • 39. HA synchronization points ๏ Transactions are uniquely identified by monotonically increasing ID ๏ All Requests from slave send the current latest txid on that slave ๏ Responses from master send back a stream of transactions that have occurred since then, along with the actual result ๏ Transaction is replayed just like when committed / recovered ๏ Nodes/Relationships touched during this application are evicted from cache to avoid cache staleness ๏ Transaction commands are only sorted when created, before stored/transmitted, thus consistency is preserved during all application phases 25 Monday, May 21, 2012
  • 40. Locking semantics of HA ๏ To be granted a lock the slave must have the latest version of the Node/Relationship it is trying to lock • This ensures consistency • The implementation of “Latest version ofentireNode/ Relationship” is “Latest version of the the graph” • The slave must thus sync transactions from the master 26 Monday, May 21, 2012
  • 41. Master election ๏ Each instance communicates/coordinates: • its latest transaction id (including the master id for that tx) • the id forclock value for when the txid was written that instance • (logical) ๏ Election chooses: 1. The instance with highest txid 2. IF multiple: The instance that was master for that tx 3. IF unavailable: The instance with the lowest clock value 4. IF multiple: The instance with the lowest id ๏ Election happens when the current master cannot be reached • Any instance can choose to re-elect • Each instance runs the election protocol individually • Notify others when election chooses new master ๏ When elected, the new master broadcasts to all instances, 27 forcing them to bind to the new master Monday, May 21, 2012
  • 42. Thank you for listening! tobias@neotechnology.com Tobias Lindaaker twitter: @thobe, #neo4j (@neo4j) web: neo4j.org neotechnology.com Hacker @ Neo Technology my web: thobe.org Monday, May 21, 2012