Graph Query Language Cypher Explained

Cypher Query Language
Chicago Graph Database Meet-Up
Max De Marzi

What is Cypher?

• Graph Query Language for Neo4j
• Aims to make querying simple

Why Cypher?

• Existing Neo4j query mechanisms were not simple enough

• Too verbose (Java API)
• Too prescriptive (Gremlin)

SQL?

• Unable to express paths
• these are crucial for graph-based reasoning

• Neo4j is schema/table free

SPARQL?

• SPARQL was designed for a different data model

• namespaces
• properties as nodes
• steep learning curve

Design Decisions

Declarative
Most of the time, Neo4j knows better than you

Imperative                      Declarative
follow relationship             specify starting point
breadth-first vs depth-first    specify desired outcome
explicit algorithm              algorithm adaptable based on query

Design Decisions
Pattern matching

A

B C

Design Decisions
ASCII-art patterns

() --> ()

Design Decisions
Directed relationship

A B

(A) --> (B)

Design Decisions
Undirected relationship

A B

(A) -- (B)

Design Decisions
specific relationships

LOVES
A B

A -[:LOVES]-> B

Design Decisions
Joined paths

A B C

A --> B --> C

Design Decisions
multiple paths

A

B C

A --> B --> C, A --> C
A --> B --> C <-- A

Design Decisions
Variable length paths
[Diagram: A connected to B by paths of one, two, three, ... relationships]
A -[*]-> B
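
What a variable-length pattern has to enumerate can be sketched in plain Python. This is a minimal illustration, not how Neo4j implements it: the adjacency-dict graph and the name variable_length_paths are assumptions made for the example. It follows outgoing relationships for any number of hops and never reuses a relationship within one path.

```python
def variable_length_paths(graph, start, end):
    """Yield every directed path from start to end with at least one hop.

    graph is a hypothetical adjacency dict: node -> list of successor nodes.
    A relationship (identified by its source node and list position) is
    used at most once per path.
    """
    def walk(node, path, used):
        if node == end and len(path) > 1:
            yield list(path)
        for index, successor in enumerate(graph.get(node, [])):
            edge = (node, index)
            if edge in used:
                continue
            used.add(edge)
            path.append(successor)
            yield from walk(successor, path, used)
            path.pop()          # backtrack to try the next relationship
            used.remove(edge)

    yield from walk(start, [start], set())


graph = {"A": ["B", "C"], "C": ["B"]}
paths = list(variable_length_paths(graph, "A", "B"))
```

Here paths holds the one-hop path A, B and the two-hop path A, C, B.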

Design Decisions
Optional relationships

A B

A -[?]-> B

Design Decisions
Familiar for SQL users

SQL         Cypher
select      start
from        match
where       where
group by    return
order by

START
SELECT *
FROM Person
WHERE firstName = "Max"

START max=node:persons(firstName = "Max")
RETURN max

MATCH
SELECT skills.*
FROM users
JOIN skills ON users.id = skills.user_id
WHERE users.id = 101

START user = node(101)
MATCH user --> skills
RETURN skills

Optional MATCH
SELECT skills.*
FROM users
LEFT JOIN skills ON users.id = skills.user_id
WHERE users.id = 101

START user = node(101)
MATCH user -[?]-> skills
RETURN skills

SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1

START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill

Indexes

Used to look up multiple starting points, not to speed up traversals

START a = node:nodes_index(type='User')
MATCH a-[r:knows]-b
RETURN ID(a), ID(b), r.weight

http://maxdemarzi.com/2012/03/16/jung-in-neo4j-par

Complicated Match

SQL equivalent: some UGLY recursive self join on the groups table

START max=node:person(name="Max")
MATCH group <-[:BELONGS_TO*]- max
RETURN group

Where
SELECT person.*
FROM person
WHERE person.age > 32
OR person.hair = "bald"

START person = node:persons("name:*")
WHERE person.age > 32
OR person.hair = "bald"
RETURN person

Return
SELECT person.name, count(*)
FROM Person
GROUP BY person.name
ORDER BY person.name

START person=node:persons("name:*")
RETURN person.name, count(*)
ORDER BY person.name
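
The Cypher query has no GROUP BY: the non-aggregated column person.name acts as the grouping key for count(*). A minimal Python sketch of that behaviour follows; the list-of-dicts data layout and the name count_by_name are assumptions made for the example.

```python
from collections import Counter


def count_by_name(people):
    """Group rows by name, count each group, and order by name."""
    counts = Counter(person["name"] for person in people)
    return sorted(counts.items())


people = [{"name": "Max"}, {"name": "Ann"}, {"name": "Max"}]
rows = count_by_name(people)  # [("Ann", 1), ("Max", 2)]
```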

Order By, Parameters
ORDER BY works the same as in SQL

Parameters: {node_id} is expected as part of the request

START me = node({node_id})
MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others
RETURN me.name, friends.name, fof.name, fofof.name, count(others)
ORDER BY friends.name, fof.name, fofof.name, count(others) DESC

http://maxdemarzi.com/2012/02/13/visualizing-a-netw

Graph Functions

SQL equivalent: some UGLY multiple recursive self and inner joins on the user and all related tables

START lucy=node(1000), kevin=node(759)
MATCH p = shortestPath( lucy-[*]-kevin )
RETURN p
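
For comparison, the work hidden behind shortestPath can be sketched as a breadth-first search. This is an illustrative Python version under stated assumptions, not Neo4j's implementation: the edge-list input and the name shortest_path are hypothetical, and relationships are treated as undirected to mirror lucy-[*]-kevin.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one shortest undirected path as a list of nodes, or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous = {start: None}   # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for candidate in neighbours.get(node, ()):
            if candidate not in previous:
                previous[candidate] = node
                queue.append(candidate)
    return None


edges = [("lucy", "ann"), ("ann", "kevin"),
         ("lucy", "bob"), ("bob", "carl"), ("carl", "kevin")]
path = shortest_path(edges, "lucy", "kevin")  # ["lucy", "ann", "kevin"]
```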

Aggregate Functions
ID: get the Neo4j-assigned identifier
Count: add up the number of occurrences
Min: get the lowest value
Max: get the highest value
Avg: get the average of a numeric value
Distinct: remove duplicates

START me = node:nodes_index(type = 'user')
MATCH (me)-[r?:wrote]-()
RETURN ID(me), me.name, count(r), min(r.date), max(r.date)
ORDER BY ID(me)

Functions

Collect: put all values in a list

START a = node:nodes_index(type='User')
MATCH a-[:follows]->b
RETURN a.name, collect(b.name)
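
The effect of collect can be pictured as grouping rows by the non-aggregated column and gathering the rest into a list. A small Python sketch, assuming the follows relationships arrive as (follower, followed) name pairs; the name collect_followed is hypothetical.

```python
from collections import defaultdict


def collect_followed(follows):
    """Map each follower name to the list of names they follow."""
    grouped = defaultdict(list)
    for follower, followed in follows:
        grouped[follower].append(followed)
    return dict(grouped)


follows = [("max", "ann"), ("max", "bob"), ("ann", "bob")]
result = collect_followed(follows)
# {"max": ["ann", "bob"], "ann": ["bob"]}
```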

http://maxdemarzi.com/2012/02/02/graph-visualizatio

Combine Functions

Collect the IDs of friends

START me = node:nodes_index(type = 'user')
MATCH (me)<-[r?:wrote]-(friends)
RETURN ID(me), me.name, collect(ID(friends)), collect(r.date)
ORDER BY ID(me)

http://maxdemarzi.com/2012/03/08/connections-in-time/

Uses
Recommend Friends

START me = node({node_id})
MATCH (me)-[:friends]->(friend)-[:friends]->(foaf)
RETURN foaf.name
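
The same two-hop traversal written by hand in Python, as a rough sketch: the adjacency dict and the name friends_of_friends are assumptions for the example. Like the query, it does not filter out duplicates, existing friends, or the starting user.

```python
def friends_of_friends(friends, me):
    """Follow two outgoing 'friends' hops from me and return the names reached."""
    return [
        foaf
        for friend in friends.get(me, [])
        for foaf in friends.get(friend, [])
    ]


friends = {"me": ["ann", "bob"], "ann": ["carl", "me"], "bob": ["carl"]}
names = friends_of_friends(friends, "me")  # ["carl", "me", "carl"]
```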

Uses
Six Degrees of Kevin Bacon

Length: counts the number of relationships along a path
Extract: applies an expression to each item in a collection, such as the nodes of a path

START me=node({start_node_id}),
them=node({destination_node_id})
MATCH path = allShortestPaths( me-[?*]->them )
RETURN length(path),
extract(person in nodes(path) : person.name)

Uses
Similar Users

Users who rated the same items within 2 points of my rating.

Abs: gets absolute numeric value

START me = node(user1)
MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
WHERE abs(myRating.rating-otherRating.rating)<=2
RETURN u
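
The filter can be spelled out in Python as a sketch. The nested ratings dict (user -> item -> rating) and the name similar_users are assumptions for the example. Unlike the query, which returns one row per matching item, this version returns each similar user once.

```python
def similar_users(ratings, me, tolerance=2):
    """Return users who rated at least one of my items within tolerance points."""
    mine = ratings[me]
    similar = set()
    for user, theirs in ratings.items():
        if user == me:
            continue
        for item, rating in theirs.items():
            if item in mine and abs(mine[item] - rating) <= tolerance:
                similar.add(user)
    return similar


ratings = {
    "me": {"film": 8, "book": 3},
    "ann": {"film": 7},
    "bob": {"film": 2, "book": 9},
    "carl": {"game": 8},
}
users = similar_users(ratings, "me")  # {"ann"}
```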

Boolean Operations
Items with a rating > 7 that similar users rated, but I have not
And: this and that are true
Or: this or that is true
Not: this is false

START me=node(user1),
similarUsers=node(3) // result received in the first query
MATCH (similarUsers)-[r:RATED]->(item)
WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item))
RETURN item
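
The AND NOT condition reads naturally as a filter. A minimal Python sketch, assuming a nested ratings dict (user -> item -> rating); the name recommend and its parameters are hypothetical.

```python
def recommend(ratings, me, similar_user, threshold=7):
    """Items the similar user rated above threshold that I have not rated."""
    mine = ratings.get(me, {})
    theirs = ratings.get(similar_user, {})
    return sorted(
        item
        for item, rating in theirs.items()
        if rating > threshold and item not in mine
    )


ratings = {
    "me": {"film": 8},
    "ann": {"film": 9, "book": 8, "game": 5, "album": 10},
}
items = recommend(ratings, "me", "ann")  # ["album", "book"]
```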

http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendation

Predicates
ALL: closure is true for all items
ANY: closure is true for any item
NONE: closure is true for no items
SINGLE: closure is true for exactly 1 item

START london = node(1), moscow = node(2)
MATCH path = london -[*]-> moscow
WHERE all(city in nodes(path) where city.capital = true)
RETURN path
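
The four predicates have direct counterparts in Python, shown here as a sketch. The helper names and the list-of-dicts stand-in for nodes(path) are assumptions for the example.

```python
def all_match(predicate, items):
    return all(predicate(item) for item in items)


def any_match(predicate, items):
    return any(predicate(item) for item in items)


def none_match(predicate, items):
    return not any(predicate(item) for item in items)


def single_match(predicate, items):
    return sum(1 for item in items if predicate(item)) == 1


def is_capital(city):
    return city["capital"]


path_nodes = [
    {"name": "London", "capital": True},
    {"name": "Paris", "capital": True},
    {"name": "Lyon", "capital": False},
]
only_capitals = all_match(is_capital, path_nodes)  # False, Lyon is not a capital
```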

Design Decisions
Parsed, not an internal DSL

• Execution Semantics
• Serialisation
• Type System
• Portability

Design Decisions
Database vs Application
Design Goal: single user interaction expressible as single query

Queries have enough logic to find required data, not enough to process it

Implementation
• Recursive matching with backtracking
START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
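
Recursive matching with backtracking can be sketched in a few lines of Python. This is a toy illustration of the technique, not Neo4j's matcher: the edge-list graph, the pattern encoding as (source variable, target variable) pairs, and the name match_pattern are all assumptions. Each pattern relationship is bound to a graph relationship in turn, and a binding is undone when the rest of the pattern cannot be satisfied.

```python
def match_pattern(edges, pattern, bindings=None, used=None):
    """Yield every variable -> node assignment that satisfies the pattern.

    edges:   list of directed (source, target) pairs
    pattern: list of (source_variable, target_variable) pairs
    """
    bindings = {} if bindings is None else bindings
    used = set() if used is None else used
    if not pattern:
        yield dict(bindings)
        return
    (src_var, dst_var), rest = pattern[0], pattern[1:]
    for edge in edges:
        src, dst = edge
        if edge in used:
            continue  # a relationship is matched at most once per result
        if bindings.get(src_var, src) != src or bindings.get(dst_var, dst) != dst:
            continue  # conflicts with a variable bound earlier
        if src_var == dst_var and src != dst:
            continue  # a self-loop pattern needs a self-loop relationship
        added = {var for var in (src_var, dst_var) if var not in bindings}
        bindings[src_var], bindings[dst_var] = src, dst
        used.add(edge)
        yield from match_pattern(edges, rest, bindings, used)
        used.discard(edge)      # backtrack
        for var in added:
            del bindings[var]


# The pattern from the slide: x-->y, x-->z, y-->z, z-->a-->b, z-->b
pattern = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
matches = list(match_pattern(edges, pattern))
```

On this small graph the only match binds x=1, y=2, z=3, a=4, b=5; every other starting relationship is tried and abandoned.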

Implementation

Execution Plan

Cypher is Pipes
• lazily evaluated
• pulling from pipes underneath

start n=node(0)
return n

Parameters()
Nodes(n)
Extract([n])
ColumnFilter([n])
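
The pull-based, lazy style of such a plan resembles a chain of Python generators. The sketch below is only an analogy under stated assumptions: the function names echo the plan steps but are hypothetical, the in-memory store stands in for the database, and two node ids are used (rather than the single node(0) of the slide) so the laziness is visible.

```python
def parameters_pipe():
    # Source pipe: emits a single empty row to start the chain.
    yield {}


def nodes_pipe(upstream, name, node_ids, store, log):
    # For each incoming row, bind `name` to each requested node in turn.
    for row in upstream:
        for node_id in node_ids:
            log.append(f"load {node_id}")
            yield {**row, name: store[node_id]}


def extract_pipe(upstream, expressions):
    # Add one computed column per expression to every row.
    for row in upstream:
        yield {**row, **{column: fn(row) for column, fn in expressions.items()}}


def column_filter_pipe(upstream, columns):
    # Keep only the columns named in the RETURN clause.
    for row in upstream:
        yield {column: row[column] for column in columns}


store = {0: {"name": "root"}, 1: {"name": "other"}}
log = []
plan = column_filter_pipe(
    extract_pipe(
        nodes_pipe(parameters_pipe(), "n", [0, 1], store, log),
        {"n.name": lambda row: row["n"]["name"]},
    ),
    ["n.name"],
)
first_row = next(plan)  # pulls exactly one row through every pipe
```

After the single next call, log contains only "load 0": the second node is not touched until another row is requested.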

Implementation

Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.age

Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
EagerAggregation( keys: [n.name, n], aggregates: [count(*)])
Extract([n.age])
Sort(n.age ASC)
ColumnFilter([n.name,n,count(*)])
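
An eager step differs from the lazy pipes around it because it must drain its input before emitting anything. A minimal Python sketch of a counting aggregation follows; the name eager_aggregation_pipe and the row layout are assumptions, and this is not Neo4j's implementation.

```python
def eager_aggregation_pipe(upstream, keys):
    """Count rows per key combination; emits only after reading all input."""
    counts = {}
    for row in upstream:  # consumes the whole upstream pipe first
        group = tuple(row[key] for key in keys)
        counts[group] = counts.get(group, 0) + 1
    for group, count in counts.items():
        yield {**dict(zip(keys, group)), "count(*)": count}


rows = iter([{"n.name": "a"}, {"n.name": "b"}, {"n.name": "a"}])
aggregated = list(eager_aggregation_pipe(rows, ["n.name"]))
# [{"n.name": "a", "count(*)": 2}, {"n.name": "b", "count(*)": 1}]
```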

Implementation

Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.name

Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
Sort(n.name ASC,n ASC)
EagerAggregation( keys: [n.name, n], aggregates: [count(*)])
ColumnFilter([n.name,n,count(*)])

Thanks for Listening!
Questions?

maxdemarzi.com
