Cypher


  1. 1. Cypher Query Language Chicago Graph Database Meet-Up Max De Marzi
  2. 2. What is Cypher? • Graph Query Language for Neo4j • Aims to make querying simple
  3. 3. Why Cypher? • Existing Neo4j query mechanisms were not simple enough • Too verbose (Java API) • Too prescriptive (Gremlin)
  4. 4. SQL? • Unable to express paths • these are crucial for graph-based reasoning • Neo4j is schema/table free
  5. 5. SPARQL? • SPARQL designed for a different data model • namespaces • properties as nodes • high learning curve
  6. 6. Design
  7. 7. Design Decisions • Declarative: most of the time, Neo4j knows better than you • Imperative: follow relationship; breadth-first vs depth-first; explicit algorithm • Declarative: specify starting point; specify desired outcome; algorithm adaptable based on query
  8. 8. Design Decisions Pattern matching
  9. 9. Design Decisions Pattern matching A B C
  10. 10. Design Decisions Pattern matching
  11. 11. Design Decisions Pattern matching
  12. 12. Design Decisions Pattern matching
  13. 13. Design Decisions Pattern matching
  14. 14. Design Decisions ASCII-art patterns () --> ()
  15. 15. Design Decisions Directed relationship A B (A) --> (B)
  16. 16. Design Decisions Undirected relationship A B (A) -- (B)
  17. 17. Design Decisions specific relationships LOVES A B A -[:LOVES]-> B
  18. 18. Design Decisions Joined paths A B C A --> B --> C
  19. 19. Design Decisions multiple paths A B C A --> B --> C, A --> C A --> B --> C <-- A
  20. 20. Design Decisions Variable length paths A B A B A B ... A -[*]-> B
  21. 21. Design Decisions Optional relationships A B A -[?]-> B
  22. 22. Design Decisions • Familiar for SQL users • SQL clauses: select, from, where, group by, order by • Cypher clauses: start, match, where, return
  23. 23. START • SQL: SELECT * FROM Person WHERE firstName = "Max" • Cypher: START max=node:persons(firstName = "Max") RETURN max
  24. 24. MATCH • SQL: SELECT skills.* FROM users JOIN skills ON users.id = skills.user_id WHERE users.id = 101 • Cypher: START user = node(101) MATCH user --> skills RETURN skills
  25. 25. Optional MATCH • SQL: SELECT skills.* FROM users LEFT JOIN skills ON users.id = skills.user_id WHERE users.id = 101 • Cypher: START user = node(101) MATCH user -[?]-> skills RETURN skills
  26. 26. SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1
  27. 27. START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
  28. 28. Indexes • Used as multiple starting points, not to speed up any traversals • START a = node:nodes_index(type='User') MATCH a-[r:knows]-b RETURN ID(a), ID(b), r.weight
  29. 29. http://maxdemarzi.com/2012/03/16/jung-in-neo4j-par
  30. 30. Complicated Match • SQL: some UGLY recursive self join on the groups table • Cypher: START max=node:person(name="Max") MATCH group <-[:BELONGS_TO*]- max RETURN group
  31. 31. Where • SQL: SELECT person.* FROM person WHERE person.age > 32 OR person.hair = "bald" • Cypher: START person = node:persons("name:*") WHERE person.age > 32 OR person.hair = "bald" RETURN person
  32. 32. Return • SQL: SELECT person.name, count(*) FROM Person GROUP BY person.name ORDER BY person.name • Cypher: START person=node:persons("name:*") RETURN person.name, count(*) ORDER BY person.name
  33. 33. Order By, Parameters • Order By: same as SQL • Parameters: {node_id} expected as part of request • START me = node({node_id}) MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others RETURN me.name, friends.name, fof.name, fofof.name, count(others) ORDER BY friends.name, fof.name, fofof.name, count(others) DESC
  34. 34. http://maxdemarzi.com/2012/02/13/visualizing-a-netw
  35. 35. Graph Functions • SQL: some UGLY multiple recursive self and inner joins on the user and all related tables • Cypher: START lucy=node(1000), kevin=node(759) MATCH p = shortestPath( lucy-[*]-kevin ) RETURN p
  36. 36. Aggregate Functions • ID: get the Neo4j-assigned identifier • Count: add up the number of occurrences • Min: get the lowest value • Max: get the highest value • Avg: get the average of a numeric value • Distinct: remove duplicates • START me = node:nodes_index(type = 'user') MATCH (me)-[r?:wrote]-() RETURN ID(me), me.name, count(r), min(r.date), max(r.date) ORDER BY ID(me)
  37. 37. Functions • Collect: put all values in a list • START a = node:nodes_index(type='User') MATCH a-[:follows]->b RETURN a.name, collect(b.name)
  38. 38. http://maxdemarzi.com/2012/02/02/graph-visualizatio
  39. 39. Combine Functions • Collect the ID of friends • START me = node:nodes_index(type = 'user') MATCH (me)<-[r?:wrote]-(friends) RETURN ID(me), me.name, collect(ID(friends)), collect(r.date) ORDER BY ID(me)
  40. 40. http://maxdemarzi.com/2012/03/08/connections-in-time/
  41. 41. Uses • Recommend Friends • START me = node({node_id}) MATCH (me)-[:friends]->(friend)-[:friends]->(foaf) RETURN foaf.name
  42. 42. Uses • Six Degrees of Kevin Bacon • Length: counts the number of relationships along a path • Extract: gets the nodes/relationships from a path • START me=node({start_node_id}), them=node({destination_node_id}) MATCH path = allShortestPaths( me-[?*]->them ) RETURN length(path), extract(person in nodes(path) : person.name)
  43. 43. Uses • Similar Users: users who rated the same items within 2 points • Abs: gets absolute numeric value • START me = node(user1) MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u) WHERE abs(myRating.rating-otherRating.rating)<=2 RETURN u
  44. 44. Boolean Operations • Items with a rating > 7 that similar users rated, but I have not • And: this and that are true • Or: this or that is true • Not: this is false • START me=node(user1), similarUsers=node(3) (result received in the first query) MATCH (similarUsers)-[r:RATED]->(item) WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item)) RETURN item • http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendation
  45. 45. Predicates • ALL: closure is true for all items • ANY: closure is true for any item • NONE: closure is true for no items • SINGLE: closure is true for exactly 1 item • START london = node(1), moscow = node(2) MATCH path = london -[*]-> moscow WHERE all(city in nodes(path) where city.capital = true)
  46. 46. Design Decisions • Parsed, not an internal DSL • Execution Semantics • Serialisation • Type System • Portability
  47. 47. Design Decisions • Database vs Application • Design Goal: single user interaction expressible as single query • Queries have enough logic to find required data, not enough to process it
  48. 48. Implementation
  49. 49. Implementation • Recursive matching with backtracking • START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
  50. 50. Implementation • Execution Plan • Cypher is Pipes: lazily evaluated, pulling from pipes underneath • Query: start n=node(0) return n • Plan: Parameters() Nodes(n) Extract([n]) ColumnFilter([n])
  51. 51. Implementation • Execution Plan • Query: start n=node(0) match n-[*]-> b return n.name, n, count(*) order by n.age • Plan: Parameters() Nodes(n) PatternMatch(n-[*]->b) Extract([n.name, n]) EagerAggregation( keys: [n.name, n], aggregates: [count(*)]) Extract([n.age]) Sort(n.age ASC) ColumnFilter([n.name,n,count(*)])
  52. 52. Implementation • Execution Plan • Query: start n=node(0) match n-[*]-> b return n.name, n, count(*) order by n.name • Plan: Parameters() Nodes(n) PatternMatch(n-[*]->b) Extract([n.name, n]) Sort(n.name ASC,n ASC) EagerAggregation( keys: [n.name, n], aggregates: [count(*)]) ColumnFilter([n.name,n,count(*)])
  53. 53. Thanks for Listening! Questions? maxdemarzi.com
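
Slide 49 describes pattern matching as "recursive matching with backtracking". The Python sketch below is one way to picture that idea. It is not Neo4j's implementation. The graph representation, the `match` function and the sample data are all hypothetical, and the rule that one relationship may not satisfy two pattern edges is an assumption added to keep results sensible.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Graph = Dict[str, Set[str]]       # node -> set of successor nodes (hypothetical model)
PatternEdge = Tuple[str, str]     # (source variable, target variable)
Edge = Tuple[str, str]


def match(graph: Graph, pattern: List[PatternEdge], bindings: Dict[str, str],
          used: FrozenSet[Edge] = frozenset()) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all remaining pattern edges."""
    if not pattern:
        yield dict(bindings)      # nothing left to satisfy: a full match
        return
    (src, dst), rest = pattern[0], pattern[1:]
    # A bound source variable has one candidate; an unbound one may be any node.
    sources = [bindings[src]] if src in bindings else sorted(graph)
    for s in sources:
        for t in sorted(graph.get(s, ())):
            if dst in bindings and bindings[dst] != t:
                continue          # conflicts with an earlier binding
            if (s, t) in used:
                continue          # assumed rule: each relationship is used once
            extended = {**bindings, src: s, dst: t}
            # Recurse on the remaining edges. When the recursion is exhausted,
            # the loop moves to the next candidate, which is the backtracking step.
            yield from match(graph, rest, extended, used | {(s, t)})


# Sample graph (made up) and the pattern from slide 49:
# x-->y, x-->z, y-->z, z-->a-->b, z-->b
graph: Graph = {
    "n0": {"n1", "n2"},
    "n1": {"n2"},
    "n2": {"n3", "n4"},
    "n3": {"n4"},
}
pattern = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]
results = list(match(graph, pattern, {"x": "n0"}))   # START x = n0
print(results)
```

Because each recursive call receives its own copy of the bindings, there is nothing to undo explicitly when a branch fails; abandoning the copy is the backtrack.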
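
Slides 50 to 52 describe the execution plan as a chain of lazily evaluated pipes, each pulling from the pipe underneath. Python generators give a rough analogy. The function names below only echo the operator names on the slides; their signatures and behaviour are assumptions made for illustration, not Neo4j's API.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
pulled: List[int] = []            # records which node ids were actually read


def nodes_pipe(var: str, node_ids: Iterable[int]) -> Iterator[Row]:
    """Bottom pipe: emits one row per starting node, only when asked."""
    for node_id in node_ids:
        pulled.append(node_id)
        yield {var: node_id}


def extract_pipe(source: Iterator[Row], name: str,
                 fn: Callable[[Row], object]) -> Iterator[Row]:
    """Adds a computed column to each row pulled from the pipe underneath."""
    for row in source:
        yield {**row, name: fn(row)}


def column_filter_pipe(source: Iterator[Row], columns: List[str]) -> Iterator[Row]:
    """Keeps only the columns named in the RETURN clause."""
    for row in source:
        yield {c: row[c] for c in columns}


def eager_aggregation_pipe(source: Iterator[Row], key: str) -> Iterator[Row]:
    """Counts rows per key; it must drain its source before emitting anything."""
    counts: Dict[object, int] = {}
    for row in source:
        counts[row[key]] = counts.get(row[key], 0) + 1
    for value, count in counts.items():
        yield {key: value, "count(*)": count}


# Lazy chain: asking for two result rows reads only two nodes.
plan = column_filter_pipe(
    extract_pipe(nodes_pipe("n", range(1000)), "n.label",
                 lambda row: f"node-{row['n']}"),
    ["n.label"],
)
first_two = list(islice(plan, 2))
lazy_reads = list(pulled)

# Eager chain: the first result row is available only after every input is read.
pulled.clear()
first_group = next(eager_aggregation_pipe(nodes_pipe("n", [1, 1, 2]), "n"))
eager_reads = list(pulled)
```

The contrast between `lazy_reads` and `eager_reads` mirrors why an operator such as EagerAggregation on slides 51 and 52 is labelled eager: it cannot stream rows the way the other pipes can.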

Editor's Notes

  • There existed a number of different ways to query a graph database. This one aims to make querying easy, and to produce queries that are readable. We looked at alternatives - SPARQL, SQL, Gremlin and other...