SlideShare une entreprise Scribd logo
1  sur  140
Télécharger pour lire hors ligne
The Gremlin Graph Traversal
Machine and Language
Dr. Marko A. Rodriguez
Director of Engineering at DataStax, Inc.
Project Management Committee, Apache TinkerPop
http://tinkerpop.incubator.apache.org
Database Programming Languages
Rodriguez, M.A., “The Gremlin Graph Traversal Machine and Language,” Proceedings of the ACM Database Programming
Languages Conference, doi:10.1145/2815072.2815073, ACM, Pittsburg, Pennsylvania, October 2015.
Java is a virtual machine.
Java is a programming language.
Java is a virtual machine.
Java is a programming language.
Gremlin is a traversal machine.
Gremlin is a traversal language.
Java is a virtual machine.
Java is a programming language.
Gremlin is a traversal machine.
Gremlin is a traversal language.
Java is operating system agnostic.
MacOSX, Linux, Windows, etc.
Java is a virtual machine.
Java is a programming language.
Gremlin is a traversal machine.
Gremlin is a traversal language.
Java is operating system agnostic.
MacOSX, Linux, Windows, etc.
Gremlin is graph system agnostic.
Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc.
Java is a virtual machine.
Java is a programming language.
Gremlin is a traversal machine.
Gremlin is a traversal language.
Other languages compile to the Java virtual machine.
Groovy, Scala, Clojure, JavaScript Nashorn, etc.
Java is operating system agnostic.
MacOSX, Linux, Windows, etc.
Gremlin is graph system agnostic.
Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc.
Java is a virtual machine.
Java is a programming language.
Gremlin is a traversal machine.
Gremlin is a traversal language.
Other languages compile to the Java virtual machine.
Groovy, Scala, Clojure, JavaScript Nashorn, etc.
Other languages compile to the Gremlin traversal machine.
Gremlin-Groovy, Gremlin-Scala, SPARQL, etc.
Java is operating system agnostic.
MacOSX, Linux, Windows, etc.
Gremlin is graph system agnostic.
Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc.
Rodriguez, M.A., Kuppitz, D., “The Benefits of the Gremlin Graph Traversal Machine," DataStax Engineering Blog, 2015.
http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
TinkerPop is an open source graph computing framework.
TinkerPop was started in 2009 by Marko A. Rodriguez, Josh Shinavier, and Peter Neubauer.
TinkerPop joined the Apache Software Foundation January 2015 (currently in incubation).
TinkerPop designs and develops the Gremlin graph traversal machine and language.
TinkerPop is supported by nearly every graph system provider
(both commercial and open source).
TinkerPop3 has been in development for 2 years and was officially released July 2015.
"What is The TinkerPop?"
"What is The TinkerPop?"
Graph
Database
Graph
Processor
Provider API Provider API
Provider
Strategies
Gremlin Traversal Language
Graph
Computer
Core API
(graph, vertex, edge)
Monitoring
Gremlin Server
JSON
Kryo
...
ganglia
cvs
graphite
jmx
{
User
DSL
OLTP OLAP
TinkerPop provides graph computing capabilities to any data processing system.
TinkerPop supports both OLTP (database) and OLAP (batch processor) systems.
TinkerPop is used to process the order fulfillment network of Amazon.com
("100 billion vertices" -- at least factor 10x on edges?).
TinkerPop is the sole interface for >50% of all graph systems in the market/wild.
TinkerPop currently processes a near trillion edge graph represented over 1000 machines.
TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark,
Apache Giraph, Apache Atlas, Apache Falcon, ...
TinkerPop is always looking for more contributors and perhaps TinkerPop could be your
future software/language design and development home.
"Why should you care about The TinkerPop?"
https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/
(can't disclose company name)
TinkerPop provides graph computing capabilities to any data processing system.
TinkerPop supports both OLTP (database) and OLAP (batch processor) systems.
TinkerPop is used to process the order fulfillment network of Amazon.com
("100 billion vertices" -- at least factor 10x on edges?).
TinkerPop is the sole interface for >50% of all graph systems in the market/wild.
TinkerPop currently processes a near trillion edge graph represented over 1000 machines.
TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark,
Apache Giraph, Apache Atlas, Apache Falcon, ...
TinkerPop is always looking for more contributors and perhaps TinkerPop could be your
future software/language design and development home.
"Why should you care about The TinkerPop?"
https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/
TinkerPop has language bindings in every major programming language.
TinkerPop provides graph computing capabilities to any data processing system.
TinkerPop supports both OLTP (database) and OLAP (batch processor) systems.
TinkerPop is used to process the order fulfillment network of Amazon.com
("100 billion vertices" -- at least factor 10x on edges?).
TinkerPop is the sole interface for >50% of all graph systems in the market/wild.
TinkerPop currently processes a near trillion edge graph represented over 1000 machines.
TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark,
Apache Giraph, Apache Atlas, Apache Falcon, ...
TinkerPop is always looking for more contributors and perhaps TinkerPop could be your
future software/language design and development home.
"Why should you care about The TinkerPop?"
https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/
TinkerPop has language bindings in every major programming language.
TinkerPop
isalso
used
forsm
allin-m
em
orygraphs.
Part 1: The Gremlin Traversal Machine
The Machine Components
The Graph
The Traverser
The Traversal
The Data
The CPU
The Program
The Graph
vertex
"The graph is the 'RAM'."
The Graph
1
vertex id
"Vertices and edges in the graph have unique ids (addresses)."
The Graph
1
vertex label
person
The Graph
1
vertex properties
person
name:marko
age:36
The Graph
1
person
name:marko
age:36
edge
The Graph
1
person
name:marko
age:36
edge direction
The Graph
1
person
name:marko
age:36
edge id
2
The Graph
1
person
name:marko
age:36
edge label
2 knows
The Graph
1
person
name:marko
age:36
edge properties
2 knows
since:2013
weight:0.9
The Graph
1
person
name:marko
age:36
outE
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
inVoutV
"Vertices are related by edges (addresses have pointers to each other)."
The Graph
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
Directed, binary, attributed multi-graph.
The Graph
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
Directed, binary, attributed multi-graph.
G = (V, E ⊆ (V × V ), λ : (V ∪ E) × Σ∗
→ U  (V ∪ E))
vertices
edges directed, binary
labels+properties
function
properties can't reference
vertices or edges
property key
(string)
vertices or edges
U = anything
The Graph
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
Directed, binary, attributed multi-graph.
G = (V, E ⊆ (V × V ), λ : (V ∪ E) × Σ∗
→ U  (V ∪ E))
λ(v1, name) → marko
λ(e2, label) → knows
((e2)1, (e2)2) = (v1, v3)
The Graph
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
Directed, binary, attributed multi-graph.
Property graph.
* Multi- and meta-properties are not discussed in this presentation nor in the associated conference article.
http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/#vertex-properties
~/tinkerpop3$ bin/gremlin.sh
gremlin>
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin>
"TinkerGraph is an in-memory graph system provided by TinkerPop."
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin>
TitanGraph.open(…)
Neo4jGraph.open(…)
OrientGraph.open(…)
HadoopGraph.open(…)
HadoopGraph.
compute(GiraphGraphComputer)
HadoopGraph.
compute(SparkGraphComputer)
"Gremlin is agnostic to the underlying graph system."
StardogGraph.open(…)
etc...
DSEGraph.open(…)
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = graph.addVertex(id,1,label,'person')
==>v[1]
gremlin>
"A graph system provider is Gremlin-enabled if they implement the gremlin.structure API suite."
1
person
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = graph.addVertex(id,1,label,'person')
==>v[1]
gremlin> v1.property('name','marko')
==>vp[name->marko]
gremlin>
"The APIs have to do with CRUD operations on a graph structure."
1
person
name:marko
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = graph.addVertex(id,1,label,'person')
==>v[1]
gremlin> v1.property('name','marko')
==>vp[name->marko]
gremlin> v1.property('age',36)
==>vp[age->36]
gremlin>
1
person
name:marko
age:36
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = graph.addVertex(id,1,label,'person')
==>v[1]
gremlin> v1.property('name','marko')
==>vp[name->marko]
gremlin> v1.property('age',36)
==>vp[age->36]
gremlin> v3 = graph.addVertex(id,3,label,'person','name','kuppitz','age',33)
==>v[3]
gremlin>
3
person
name:kuppitz
age:33
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = graph.addVertex(id,1,label,'person')
==>v[1]
gremlin> v1.property('name','marko')
==>vp[name->marko]
gremlin> v1.property('age',36)
==>vp[age->36]
gremlin> v3 = graph.addVertex(id,3,label,'person','name','kuppitz','age',33)
==>v[3]
gremlin> v1.addEdge('knows',v3,id,2,'since',2013,'weight',0.9)
==>e[2][1-knows->3]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
The Machine Components
The Graph
The Traverser
The Traversal
The Data
The CPU
The Program
The Traverser
"The traverser is a 'CPU core'."
The Traverser
t ∈ T
"The traverser t is in a set of traversers T."
The Traverser
t ∈ T
3
"The traverser's current graph location is vertex 3."
µ(t) → v3
The Traverser
t ∈ T
3
"The traverser may remember his history in the graph."
1 3
2
µ(t) → v3
∆(t) → ((∅, v1), (∅, e2), (∅, v3))
The Traverser
t ∈ T
3
"The traverser records how many times he has been through a loop sequence."
1 3
2
µ(t) → v3
∆(t) → ((∅, v1), (∅, e2), (∅, v3))
ι(t) → 0
The Traverser
t ∈ T
3
"The traverser can represent more than one traverser."
1 3
2
µ(t) → v3
∆(t) → ((∅, v1), (∅, e2), (∅, v3))
ι(t) → 0
It's
what you
would
do
if you
thought in
term
s
of "energy
flows."
β(t) → 1
The Traverser
t ∈ T
3
"The traverser has a reference to his location in the traversal (program counter)."
1 3
2
µ(t) → v3
∆(t) → ((∅, v1), (∅, e2), (∅, v3))
ι(t) → 0
β(t) → 1
outE('knows') inV() values('name')g.V(1)
ψ(t) → values(‘name’)
T ⊆ U × Ψ × (P(Σ∗
) × U)∗
× N+
× U × N+
The Traverser
t ∈ T
3
"Gremlin sack" (not discussed here)
1 3
2
µ(t) → v3
∆(t) → ((∅, v1), (∅, e2), (∅, v3))
ι(t) → 0
β(t) → 1
outE('knows') inV() values('name')g.V(1)
ψ(t) → values(‘name’)
The Traverser
"Numerous (many many many) traversers are spawned during a traversal (computation)."
many
many
Com
binatoric
explosion
The Traverser
"A 21x21 vertex lattice -- 20x20 edges."
The Traverser
"A Gremlin starts at (0,0) and can only go right and down to reach (20,20).
How many traversers (paths) ultimately reach (20,20)?"
The Traverser
"For the sake of demonstration, lets look at a 5x5 lattice (4x4 edges)."
The Traverser
1
"Total = 1"
The Traverser
1
1
"Total = 2"
The Traverser
"Total = 4"
1
1
2
The Traverser
"Total = 8"
1
1
3
3
The Traverser
"Total = 16"
1
1
4
4
6
The Traverser
"Total = 20"
5
5
10
10
The Traverser
"Total = 50"
15
15
20
The Traverser
"Total = 70"
35
35
The Traverser
"Total = 70"
70
The Traverser
70
✓
2n
n
◆
=
(2n)!
(n!)2
=
(2n)(2n 1)(2n 2) . . . 1
(n(n 1)(n 2) . . . 1)2
8
choose
4
=
70
n = number of edges on a side
"Analytically, ~137 billion traversers will exist at (20,20)."
The Traverser
✓
2n
n
◆
=
(2n)!
(n!)2
=
(2n)(2n 1)(2n 2) . . . 1
(n(n 1)(n 2) . . . 1)2
40
choose
20
The Traverser
gremlin> g.V(0).repeat(out('down','right')).times(40).count()
==>137846528820
gremlin>
"That was fast… The Gremlin machine spawned 137 billion traversers?!"
"No. At vertex 400 (20,20), there exists 1 traverser with a bulk of ~137 billion."
The Traverser
gremlin> g.V(0).repeat(out('down','right')).times(40).count()
==>137846528820
gremlin> clock(100){g.V(0).repeat(out('down','right')).times(40).count().next()}
==>0.8890942
less than a millisecond
β(t) → 137846528820
Bulking is a crucial component of OLAP when near
trillion edge graphs are moving traversers between machines.
The Traverser
"If 2 traversers (each with bulk 1) are at the same location in the graph and have similar other properties,
make 1 traverser with a bulk of 2."
Traverser Equivalence Class
graph location
traversal location
path history
same loop counter
[t] = {t ∈ T | µ(t) = µ(t ) ∧
ψ(t) = ψ(t ) ∧
∆(t) = ∆(t ) ∧
ι(t) = ι(t ) }.
data location
program counter
MAY NOT BE REQUIRED
recursion of program counter
β(t1) → 1
β(t2) → 3
β(t3) → 4
β(t4) → 8
Don't enumerate, bulk!
The Traverser
User Machine
"In Gremlin OLAP, the graph is represented across a distributed system (e.g. Hadoop)
as a partitioned adjacency list."
GTM GTM
GTM GTM
Machine C
Machine A Machine B
Machine D
Cluster
The Traverser
User Machine
x.y.z
"The user creates a traversal (compiled from any language)."
GTM GTM
GTM GTM
Machine C
Machine A Machine B
Machine D
Cluster
The Traverser
x.y.z x.y.z
x.y.zx.y.z
GTM GTM
GTM GTM
Machine C
Machine A Machine B
Machine D
Cluster
User Machine
x.y.z
"The compiled traversal is sent to every machine in the cluster which has
a piece of the graph and a Gremlin traversal machine."
The Traverser
GTM GTM
GTM GTM
Machine C
Machine A Machine B
Machine D
Cluster
User Machine
x.y.z
"When the graph computation is complete, a reference to the result is returned.
For instance, in HadoopGremlin, HDFS stores both the graph and sideEffect data."
The Traverser
x.y.z
GTM
x.y.z
GTM
Machine A Machine B
"The OLAP algorithm is the classic Bulk Synchronous Parallel algorithm
popularized by Google Pregel and Apache Giraph."
x.y.z
GTM
x.y.z
GTM
The Traverser
Machine A Machine B
"The messages are traversers. Given the traverser equivalence class [t], they can be bulked."
"Its just "complex energy."
The Machine Components
The Graph
The Traverser
The Traversal
The Data
The CPU
The Program
Lim
bo!
The Traversal
Ψ traversal
"The traversal is the 'software program'."
The Traversal
traversalΨ
step-g step-hstep-f
composed of
f
∈
Ψ
g ∈ Ψ
h
∈
Ψ
"The steps are the 'instructions'."
The Traversal
traversalΨ
step-g step-hstep-f
composed of
f
∈
Ψ
defined as
g ∈ Ψ
h
∈
Ψ
f : A∗
→ B∗
"Step f maps a stream (multi-set) of traversers located at objects of type A
to a stream (multi-set) of traversers located at objects of type B."
The Traversal
traversalΨ
step-g step-hstep-f
composed of
f
∈
Ψ
defined as
g ∈ Ψ
h
∈
Ψ
f : A∗
→ B∗
"Step values('name') maps a stream (multi-set) of traversers located at vertices
to a stream (multi-set) of traversers located at strings."
kuppitz
m
allette
frantz
plurad
values('name') :
The Traversal
map : A∗
→ B∗
"For a traverser at a, move the traverser to an object at b."
kuppitzvalues('name') :
The Traversal
map : A∗
→ B∗
"For a traverser at a, move the traverser to an object at b."
flatMap : A∗
→ B∗
"For a traverser at a, clone the traverser across multiple objects in B."
kuppitzvalues('name') :
out('knows') :
The Traversal
map : A∗
→ B∗
"For a traverser at a, move the traverser to an object at b."
flatMap : A∗
→ B∗
"For a traverser at a, clone the traverser across multiple objects in B."
"For a traverser at a, either kill the traverser or leave him alone."
filter : A∗
→ A∗
kuppitzvalues('name') :
out('knows') :
has('age',lt(30)) :
The Traversal
map : A∗
→ B∗
"For a traverser at a, move the traverser to an object at b."
flatMap : A∗
→ B∗
"For a traverser at a, clone the traverser across multiple objects in B."
"For a traverser at a, either kill the traverser or leave him alone."
filter : A∗
→ A∗
"For a traverser at a, leave him alone though manipulate some data structure x."
sideEffect : A∗
→x A∗
kuppitzvalues('name') :
out('knows') :
has('age',lt(30)) :
groupCount('m') : m
v[1]:1
v[2]:34
The Traversal
map : A∗
→ B∗
"For a traverser at a, move the traverser to an object at b."
flatMap : A∗
→ B∗
"For a traverser at a, clone the traverser across multiple objects in B."
"For a traverser at a, either kill the traverser or leave him alone."
filter : A∗
→ A∗
"For a traverser at a, leave him alone though manipulate some data structure x."
sideEffect : A∗
→x A∗
branch : A∗
→b
B∗
"For a traverser at a, choose some internal branch b to ultimately yield traversers at objects of type B."
kuppitzvalues('name') :
out('knows') :
has('age',lt(30)) :
groupCount('m') : m
v[1]:1
v[2]:34
repeat(out('knows')).times(5) :
The Traversal
map flatMap filter sideEffect branch
BranchStep
ChooseStep
LocalStep
RepeatStep
UnionStep
...
AndStep
CoinStep
CyclicPathStep
DedupStep
DropStep
FilterStep
HasStep
IsStep
NotStep
OrStep
RangeStep
...
VertexStep
EdgeVertexStep
CoalesceStep
...
LabelStep
IdStep
PathStep
SackStep
SelectStep
MatchStep
ConstantStep
...
GroupCountStep
GroupStep
StoreStep
AggregateStep
InjectStep
SubgraphStep
TreeStep
IdentityStep
...
"This is the step library (instruction set) of the Gremlin graph traversal machine (virtual machine)."
The Traversal
f ◦ g ◦ h
Linear Motif
g.V(1).out('knows').has('age',lt(30)).values('name')
f g h
repeatout('knows') has('age',lt(30)) values('name')g.V(1)
"The 'byte code' of Gremlin is a sequence of steps .... "
The Traversal
f ◦ g ◦ h
Linear Motif
f(g ◦ h) ◦ k
Nested Motif
g.V(1).out('knows').has('age',lt(30)).values('name')
f g h
g.V(1).repeat(out('knows').has('age',lt(30))).values('name')
f g h k
repeat out('knows') has('age',lt(30)) values('name')g.V(1)
repeatout('knows') has('age',lt(30)) values('name')g.V(1)
"The 'byte code' of Gremlin is a sequence of steps with some steps nested within each other."
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
"Back to our super baby toy graph."
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
O
LTP
vs. O
LAP
graph
database
vs. processor
Neo4j vs. G
iraph
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin> g.V(1).label()
==>person
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin> g.V(1).label()
==>person
gremlin> g.V(1).outE()
==>e[2][1-knows->3]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin> g.V(1).label()
==>person
gremlin> g.V(1).outE()
==>e[2][1-knows->3]
gremlin> g.V(1).outE().valueMap()
==>[weight:0.9, since:2013]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin> g.V(1).label()
==>person
gremlin> g.V(1).outE()
==>e[2][1-knows->3]
gremlin> g.V(1).outE().valueMap()
==>[weight:0.9, since:2013]
gremlin> g.V(1).out()
==>v[3]
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
gremlin> graph
==>tinkergraph[vertices:2 edges:1]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard]
gremlin> g.V(1)
==>v[1]
gremlin> g.V(1).label()
==>person
gremlin> g.V(1).outE()
==>e[2][1-knows->3]
gremlin> g.V(1).outE().valueMap()
==>[weight:0.9, since:2013]
gremlin> g.V(1).out()
==>v[3]
gremlin> g.V(1).out().values('age')
==>33
gremlin>
1
person
name:marko
age:36
2 knows
since:2013
weight:0.9
3
person
name:kuppitz
age:33
The Machine Components
The Graph
The Traverser
The Traversal
The Data
The CPU
The Program
G ←− µ
t ∈ T
{∆, β, ς, ι}
ψ −→ Ψ
The Machine Components
G ←− µ
t ∈ T
{∆, β, ς, ι}
ψ −→ Ψ
The Machine Components
G ←− µ
t ∈ T
{∆, β, ς, ι}
ψ −→ Ψ
Graph Structure Graph Process+
= Graph Computing
(Data) (Algorithm)
The Machine Components
"Traversers move about a graph as instructed by their traversal. The result of the computation is
1.) the final location of all halted traversers and
2.) the state of any side-effect data structures."
The Machine Components
Part 2: The Gremlin Traversal Language
Ψ
Gremlin's step library is the instruction set of the Gremlin traversal machine.
A linear/nested composition of steps forms a traversal which is executed by the
Gremlin traversal machine.
step1
step2 step3
step4
step1 step3 step4step2
Gremlin-Java8 is the language provided by TinkerPop, though any language
(with respective compiler) can be used to write Gremlin traversals.
g.V(1).as("a").
out("knows").as("b").
select("a","b")
SELECT ?a ?b WHERE {
?a e:knows ?b
?a v:id ?c
FILTER (?c == 1)
}Gremlin-Java8
Written by Daniel Kuppitz
https://github.com/dkuppitz/sparql-gremlin
=
g.V().out('knows')
SELECT ?x ?y WHERE
g.V.out.name
GraphLanguage
Compiler from
Graph Language to
Gremlin Traversal
Gremlin
Traversal Machine
GraphSystem
Database(OLTP)
Processor(OLAP)
GremlinTraversal
Physical Machine
DataProgram Traversal
Heap/DiskMemory Memory
Memory/Graph System
Physical Machine
Java
Virtual Machine
bytecode
steps
DataProgram
Memory/DiskMemory
Physical Machine
instructions
Java
Virtual Machine
Gremlin
Traversal Machine
"The specification is easy … doesn't have to only be on the JVM."
~/tinkerpop3$ bin/gremlin.sh
gremlin>
"Gremlin-Groovy can be executed in the Gremlin Console REPL and thus, is good for demonstrations."
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin>
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin>
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin>
GraphML
Gryo
GraphSON
"TinkerPop supports three I/O graph formats. Though the user can add their own format/parser."
~/tinkerpop3$ bin/gremlin.sh
,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin>
name:<String>
Artist Song
name:<String>
songType:<String>
performances:<Integer>sungBy
writtenBy followedBy
weight:<Long>
gremlin> :plugin use tinkerpop.gephi
==>tinkerpop.gephi activated
gremlin> :remote connect tinkerpop.gephi
==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000,
startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7
gremlin> :> graph
==>tinkergraph[vertices:808 edges:8049]
gremlin>
http://gephi.org
Generated by Daniel Kuppitz
Rodriguez, M.A., Kuppitz, D., Yim, K., “Tales From the TinkerPop," DataStax Engineering Blog, 2015.
http://www.datastax.com/dev/blog/tales-from-the-tinkerpop
This is an actual Gremlin/R session I performed for this presentation.
I was interested in understanding why Grateful Dead concerts continue to fascinate me.
SPOILER ALERT: No local correlations -- correlations exist in the stationary distribution. EigenDead..the long run."
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin>
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g.V().has('name','DARK STAR')
==>v[89]
gremlin>
Vertex
filter
has(...)
"one-to-[one-or-none]"
Vertex
89
"Get the vertex named Dark Star."
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g.V().has('name','DARK STAR')
==>v[89]
gremlin>
}
ProviderOptimizationStrategy
* OLTP graph system providers turn this into an index-lookup.
Vertex
filter
has(...)
"one-to-[one-or-none]"
Vertex
89
"TraversalStrategies are an important part of Gremlin (discussed at length in the conference article)."
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g.V().has('name','DARK STAR')
==>v[89]
gremlin> g.V().has('name','DARK STAR').out('followedBy').values('name')
==>TRUCKING
==>EYES OF THE WORLD
==>HES GONE
==>SING ME BACK HOME
==>SPANISH JAM
==>CHINA DOLL
==>MORNING DEW
==>WHARF RAT
==>THE OTHER ONE
==>MIND LEFT BODY JAM
...
gremlin>
Vertex
filter
has(...)
"one-to-[one-or-none]"
Vertex
89 flatMap
out('followedBy')
"one-to-many"
Vertex
"one-to-one"
map
values('name')
String
TRUCKING
EYES OF THE WORLD
HES GONE
SING ME BACK HOME
SPANISH JAM
...
"What are the names of the songs that have followed Dark Star in concert?"
gremlin> g.V().outE().groupCount().by(label)
==>[followedBy:7047, sungBy:501, writtenBy:501]
gremlin>
Vertex Map<String,Long>
flatMap
[
followedBy:7074
sungBy:501
writtenBy:501
]
outE() {"one-to-many"
Edge
map
groupCount()
"many-to-one"
reducing
barrier
"What is the distribution of edge labels in the graph?"
gremlin> g.V().outE().groupCount().by(label)
==>[followedBy:7047, sungBy:501, writtenBy:501]
gremlin>
Vertex Map<String,Long>
flatMap
[
followedBy:7074
sungBy:501
writtenBy:501
]
outE() {"one-to-many"
Edge
map
groupCount()
"many-to-one"
reducing
barrier
"What is the distribution of edge labels in the graph?"
groupCount()isa
TraversalParent(nestingstep):label()
gremlin> g.V().groupCount().by(outE().count())
==>0=224
==>1=26
==>2=254
==>3=45
==>4=24
==>5=20
==>6=10
==>7=10
==>8=6
==>9=8
...
gremlin>
"What is the out-degree distribution of the graph?"
gremlin> g.V().groupCount().by(outE().count())
==>0=224
==>1=26
==>2=254
==>3=45
==>4=24
==>5=20
==>6=10
==>7=10
==>8=6
==>9=8
...
gremlin> f = new File('grateful-dead-distribution.txt')
==>grateful-dead-distribution.txt
gremlin> g.V().groupCount().by(outE().count()).
next().each{degree,freq -> f.append(degree + 't' + freq + 'n')}
gremlin> :quit
"Save result to a tab delimitated file for further analysis."
$ r
> t <- read.table("grateful-dead-distribution.txt")
>
0 20 40 60 80
050100150200250
Grateful Dead Degree Distribution
out degree
frequency$ r
> t <- read.table("grateful-dead-distribution.txt")
> plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5,
main="Grateful Dead Degree Distribution")
>
$ r
> t <- read.table("grateful-dead-distribution.txt")
> plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5,
main="Grateful Dead Degree Distribution")
> plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5,
main="Grateful Dead Degree Distribution",log="xy")
>
0 20 40 60 80
050100150200250
Grateful Dead Degree Distribution
out degree
frequency
1 2 5 10 20 50 100
125102050100
Grateful Dead Degree Distribution
out degree
frequency
"Yea, yet another power law. Its 2005 and I get a free publication!"
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('performances')
==>[a:5, b:1]
==>[a:5, b:531]
==>[a:5, b:394]
==>[a:5, b:293]
==>[a:5, b:1]
==>[a:1, b:473]
==>[a:1, b:24]
==>[a:531, b:87]
==>[a:531, b:41]
==>[a:531, b:254]
...
gremlin> f = new File('performance-assortativity.txt')
==>performance-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('performances').
each{f.append(it.a + 't' + it.b + 'n')}
"Are songs that follow each other assortative by the number of times they are played in concert?"
https://en.wikipedia.org/wiki/Assortativity
"Gremlins of a green, bulk together."
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('performances')
==>[a:5, b:1]
==>[a:5, b:531]
==>[a:5, b:394]
==>[a:5, b:293]
==>[a:5, b:1]
==>[a:1, b:473]
==>[a:1, b:24]
==>[a:531, b:87]
==>[a:531, b:41]
==>[a:531, b:254]
...
gremlin> f = new File('performance-assortativity.txt')
==>performance-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('performances').
each{f.append(it.a + 't' + it.b + 'n')}
> cor.test(t[,1],t[,2], test='pearson')
Pearson's product-moment correlation
data: t[, 1] and t[, 2]
t = -3.9525, df = 7045, p-value = 7.809e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.07030966 -0.02371587
sample estimates:
cor
-0.04703835
No
correlation
betw
een
songs
by
perform
ance.
"…you know: birds and feathers and flocking togethers."
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('songType')
==>[a:cover, b:cover]
==>[a:cover, b:cover]
==>[a:cover, b:original]
==>[a:cover, b:cover]
==>[a:cover, b:cover]
==>[a:cover, b:original]
==>[a:cover, b:original]
==>[a:cover, b:original]
==>[a:cover, b:cover]
==>[a:cover, b:cover]
...
gremlin> f = new File('songType-assortativity.txt')
==>songType-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('songType').
each{f.append((it.a == 'cover' ? 0 : 1) + 't' +
(it.b == 'cover' ? 0 : 1) + 'n')}
"Are songs that follow each other assortative by either being a cover or an original song?"
"Hmmm.. need numbers -- ah! only two song types."
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('songType')
==>[a:cover, b:cover]
==>[a:cover, b:cover]
==>[a:cover, b:original]
==>[a:cover, b:cover]
==>[a:cover, b:cover]
==>[a:cover, b:original]
==>[a:cover, b:original]
==>[a:cover, b:original]
==>[a:cover, b:cover]
==>[a:cover, b:cover]
...
gremlin> f = new File('songType-assortativity.txt')
==>songType-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by('songType').
each{f.append((it.a == 'cover' ? 0 : 1) + 't' +
(it.b == 'cover' ? 0 : 1) + 'n')}
> cramersV(chisq.test(t[,1],t[,2])$observed)
[1] 0.0093161 No
correlation
betw
een
songs
by
song
type
(original or cover).
"Cramer's V is for nominal data correlations."
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by(out('sungBy').values('name'))
==>[a:Garcia, b:Spencer_Davis]
==>[a:Garcia, b:Weir]
==>[a:Garcia, b:Garcia]
==>[a:Garcia, b:Garcia]
==>[a:Garcia, b:Weir]
==>[a:Spencer_Davis, b:Weir]
==>[a:Spencer_Davis, b:Grateful_Dead]
==>[a:Weir, b:Garcia]
==>[a:Weir, b:All]
==>[a:Weir, b:Garcia]
...
gremlin> f = new File('singer-assortativity.txt')
==>singer-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by(out('sungBy').values('name')).
each{f.append(it.a.hashCode() + 't' + it.b.hashCode() + 'n')}
"Are songs that follow each other assortative by their singer?"
"Dah! Now
I need an algorithm
to turn a String to a unique integer.
Duh -- hashCode()."
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by(out('sungBy').values('name'))
==>[a:Garcia, b:Spencer_Davis]
==>[a:Garcia, b:Weir]
==>[a:Garcia, b:Garcia]
==>[a:Garcia, b:Garcia]
==>[a:Garcia, b:Weir]
==>[a:Spencer_Davis, b:Weir]
==>[a:Spencer_Davis, b:Grateful_Dead]
==>[a:Weir, b:Garcia]
==>[a:Weir, b:All]
==>[a:Weir, b:Garcia]
...
gremlin> f = new File('singer-assortativity.txt')
==>singer-assortativity.txt
gremlin> g.V().as('a').out('followedBy').as('b').
select('a','b').by(out('sungBy').values('name')).
each{f.append(it.a.hashCode() + 't' + it.b.hashCode() + 'n')}
> cramersV(chisq.test(t[,1],t[,2])$observed)
[1] 0.2354349 Songs
are
w
eakly
correlated
by
singers.
"However, its typically just Jerry and Bobby trading off in ones or twos…"
gremlin> g.V().hasLabel('song').
repeat(out('followedBy').groupCount('m').by('name')).times(8).
cap('m').
order(local).by(valueDecr).
limit(local,10)
"What songs are most central in the concert network?"
"Sortagettingboredworkingontheslides.Myimprov-vibedied…
ohyea,butIgotanastyclassicriffIremember."
"Not PaaaaaageRank again."
gremlin> g.V().hasLabel('song').
repeat(out('followedBy').groupCount('m').by('name')).times(8).
cap('m').
order(local).by(valueDecr).
limit(local,10)
==>PLAYING IN THE BAND=34142246667508
==>ME AND MY UNCLE=32094411419320
==>JACK STRAW=31867238591868
==>EL PASO=29973481580211
==>TRUCKING=29819272116849
==>PROMISED LAND=28663488257022
==>CHINA CAT SUNFLOWER=28569992924918
==>CUMBERLAND BLUES=26320323048221
==>LOOKS LIKE RAIN=26138795229794
==>RAMBLE ON ROSE=26059394903880
gremlin>
"Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening
Behavior,” First Monday 14:1, ISSN:1396-0466, http://arxiv.org/abs/0807.2466, January 2009."
"Huh, those are the tracks from the 'greatest' Grateful Dead's greatest hits."
https://en.wikipedia.org/wiki/What_a_Long_Strange_Trip_It%27s_Been
gremlin> g.V().hasLabel('song').
repeat(out('followedBy').groupCount('m').by('name')).times(8).
cap('m').
order(local).by(valueDecr).
limit(local,10)
==>PLAYING IN THE BAND=34142246667508
==>ME AND MY UNCLE=32094411419320
==>JACK STRAW=31867238591868
==>EL PASO=29973481580211
==>TRUCKING=29819272116849
==>PROMISED LAND=28663488257022
==>CHINA CAT SUNFLOWER=28569992924918
==>CUMBERLAND BLUES=26320323048221
==>LOOKS LIKE RAIN=26138795229794
==>RAMBLE ON ROSE=26059394903880
gremlin> clock(10){g.V().hasLabel('song').
repeat(out('followedBy').groupCount('m').by('name')).times(8).
cap('m').
order(local).by(valueDecr).
limit(local,10)}
==>4040.761404
gremlin>
"4 seconds to analyze that many paths -- that is the power of bulking."
"34
trillion
traversers
passed
through
thisvertex."
gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin>
"And now note how the same Gremlin queries you've seen thus far can execute across a cluster."
gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
sparkgraphcomputer]
gremlin>
G
raphCom
puter's
are
O
LAP
system
s
"Lets use Apache Spark via TinkerPop's SparkGraphComputer."
gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
sparkgraphcomputer]
gremlin> g.V().match(
__.as('a').out('sungBy').as('b'),
__.as('a').out('writtenBy').as('b'),
__.as('a').values('name').as('c'),
__.as('b').values('name').as('d')).
select('c','d')
"Which songs were written and sung by the same person?"
"__" is
required
by
G
rem
lin-G
roovy
as
"as" is
a
keyword
in
G
roovy.
gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
sparkgraphcomputer]
gremlin> g.V().match(
__.as('a').out('sungBy').as('b'),
__.as('a').out('writtenBy').as('b'),
__.as('a').values('name').as('c'),
__.as('b').values('name').as('d')).
select('c','d')
==>[c:ANY WONDER, d:Unknown]
==>[c:WALK DOWN THE STREET, d:Unknown]
==>[c:LEAVE YOUR LOVE AT HOME, d:Unknown]
==>[c:COWBOY SONG, d:Unknown]
==>[c:NEIGHBORHOOD GIRLS, d:Suzanne_Vega]
==>[c:MINDBENDER, d:Garcia_Lesh]
==>[c:EQUINOX, d:Lesh]
==>[c:NO LEFT TURN UNSTONED (CARDBOARD COWBOY), d:Lesh]
==>[c:CHILDHOODS END, d:Lesh]
==>[c:NEVER TRUST A WOMAN, d:Mydland]
...
gremlin>
"That was a distributed, declarative graph pattern match query.
This could have been over a 1000 node cluster."
g.V().match(
__.as('a').out('sungBy').as('b'),
__.as('a').out('writtenBy').as('b'),
__.as('a').values('name').as('c'),
__.as('b').values('name').as('d')).
select('c','d')
SELECT ?c ?d WHERE {
?a e:writtenBy ?b .
?a e:sungBy ?b .
?a v:name ?c .
?b v:name ?d
}
Gremlin-Groovy
"Gremlin supports the 'match'-style found in most pattern match languages such as SPARQL."
g.V().match(
__.as('a').out('sungBy').as('b'),
__.as('a').out('writtenBy').as('b'),
__.as('a').values('name').as('c'),
__.as('b').values('name').as('d')).
select('c','d')
SELECT ?c ?d WHERE {
?a e:writtenBy ?b .
?a e:sungBy ?b .
?a v:name ?c .
?b v:name ?d
}
Gremlin-Groovy
"Two different graph languages can be compiled to the Gremlin traversal machine."
Uses Apache
Jena's
SPARQ
L
parser
HOMEWORK: Use Apache Calcite
and convert SQL to a Gremlin traversal.SPARQL to Gremlin Compiler
Ted Wilmes was here.
gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated
gremlin>
"Install the SPARQL-Gremlin compiler developed by Daniel Kuppitz of DataStax."
gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated
gremlin> :remote connect datastax.sparql g
==>SPARQL[graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
sparkgraphcomputer]]
gremlin>
"Given that SPARQL is NOT Groovy, it is passed to the cluster as a remote String."
gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated
gremlin> :remote connect datastax.sparql g
==>SPARQL[graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
sparkgraphcomputer]]
gremlin> :> SELECT ?c ?d WHERE {
?a e:writtenBy ?b .
?a e:sungBy ?b .
?a v:name ?c .
?b v:name ?d }
==>[c:ANY WONDER, d:Unknown]
==>[c:WALK DOWN THE STREET, d:Unknown]
==>[c:LEAVE YOUR LOVE AT HOME, d:Unknown]
==>[c:COWBOY SONG, d:Unknown]
==>[c:NEIGHBORHOOD GIRLS, d:Suzanne_Vega]
==>[c:MINDBENDER, d:Garcia_Lesh]
==>[c:EQUINOX, d:Lesh]
==>[c:NO LEFT TURN UNSTONED (CARDBOARD COWBOY), d:Lesh]
==>[c:CHILDHOODS END, d:Lesh]
==>[c:NEVER TRUST A WOMAN, d:Mydland]
...
gremlin>
"SPARQL was just executed over Apache Spark by way of the Gremlin traversal machine."
gremlin> g = graph.traversal(computer(GiraphGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
giraphgraphcomputer]
gremlin>
"Use a different GraphComputer."
gremlin> g = graph.traversal(computer(GiraphGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],
giraphgraphcomputer]
gremlin> g.V().match(
__.as('a').out('sungBy').as('b'),
__.as('a').out('writtenBy').as('b'),
__.as('a').values('name').as('c'),
__.as('b').values('name').as('d')).
select('c','d')
INFO org.apache.hadoop.mapreduce.Job - Running job: job_1445539638801_0011
INFO org.apache.hadoop.mapreduce.Job - map 50% reduce 0%
INFO org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
INFO org.apache.hadoop.mapreduce.Job - Counters: 55
...
Giraph Stats
Superstep 1 GiraphComputation (ms)=2022
Superstep 2 GiraphComputation (ms)=1994
Superstep 3 GiraphComputation (ms)=999
Superstep 4 GiraphComputation (ms)=1001
Superstep 5 GiraphComputation (ms)=1254
Total (ms)=21425
...
==>[c:IT MUST HAVE BEEN THE ROSES, d:Hunter]
==>[c:EASY WIND, d:Hunter]
==>[c:WHATLL YOU RAISE, d:Hunter]
==>[c:CRYPTICAL ENVELOPMENT, d:Garcia]
==>[c:CREAM PUFF WAR, d:Garcia]
==>[c:DRUMS, d:Grateful_Dead]
...
gremlin>
"Gremlin-Groovy just executed over Apache Giraph by way of the Gremlin traversal machine."
"I'm a Gingerbremlin."
Gremlin-Java8
SPARQL
Cypher GraphQL
G
rem
lin-Scala
O
rientDB
Neo4j
Titan
Spark
Giraph
Hadoop
Sqlg
IBM
BlueMix
Any Graph System
Any Graph Language
Stardog
Ripple
SQ
L
* Many system providers are still on TinkerPop2 and thus, haven't migrated to TinkerPop3.
(JO
INs
are
walks!)
(easy-parser, its JSON!)
Works over
PostgreSQL and MySQL
Pieter Martin works on this.
CassandraandHBase
RED = "No known compiler."
RDFstore
Thank you….
Rodriguez, M.A., Kuppitz, D., Yim, K., “Tales From the TinkerPop," DataStax Engineering Blog, 2015.
http://www.datastax.com/dev/blog/tales-from-the-tinkerpop

Contenu connexe

Tendances

Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide trainingSpark Summit
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageMarko Rodriguez
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipelinezahid-mian
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Introduction to D3.js
Introduction to D3.jsIntroduction to D3.js
Introduction to D3.jsKnoldus Inc.
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Databricks
 
Extending Gremlin with Foundational Steps
Extending Gremlin with Foundational StepsExtending Gremlin with Foundational Steps
Extending Gremlin with Foundational StepsStephen Mallette
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is FailingDataWorks Summit
 
Lazy vs. Eager Loading Strategies in JPA 2.1
Lazy vs. Eager Loading Strategies in JPA 2.1Lazy vs. Eager Loading Strategies in JPA 2.1
Lazy vs. Eager Loading Strategies in JPA 2.1Patrycja Wegrzynowicz
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Josef A. Habdank
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 

Tendances (20)

Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
GraphQL Fundamentals
GraphQL FundamentalsGraphQL Fundamentals
GraphQL Fundamentals
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
jQuery for beginners
jQuery for beginnersjQuery for beginners
jQuery for beginners
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming Language
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipeline
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Introduction to D3.js
Introduction to D3.jsIntroduction to D3.js
Introduction to D3.js
 
React & GraphQL
React & GraphQLReact & GraphQL
React & GraphQL
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
 
Extending Gremlin with Foundational Steps
Extending Gremlin with Foundational StepsExtending Gremlin with Foundational Steps
Extending Gremlin with Foundational Steps
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 
Intro GraphQL
Intro GraphQLIntro GraphQL
Intro GraphQL
 
Lazy vs. Eager Loading Strategies in JPA 2.1
Lazy vs. Eager Loading Strategies in JPA 2.1Lazy vs. Eager Loading Strategies in JPA 2.1
Lazy vs. Eager Loading Strategies in JPA 2.1
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 

En vedette

Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph ComputingMarko Rodriguez
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Stephen Mallette
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with GraphsMarko Rodriguez
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
Seda an architecture for well-conditioned scalable internet services
Seda   an architecture for well-conditioned scalable internet servicesSeda   an architecture for well-conditioned scalable internet services
Seda an architecture for well-conditioned scalable internet servicesbdemchak
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesQian Lin
 
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...DataStax
 
Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Marco Mendes
 
Configurando o geany_para_python
Configurando o geany_para_pythonConfigurando o geany_para_python
Configurando o geany_para_pythonMarco Mendes
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...DataWorks Summit
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
 
PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!DataStax Academy
 

En vedette (20)

Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph Computing
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with Graphs
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
 
Seda an architecture for well-conditioned scalable internet services
Seda   an architecture for well-conditioned scalable internet servicesSeda   an architecture for well-conditioned scalable internet services
Seda an architecture for well-conditioned scalable internet services
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Data Driven Growth
Data Driven GrowthData Driven Growth
Data Driven Growth
 
IDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizadosIDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizados
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
 
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
 
Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012
 
Configurando o geany_para_python
Configurando o geany_para_pythonConfigurando o geany_para_python
Configurando o geany_para_python
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
 
PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!
 

Similaire à ACM DBPL Keynote: The Graph Traversal Machine and Language

BUILDING WHILE FLYING
BUILDING WHILE FLYINGBUILDING WHILE FLYING
BUILDING WHILE FLYINGKamal Shannak
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQLLuigi Dell'Aquila
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
 
Linux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and EngineeringLinux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and EngineeringPDE1D
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
LISP: назад в будущее, Микола Мозговий
LISP: назад в будущее, Микола МозговийLISP: назад в будущее, Микола Мозговий
LISP: назад в будущее, Микола МозговийSigma Software
 
How Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionHow Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionLuca Garulli
 
Machine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroMachine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroGraphAware
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentationehtshamelahi
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
Антон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabАнтон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabDiana Dymolazova
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
On being a professional software developer
On being a professional software developerOn being a professional software developer
On being a professional software developerAnton Kirillov
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017StampedeCon
 

Similaire à ACM DBPL Keynote: The Graph Traversal Machine and Language (20)

BUILDING WHILE FLYING
BUILDING WHILE FLYINGBUILDING WHILE FLYING
BUILDING WHILE FLYING
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
 
Linux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and EngineeringLinux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and Engineering
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
LISP: назад в будущее, Микола Мозговий
LISP: назад в будущее, Микола МозговийLISP: назад в будущее, Микола Мозговий
LISP: назад в будущее, Микола Мозговий
 
How Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionHow Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolution
 
Machine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroMachine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro Negro
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
effectivegraphsmro1
effectivegraphsmro1effectivegraphsmro1
effectivegraphsmro1
 
Антон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabАнтон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLab
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
On being a professional software developer
On being a professional software developerOn being a professional software developer
On being a professional software developer
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 

Plus de Marko Rodriguez

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic MachineMarko Rodriguez
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data TypeMarko Rodriguez
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryMarko Rodriguez
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialMarko Rodriguez
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics EngineMarko Rodriguez
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataMarko Rodriguez
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph DatabasesMarko Rodriguez
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinMarko Rodriguez
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical GremlinMarko Rodriguez
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the GraphMarko Rodriguez
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMarko Rodriguez
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataMarko Rodriguez
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Marko Rodriguez
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceMarko Rodriguez
 
The Graph Traversal Programming Pattern
The Graph Traversal Programming PatternThe Graph Traversal Programming Pattern
The Graph Traversal Programming PatternMarko Rodriguez
 
The Network Data Structure in Computing
The Network Data Structure in ComputingThe Network Data Structure in Computing
The Network Data Structure in ComputingMarko Rodriguez
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly CommunityMarko Rodriguez
 
General-Purpose, Internet-Scale Distributed Computing with Linked Process
General-Purpose, Internet-Scale Distributed Computing with Linked ProcessGeneral-Purpose, Internet-Scale Distributed Computing with Linked Process
General-Purpose, Internet-Scale Distributed Computing with Linked ProcessMarko Rodriguez
 
Collective Decision Making Systems: From the Ideal State to Human Eudaimonia
Collective Decision Making Systems: From the Ideal State to Human EudaimoniaCollective Decision Making Systems: From the Ideal State to Human Eudaimonia
Collective Decision Making Systems: From the Ideal State to Human EudaimoniaMarko Rodriguez
 

Plus de Marko Rodriguez (20)

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machine
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Type
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph Theory
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM Dial
 
The Path Forward
The Path ForwardThe Path Forward
The Path Forward
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph Databases
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with Gremlin
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical Gremlin
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the Graph
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to Redemption
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network Science
 
The Graph Traversal Programming Pattern
The Graph Traversal Programming PatternThe Graph Traversal Programming Pattern
The Graph Traversal Programming Pattern
 
The Network Data Structure in Computing
The Network Data Structure in ComputingThe Network Data Structure in Computing
The Network Data Structure in Computing
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly Community
 
General-Purpose, Internet-Scale Distributed Computing with Linked Process
General-Purpose, Internet-Scale Distributed Computing with Linked ProcessGeneral-Purpose, Internet-Scale Distributed Computing with Linked Process
General-Purpose, Internet-Scale Distributed Computing with Linked Process
 
Collective Decision Making Systems: From the Ideal State to Human Eudaimonia
Collective Decision Making Systems: From the Ideal State to Human EudaimoniaCollective Decision Making Systems: From the Ideal State to Human Eudaimonia
Collective Decision Making Systems: From the Ideal State to Human Eudaimonia
 

Dernier

Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 

Dernier (20)

Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 

ACM DBPL Keynote: The Graph Traversal Machine and Language

  • 1. The Gremlin Graph Traversal Machine and Language Dr. Marko A. Rodriguez Director of Engineering at DataStax, Inc. Project Management Committee, Apache TinkerPop http://tinkerpop.incubator.apache.org Database Programming Languages Rodriguez, M.A., “The Gremlin Graph Traversal Machine and Language,” Proceedings of the ACM Database Programming Languages Conference, doi:10.1145/2815072.2815073, ACM, Pittsburg, Pennsylvania, October 2015.
  • 2. Java is a virtual machine. Java is a programming language.
  • 3. Java is a virtual machine. Java is a programming language. Gremlin is a traversal machine. Gremlin is a traversal language.
  • 4. Java is a virtual machine. Java is a programming language. Gremlin is a traversal machine. Gremlin is a traversal language. Java is operating system agnostic. MacOSX, Linux, Windows, etc.
  • 5. Java is a virtual machine. Java is a programming language. Gremlin is a traversal machine. Gremlin is a traversal language. Java is operating system agnostic. MacOSX, Linux, Windows, etc. Gremlin is graph system agnostic. Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc.
  • 6. Java is a virtual machine. Java is a programming language. Gremlin is a traversal machine. Gremlin is a traversal language. Other languages compile to the Java virtual machine. Groovy, Scala, Clojure, JavaScript Nashorn, etc. Java is operating system agnostic. MacOSX, Linux, Windows, etc. Gremlin is graph system agnostic. Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc.
  • 7. Java is a virtual machine. Java is a programming language. Gremlin is a traversal machine. Gremlin is a traversal language. Other languages compile to the Java virtual machine. Groovy, Scala, Clojure, JavaScript Nashorn, etc. Other languages compile to the Gremlin traversal machine. Gremlin-Groovy, Gremlin-Scala, SPARQL, etc. Java is operating system agnostic. MacOSX, Linux, Windows, etc. Gremlin is graph system agnostic. Titan, Neo4j, Stardog, Giraph, Spark, Hadoop, etc. Rodriguez, M.A., Kuppitz, D., “The Benefits of the Gremlin Graph Traversal Machine," DataStax Engineering Blog, 2015. http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
  • 8. TinkerPop is an open source graph computing framework. TinkerPop was started in 2009 by Marko A. Rodriguez, Josh Shinavier, and Peter Neubauer. TinkerPop joined the Apache Software Foundation January 2015 (currently in incubation). TinkerPop designs and develops the Gremlin graph traversal machine and language. TinkerPop is supported by nearly every graph system provider (both commercial and open source). TinkerPop3 has been in development for 2 years and was officially released July 2015. "What is The TinkerPop?"
  • 9. "What is The TinkerPop?" Graph Database Graph Processor Provider API Provider API Provider Strategies Gremlin Traversal Language Graph Computer Core API (graph, vertex, edge) Monitoring Gremlin Server JSON Kryo ... ganglia cvs graphite jmx { User DSL OLTP OLAP
  • 10. TinkerPop provides graph computing capabilities to any data processing system. TinkerPop supports both OLTP (database) and OLAP (batch processor) systems. TinkerPop is used to process the order fulfillment network of Amazon.com ("100 billion vertices" -- at least factor 10x on edges?). TinkerPop is the sole interface for >50% of all graph systems in the market/wild. TinkerPop currently processes a near trillion edge graph represented over 1000 machines. TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark, Apache Giraph, Apache Atlas, Apache Falcon, ... TinkerPop is always looking for more contributors and perhaps TinkerPop could be your future software/language design and development home. "Why should you care about The TinkerPop?" https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/ (can't disclose company name)
  • 11. TinkerPop provides graph computing capabilities to any data processing system. TinkerPop supports both OLTP (database) and OLAP (batch processor) systems. TinkerPop is used to process the order fulfillment network of Amazon.com ("100 billion vertices" -- at least factor 10x on edges?). TinkerPop is the sole interface for >50% of all graph systems in the market/wild. TinkerPop currently processes a near trillion edge graph represented over 1000 machines. TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark, Apache Giraph, Apache Atlas, Apache Falcon, ... TinkerPop is always looking for more contributors and perhaps TinkerPop could be your future software/language design and development home. "Why should you care about The TinkerPop?" https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/ TinkerPop has language bindings in every major programming language.
  • 12. TinkerPop provides graph computing capabilities to any data processing system. TinkerPop supports both OLTP (database) and OLAP (batch processor) systems. TinkerPop is used to process the order fulfillment network of Amazon.com ("100 billion vertices" -- at least factor 10x on edges?). TinkerPop is the sole interface for >50% of all graph systems in the market/wild. TinkerPop currently processes a near trillion edge graph represented over 1000 machines. TinkerPop works with Apache Cassandra, Apache HBase, Apache Hadoop, Apache Spark, Apache Giraph, Apache Atlas, Apache Falcon, ... TinkerPop is always looking for more contributors and perhaps TinkerPop could be your future software/language design and development home. "Why should you care about The TinkerPop?" https://videos.dabcc.com/aws-reinvent-2015-dat203-building-graph-databases-on-aws/ TinkerPop has language bindings in every major programming language. TinkerPop isalso used forsm allin-m em orygraphs.
  • 13. Part 1: The Gremlin Traversal Machine
  • 14. The Machine Components The Graph The Traverser The Traversal The Data The CPU The Program
  • 15. The Graph vertex "The graph is the 'RAM'."
  • 16. The Graph 1 vertex id "Vertices and edges in the graph have unique ids (addresses)."
  • 26. The Graph 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33 Directed, binary, attributed multi-graph. G = (V, E ⊆ (V × V ), λ : (V ∪ E) × Σ∗ → U (V ∪ E)) vertices edges directed, binary labels+properties function properties can't reference vertices or edges property key (string) vertices or edges U = anything
  • 27. The Graph 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33 Directed, binary, attributed multi-graph. G = (V, E ⊆ (V × V ), λ : (V ∪ E) × Σ∗ → U (V ∪ E)) λ(v1, name) → marko λ(e2, label) → knows ((e2)1, (e2)2) = (v1, v3)
  • 28. The Graph 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33 Directed, binary, attributed multi-graph. Property graph. * Multi- and meta-properties are not discussed in this presentation nor in the associated conference article. http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/#vertex-properties
  • 30. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin>
  • 31. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> "TinkerGraph is an in-memory graph system provided by TinkerPop."
  • 32. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> TitanGraph.open(…) Neo4jGraph.open(…) OrientGraph.open(…) HadoopGraph.open(…) HadoopGraph. compute(GiraphGraphComputer) HadoopGraph. compute(SparkGraphComputer) "Gremlin is agnostic to the underlying graph system." StardogGraph.open(…) etc... DSEGraph.open(…)
  • 33. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> v1 = graph.addVertex(id,1,label,'person') ==>v[1] gremlin> "A graph system provider is Gremlin-enabled if they implement the gremlin.structure API suite." 1 person
  • 34. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> v1 = graph.addVertex(id,1,label,'person') ==>v[1] gremlin> v1.property('name','marko') ==>vp[name->marko] gremlin> "The APIs have to do with CRUD operations on a graph structure." 1 person name:marko
  • 35. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> v1 = graph.addVertex(id,1,label,'person') ==>v[1] gremlin> v1.property('name','marko') ==>vp[name->marko] gremlin> v1.property('age',36) ==>vp[age->36] gremlin> 1 person name:marko age:36
  • 36. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> v1 = graph.addVertex(id,1,label,'person') ==>v[1] gremlin> v1.property('name','marko') ==>vp[name->marko] gremlin> v1.property('age',36) ==>vp[age->36] gremlin> v3 = graph.addVertex(id,3,label,'person','name','kuppitz','age',33) ==>v[3] gremlin> 3 person name:kuppitz age:33
  • 37. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> v1 = graph.addVertex(id,1,label,'person') ==>v[1] gremlin> v1.property('name','marko') ==>vp[name->marko] gremlin> v1.property('age',36) ==>vp[age->36] gremlin> v3 = graph.addVertex(id,3,label,'person','name','kuppitz','age',33) ==>v[3] gremlin> v1.addEdge('knows',v3,id,2,'since',2013,'weight',0.9) ==>e[2][1-knows->3] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 38. The Machine Components The Graph The Traverser The Traversal The Data The CPU The Program
  • 39. The Traverser "The traverser is a 'CPU core'."
  • 40. The Traverser t ∈ T "The traverser t is in a set of traversers T."
  • 41. The Traverser t ∈ T 3 "The traverser's current graph location is vertex 3." µ(t) → v3
  • 42. The Traverser t ∈ T 3 "The traverser may remember his history in the graph." 1 3 2 µ(t) → v3 ∆(t) → ((∅, v1), (∅, e2), (∅, v3))
  • 43. The Traverser t ∈ T 3 "The traverser records how many times he has been through a loop sequence." 1 3 2 µ(t) → v3 ∆(t) → ((∅, v1), (∅, e2), (∅, v3)) ι(t) → 0
  • 44. The Traverser t ∈ T 3 "The traverser can represent more than one traverser." 1 3 2 µ(t) → v3 ∆(t) → ((∅, v1), (∅, e2), (∅, v3)) ι(t) → 0 It's what you would do if you thought in term s of "energy flows." β(t) → 1
  • 45. The Traverser t ∈ T 3 "The traverser has a reference to his location in the traversal (program counter)." 1 3 2 µ(t) → v3 ∆(t) → ((∅, v1), (∅, e2), (∅, v3)) ι(t) → 0 β(t) → 1 outE('knows') inV() values('name')g.V(1) ψ(t) → values(‘name’)
  • 46. T ⊆ U × Ψ × (P(Σ∗ ) × U)∗ × N+ × U × N+ The Traverser t ∈ T 3 "Gremlin sack" (not discussed here) 1 3 2 µ(t) → v3 ∆(t) → ((∅, v1), (∅, e2), (∅, v3)) ι(t) → 0 β(t) → 1 outE('knows') inV() values('name')g.V(1) ψ(t) → values(‘name’)
  • 47. The Traverser "Numerous (many many many) traversers are spawned during a traversal (computation)." many many Com binatoric explosion
  • 48. The Traverser "A 21x21 vertex lattice -- 20x20 edges."
  • 49. The Traverser "A Gremlin starts at (0,0) and can only go right and down to reach (20,20). How many traversers (paths) ultimately reach (20,20)?"
  • 50. The Traverser "For the sake of demonstration, lets look at a 5x5 lattice (4x4 edges)."
  • 55. The Traverser "Total = 16" 1 1 4 4 6
  • 56. The Traverser "Total = 20" 5 5 10 10
  • 57. The Traverser "Total = 50" 15 15 20
  • 60. The Traverser 70 ✓ 2n n ◆ = (2n)! (n!)2 = (2n)(2n 1)(2n 2) . . . 1 (n(n 1)(n 2) . . . 1)2 8 choose 4 = 70 n = number of edges on a side
  • 61. "Analytically, ~137 billion traversers will exist at (20,20)." The Traverser ✓ 2n n ◆ = (2n)! (n!)2 = (2n)(2n 1)(2n 2) . . . 1 (n(n 1)(n 2) . . . 1)2 40 choose 20
  • 63. "No. At vertex 400 (20,20), there exists 1 traverser with a bulk of ~137 billion." The Traverser gremlin> g.V(0).repeat(out('down','right')).times(40).count() ==>137846528820 gremlin> clock(100){g.V(0).repeat(out('down','right')).times(40).count().next()} ==>0.8890942 less than a millisecond β(t) → 137846528820 Bulking is a crucial component of OLAP when near trillion edge graphs are moving traversers between machines.
  • 64. The Traverser "If 2 traversers (each with bulk 1) are at the same location in the graph and have similar other properties, make 1 traverser with a bulk of 2." Traverser Equivalence Class graph location traversal location path history same loop counter [t] = {t ∈ T | µ(t) = µ(t ) ∧ ψ(t) = ψ(t ) ∧ ∆(t) = ∆(t ) ∧ ι(t) = ι(t ) }. data location program counter MAY NOT BE REQUIRED recursion of program counter β(t1) → 1 β(t2) → 3 β(t3) → 4 β(t4) → 8 Don't enumerate, bulk!
  • 65. The Traverser User Machine "In Gremlin OLAP, the graph is represented across a distributed system (e.g. Hadoop) as a partitioned adjacency list." GTM GTM GTM GTM Machine C Machine A Machine B Machine D Cluster
  • 66. The Traverser User Machine x.y.z "The user creates a traversal (compiled from any language)." GTM GTM GTM GTM Machine C Machine A Machine B Machine D Cluster
  • 67. The Traverser x.y.z x.y.z x.y.zx.y.z GTM GTM GTM GTM Machine C Machine A Machine B Machine D Cluster User Machine x.y.z "The compiled traversal is sent to every machine in the cluster which has a piece of the graph and a Gremlin traversal machine."
  • 68. The Traverser GTM GTM GTM GTM Machine C Machine A Machine B Machine D Cluster User Machine x.y.z "When the graph computation is complete, a reference to the result is returned. For instance, in HadoopGremlin, HDFS stores both the graph and sideEffect data."
  • 69. The Traverser x.y.z GTM x.y.z GTM Machine A Machine B "The OLAP algorithm is the classic Bulk Synchronous Parallel algorithm popularized by Google Pregel and Apache Giraph."
  • 70. x.y.z GTM x.y.z GTM The Traverser Machine A Machine B "The messages are traversers. Given the traverser equivalence class [t], they can be bulked." "Its just "complex energy."
  • 71. The Machine Components The Graph The Traverser The Traversal The Data The CPU The Program Lim bo!
  • 72. The Traversal Ψ traversal "The traversal is the 'software program'."
  • 73. The Traversal traversalΨ step-g step-hstep-f composed of f ∈ Ψ g ∈ Ψ h ∈ Ψ "The steps are the 'instructions'."
  • 74. The Traversal traversalΨ step-g step-hstep-f composed of f ∈ Ψ defined as g ∈ Ψ h ∈ Ψ f : A∗ → B∗ "Step f maps a stream (multi-set) of traversers located at objects of type A to a stream (multi-set) of traversers located at objects of type B."
  • 75. The Traversal traversalΨ step-g step-hstep-f composed of f ∈ Ψ defined as g ∈ Ψ h ∈ Ψ f : A∗ → B∗ "Step values('name') maps a stream (multi-set) of traversers located at vertices to a stream (multi-set) of traversers located at strings." kuppitz m allette frantz plurad values('name') :
  • 76. The Traversal map : A∗ → B∗ "For a traverser at a, move the traverser to an object at b." kuppitzvalues('name') :
  • 77. The Traversal map : A∗ → B∗ "For a traverser at a, move the traverser to an object at b." flatMap : A∗ → B∗ "For a traverser at a, clone the traverser across multiple objects in B." kuppitzvalues('name') : out('knows') :
  • 78. The Traversal map : A∗ → B∗ "For a traverser at a, move the traverser to an object at b." flatMap : A∗ → B∗ "For a traverser at a, clone the traverser across multiple objects in B." "For a traverser at a, either kill the traverser or leave him alone." filter : A∗ → A∗ kuppitzvalues('name') : out('knows') : has('age',lt(30)) :
  • 79. The Traversal map : A∗ → B∗ "For a traverser at a, move the traverser to an object at b." flatMap : A∗ → B∗ "For a traverser at a, clone the traverser across multiple objects in B." "For a traverser at a, either kill the traverser or leave him alone." filter : A∗ → A∗ "For a traverser at a, leave him alone though manipulate some data structure x." sideEffect : A∗ →x A∗ kuppitzvalues('name') : out('knows') : has('age',lt(30)) : groupCount('m') : m v[1]:1 v[2]:34
  • 80. The Traversal map : A∗ → B∗ "For a traverser at a, move the traverser to an object at b." flatMap : A∗ → B∗ "For a traverser at a, clone the traverser across multiple objects in B." "For a traverser at a, either kill the traverser or leave him alone." filter : A∗ → A∗ "For a traverser at a, leave him alone though manipulate some data structure x." sideEffect : A∗ →x A∗ branch : A∗ →b B∗ "For a traverser at a, choose some internal branch b to ultimately yield traversers at objects of type B." kuppitzvalues('name') : out('knows') : has('age',lt(30)) : groupCount('m') : m v[1]:1 v[2]:34 repeat(out('knows')).times(5) :
  • 81. The Traversal map flatMap filter sideEffect branch BranchStep ChooseStep LocalStep RepeatStep UnionStep ... AndStep CoinStep CyclicPathStep DedupStep DropStep FilterStep HasStep IsStep NotStep OrStep RangeStep ... VertexStep EdgeVertexStep CoalesceStep ... LabelStep IdStep PathStep SackStep SelectStep MatchStep ConstantStep ... GroupCountStep GroupStep StoreStep AggregateStep InjectStep SubgraphStep TreeStep IdentityStep ... "This is the step library (instruction set) of the Gremlin graph traversal machine (virtual machine)."
  • 82. The Traversal f ◦ g ◦ h Linear Motif g.V(1).out('knows').has('age',lt(30)).values('name') f g h repeatout('knows') has('age',lt(30)) values('name')g.V(1) "The 'byte code' of Gremlin is a sequence of steps .... "
  • 83. The Traversal f ◦ g ◦ h Linear Motif f(g ◦ h) ◦ k Nested Motif g.V(1).out('knows').has('age',lt(30)).values('name') f g h g.V(1).repeat(out('knows').has('age',lt(30))).values('name') f g h k repeat out('knows') has('age',lt(30)) values('name')g.V(1) repeatout('knows') has('age',lt(30)) values('name')g.V(1) "The 'byte code' of Gremlin is a sequence of steps with some steps nested within each other."
  • 84. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33 "Back to our super baby toy graph."
  • 85. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33 O LTP vs. O LAP graph database vs. processor Neo4j vs. G iraph
  • 86. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 87. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> g.V(1).label() ==>person gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 88. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> g.V(1).label() ==>person gremlin> g.V(1).outE() ==>e[2][1-knows->3] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 89. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> g.V(1).label() ==>person gremlin> g.V(1).outE() ==>e[2][1-knows->3] gremlin> g.V(1).outE().valueMap() ==>[weight:0.9, since:2013] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 90. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> g.V(1).label() ==>person gremlin> g.V(1).outE() ==>e[2][1-knows->3] gremlin> g.V(1).outE().valueMap() ==>[weight:0.9, since:2013] gremlin> g.V(1).out() ==>v[3] gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 91. gremlin> graph ==>tinkergraph[vertices:2 edges:1] gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:2 edges:1], standard] gremlin> g.V(1) ==>v[1] gremlin> g.V(1).label() ==>person gremlin> g.V(1).outE() ==>e[2][1-knows->3] gremlin> g.V(1).outE().valueMap() ==>[weight:0.9, since:2013] gremlin> g.V(1).out() ==>v[3] gremlin> g.V(1).out().values('age') ==>33 gremlin> 1 person name:marko age:36 2 knows since:2013 weight:0.9 3 person name:kuppitz age:33
  • 92. The Machine Components The Graph The Traverser The Traversal The Data The CPU The Program
  • 93. G ←− µ t ∈ T {∆, β, ς, ι} ψ −→ Ψ The Machine Components
  • 94. G ←− µ t ∈ T {∆, β, ς, ι} ψ −→ Ψ The Machine Components
  • 95. G ←− µ t ∈ T {∆, β, ς, ι} ψ −→ Ψ Graph Structure Graph Process+ = Graph Computing (Data) (Algorithm) The Machine Components
  • 96. "Traversers move about a graph as instructed by their traversal. The result of the computation is 1.) the final location of all halted traversers and 2.) the state of any side-effect data structures." The Machine Components
  • 97. Part 2: The Gremlin Traversal Language
  • 98. Ψ Gremlin's step library is the instruction set of the Gremlin traversal machine. A linear/nested composition of steps forms a traversal which is executed by the Gremlin traversal machine. step1 step2 step3 step4 step1 step3 step4step2 Gremlin-Java8 is the language provided by TinkerPop, though any language (with respective compiler) can be used to write Gremlin traversals. g.V(1).as("a"). out("knows").as("b"). select("a","b") SELECT ?a ?b WHERE { ?a e:knows ?b ?a v:id ?c FILTER (?c == 1) }Gremlin-Java8 Written by Daniel Kuppitz https://github.com/dkuppitz/sparql-gremlin =
  • 99. g.V().out('knows') SELECT ?x ?y WHERE g.V.out.name GraphLanguage Compiler from Graph Language to Gremlin Traversal Gremlin Traversal Machine GraphSystem Database(OLTP) Processor(OLAP) GremlinTraversal Physical Machine DataProgram Traversal Heap/DiskMemory Memory Memory/Graph System Physical Machine Java Virtual Machine bytecode steps DataProgram Memory/DiskMemory Physical Machine instructions Java Virtual Machine Gremlin Traversal Machine "The specification is easy … doesn't have to only be on the JVM."
  • 100. ~/tinkerpop3$ bin/gremlin.sh gremlin> "Gremlin-Groovy can be executed in the Gremlin Console REPL and thus, is good for demonstrations."
  • 101. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin>
  • 102. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin>
  • 103. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin>
  • 104. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> GraphML Gryo GraphSON "TinkerPop supports three I/O graph formats. Though the user can add their own format/parser."
  • 105. ~/tinkerpop3$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(3)-oOOo----- plugin activated: tinkerpop.server plugin activated: tinkerpop.utilities plugin activated: tinkerpop.tinkergraph gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> name:<String> Artist Song name:<String> songType:<String> performances:<Integer>sungBy writtenBy followedBy weight:<Long>
  • 106. gremlin> :plugin use tinkerpop.gephi ==>tinkerpop.gephi activated gremlin> :remote connect tinkerpop.gephi ==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7 gremlin> :> graph ==>tinkergraph[vertices:808 edges:8049] gremlin> http://gephi.org Generated by Daniel Kuppitz Rodriguez, M.A., Kuppitz, D., Yim, K., “Tales From the TinkerPop," DataStax Engineering Blog, 2015. http://www.datastax.com/dev/blog/tales-from-the-tinkerpop
  • 107. This is an actual Gremlin/R session I performed for this presentation. I was interested in understanding why Grateful Dead concerts continue to fascinate me. SPOILER ALERT: No local correlations -- correlations exist in the stationary distribution. EigenDead..the long run."
  • 108. gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard] gremlin>
  • 109. gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard] gremlin> g.V().has('name','DARK STAR') ==>v[89] gremlin> Vertex filter has(...) "one-to-[one-or-none]" Vertex 89 "Get the vertex named Dark Star."
  • 110. gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard] gremlin> g.V().has('name','DARK STAR') ==>v[89] gremlin> } ProviderOptimizationStrategy * OLTP graph system providers turn this into an index-lookup. Vertex filter has(...) "one-to-[one-or-none]" Vertex 89 "TraversalStrategies are an important part of Gremlin (discussed at length in the conference article)."
  • 111. gremlin> graph = TinkerGraph.open() ==>tinkergraph[vertices:0 edges:0] gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml') ==>null gremlin> g = graph.traversal() ==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard] gremlin> g.V().has('name','DARK STAR') ==>v[89] gremlin> g.V().has('name','DARK STAR').out('followedBy').values('name') ==>TRUCKING ==>EYES OF THE WORLD ==>HES GONE ==>SING ME BACK HOME ==>SPANISH JAM ==>CHINA DOLL ==>MORNING DEW ==>WHARF RAT ==>THE OTHER ONE ==>MIND LEFT BODY JAM ... gremlin> Vertex filter has(...) "one-to-[one-or-none]" Vertex 89 flatMap out('followedBy') "one-to-many" Vertex "one-to-one" map values('name') String TRUCKING EYES OF THE WORLD HES GONE SING ME BACK HOME SPANISH JAM ... "What are the names of the songs that have followed Dark Star in concert?"
  • 112. gremlin> g.V().outE().groupCount().by(label) ==>[followedBy:7047, sungBy:501, writtenBy:501] gremlin> Vertex Map<String,Long> flatMap [ followedBy:7074 sungBy:501 writtenBy:501 ] outE() {"one-to-many" Edge map groupCount() "many-to-one" reducing barrier "What is the distribution of edge labels in the graph?"
  • 113. gremlin> g.V().outE().groupCount().by(label) ==>[followedBy:7047, sungBy:501, writtenBy:501] gremlin> Vertex Map<String,Long> flatMap [ followedBy:7074 sungBy:501 writtenBy:501 ] outE() {"one-to-many" Edge map groupCount() "many-to-one" reducing barrier "What is the distribution of edge labels in the graph?" groupCount()isa TraversalParent(nestingstep):label()
  • 115. gremlin> g.V().groupCount().by(outE().count()) ==>0=224 ==>1=26 ==>2=254 ==>3=45 ==>4=24 ==>5=20 ==>6=10 ==>7=10 ==>8=6 ==>9=8 ... gremlin> f = new File('grateful-dead-distribution.txt') ==>grateful-dead-distribution.txt gremlin> g.V().groupCount().by(outE().count()). next().each{degree,freq -> f.append(degree + 't' + freq + 'n')} gremlin> :quit "Save result to a tab delimitated file for further analysis."
  • 116. $ r > t <- read.table("grateful-dead-distribution.txt") >
  • 117. 0 20 40 60 80 050100150200250 Grateful Dead Degree Distribution out degree frequency$ r > t <- read.table("grateful-dead-distribution.txt") > plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5, main="Grateful Dead Degree Distribution") >
  • 118. $ r > t <- read.table("grateful-dead-distribution.txt") > plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5, main="Grateful Dead Degree Distribution") > plot(t,xlab="out degree", ylab="frequency",cex=1.5,cex.lab=1.5,cex.axis=1.5, main="Grateful Dead Degree Distribution",log="xy") > 0 20 40 60 80 050100150200250 Grateful Dead Degree Distribution out degree frequency 1 2 5 10 20 50 100 125102050100 Grateful Dead Degree Distribution out degree frequency "Yea, yet another power law. Its 2005 and I get a free publication!"
  • 119. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('performances') ==>[a:5, b:1] ==>[a:5, b:531] ==>[a:5, b:394] ==>[a:5, b:293] ==>[a:5, b:1] ==>[a:1, b:473] ==>[a:1, b:24] ==>[a:531, b:87] ==>[a:531, b:41] ==>[a:531, b:254] ... gremlin> f = new File('performance-assortativity.txt') ==>performance-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('performances'). each{f.append(it.a + 't' + it.b + 'n')} "Are songs that follow each other assortative by the number of times they are played in concert?" https://en.wikipedia.org/wiki/Assortativity "Gremlins of a green, bulk together."
  • 120. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('performances') ==>[a:5, b:1] ==>[a:5, b:531] ==>[a:5, b:394] ==>[a:5, b:293] ==>[a:5, b:1] ==>[a:1, b:473] ==>[a:1, b:24] ==>[a:531, b:87] ==>[a:531, b:41] ==>[a:531, b:254] ... gremlin> f = new File('performance-assortativity.txt') ==>performance-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('performances'). each{f.append(it.a + 't' + it.b + 'n')} > cor.test(t[,1],t[,2], test='pearson') Pearson's product-moment correlation data: t[, 1] and t[, 2] t = -3.9525, df = 7045, p-value = 7.809e-05 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.07030966 -0.02371587 sample estimates: cor -0.04703835 No correlation betw een songs by perform ance. "…you know: birds and feathers and flocking togethers."
  • 121. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('songType') ==>[a:cover, b:cover] ==>[a:cover, b:cover] ==>[a:cover, b:original] ==>[a:cover, b:cover] ==>[a:cover, b:cover] ==>[a:cover, b:original] ==>[a:cover, b:original] ==>[a:cover, b:original] ==>[a:cover, b:cover] ==>[a:cover, b:cover] ... gremlin> f = new File('songType-assortativity.txt') ==>songType-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('songType'). each{f.append((it.a == 'cover' ? 0 : 1) + 't' + (it.b == 'cover' ? 0 : 1) + 'n')} "Are songs that follow each other assortative by either being a cover or an original song?" "Hmmm.. need numbers -- ah! only two song types."
  • 122. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('songType') ==>[a:cover, b:cover] ==>[a:cover, b:cover] ==>[a:cover, b:original] ==>[a:cover, b:cover] ==>[a:cover, b:cover] ==>[a:cover, b:original] ==>[a:cover, b:original] ==>[a:cover, b:original] ==>[a:cover, b:cover] ==>[a:cover, b:cover] ... gremlin> f = new File('songType-assortativity.txt') ==>songType-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by('songType'). each{f.append((it.a == 'cover' ? 0 : 1) + 't' + (it.b == 'cover' ? 0 : 1) + 'n')} > cramersV(chisq.test(t[,1],t[,2])$observed) [1] 0.0093161 No correlation betw een songs by song type (original or cover). "Cramer's V is for nominal data correlations."
  • 123. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by(out('sungBy').values('name')) ==>[a:Garcia, b:Spencer_Davis] ==>[a:Garcia, b:Weir] ==>[a:Garcia, b:Garcia] ==>[a:Garcia, b:Garcia] ==>[a:Garcia, b:Weir] ==>[a:Spencer_Davis, b:Weir] ==>[a:Spencer_Davis, b:Grateful_Dead] ==>[a:Weir, b:Garcia] ==>[a:Weir, b:All] ==>[a:Weir, b:Garcia] ... gremlin> f = new File('singer-assortativity.txt') ==>singer-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by(out('sungBy').values('name')). each{f.append(it.a.hashCode() + 't' + it.b.hashCode() + 'n')} "Are songs that follow each other assortative by their singer?" "Dah! Now I need an algorithm to turn a String to a unique integer. Duh -- hashCode()."
  • 124. gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by(out('sungBy').values('name')) ==>[a:Garcia, b:Spencer_Davis] ==>[a:Garcia, b:Weir] ==>[a:Garcia, b:Garcia] ==>[a:Garcia, b:Garcia] ==>[a:Garcia, b:Weir] ==>[a:Spencer_Davis, b:Weir] ==>[a:Spencer_Davis, b:Grateful_Dead] ==>[a:Weir, b:Garcia] ==>[a:Weir, b:All] ==>[a:Weir, b:Garcia] ... gremlin> f = new File('singer-assortativity.txt') ==>singer-assortativity.txt gremlin> g.V().as('a').out('followedBy').as('b'). select('a','b').by(out('sungBy').values('name')). each{f.append(it.a.hashCode() + 't' + it.b.hashCode() + 'n')} > cramersV(chisq.test(t[,1],t[,2])$observed) [1] 0.2354349 Songs are w eakly correlated by singers. "However, its typically just Jerry and Bobby trading off in ones or twos…"
  • 125. gremlin> g.V().hasLabel('song'). repeat(out('followedBy').groupCount('m').by('name')).times(8). cap('m'). order(local).by(valueDecr). limit(local,10) "What songs are most central in the concert network?" "Sortagettingboredworkingontheslides.Myimprov-vibedied… ohyea,butIgotanastyclassicriffIremember." "Not PaaaaaageRank again."
  • 126. gremlin> g.V().hasLabel('song'). repeat(out('followedBy').groupCount('m').by('name')).times(8). cap('m'). order(local).by(valueDecr). limit(local,10) ==>PLAYING IN THE BAND=34142246667508 ==>ME AND MY UNCLE=32094411419320 ==>JACK STRAW=31867238591868 ==>EL PASO=29973481580211 ==>TRUCKING=29819272116849 ==>PROMISED LAND=28663488257022 ==>CHINA CAT SUNFLOWER=28569992924918 ==>CUMBERLAND BLUES=26320323048221 ==>LOOKS LIKE RAIN=26138795229794 ==>RAMBLE ON ROSE=26059394903880 gremlin> "Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior,” First Monday 14:1, ISSN:1396-0466, http://arxiv.org/abs/0807.2466, January 2009." "Huh, those are the tracks from the 'greatest' Grateful Dead's greatest hits." https://en.wikipedia.org/wiki/What_a_Long_Strange_Trip_It%27s_Been
  • 127. gremlin> g.V().hasLabel('song'). repeat(out('followedBy').groupCount('m').by('name')).times(8). cap('m'). order(local).by(valueDecr). limit(local,10) ==>PLAYING IN THE BAND=34142246667508 ==>ME AND MY UNCLE=32094411419320 ==>JACK STRAW=31867238591868 ==>EL PASO=29973481580211 ==>TRUCKING=29819272116849 ==>PROMISED LAND=28663488257022 ==>CHINA CAT SUNFLOWER=28569992924918 ==>CUMBERLAND BLUES=26320323048221 ==>LOOKS LIKE RAIN=26138795229794 ==>RAMBLE ON ROSE=26059394903880 gremlin> clock(10){g.V().hasLabel('song'). repeat(out('followedBy').groupCount('m').by('name')).times(8). cap('m'). order(local).by(valueDecr). limit(local,10)} ==>4040.761404 gremlin> "4 seconds to analyze that many paths -- that is the power of bulking." "34 trillion traversers passed through thisvertex."
  • 128. gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties') ==>hadoopgraph[gryoinputformat->gryooutputformat] gremlin> "And now note how the same Gremlin queries you've seen thus far can execute across a cluster."
  • 129. gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties') ==>hadoopgraph[gryoinputformat->gryooutputformat] gremlin> g = graph.traversal(computer(SparkGraphComputer)) ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer] gremlin> G raphCom puter's are O LAP system s "Lets use Apache Spark via TinkerPop's SparkGraphComputer."
  • 130. gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties') ==>hadoopgraph[gryoinputformat->gryooutputformat] gremlin> g = graph.traversal(computer(SparkGraphComputer)) ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer] gremlin> g.V().match( __.as('a').out('sungBy').as('b'), __.as('a').out('writtenBy').as('b'), __.as('a').values('name').as('c'), __.as('b').values('name').as('d')). select('c','d') "Which songs were written and sung by the same person?" "__" is required by G rem lin-G roovy as "as" is a keyword in G roovy.
  • 131. gremlin> graph = HadoopGraph.open('hadoop-grateful-gryo.properties') ==>hadoopgraph[gryoinputformat->gryooutputformat] gremlin> g = graph.traversal(computer(SparkGraphComputer)) ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer] gremlin> g.V().match( __.as('a').out('sungBy').as('b'), __.as('a').out('writtenBy').as('b'), __.as('a').values('name').as('c'), __.as('b').values('name').as('d')). select('c','d') ==>[c:ANY WONDER, d:Unknown] ==>[c:WALK DOWN THE STREET, d:Unknown] ==>[c:LEAVE YOUR LOVE AT HOME, d:Unknown] ==>[c:COWBOY SONG, d:Unknown] ==>[c:NEIGHBORHOOD GIRLS, d:Suzanne_Vega] ==>[c:MINDBENDER, d:Garcia_Lesh] ==>[c:EQUINOX, d:Lesh] ==>[c:NO LEFT TURN UNSTONED (CARDBOARD COWBOY), d:Lesh] ==>[c:CHILDHOODS END, d:Lesh] ==>[c:NEVER TRUST A WOMAN, d:Mydland] ... gremlin> "That was a distributed, declarative graph pattern match query. This could have been over a 1000 node cluster."
  • 132. g.V().match( __.as('a').out('sungBy').as('b'), __.as('a').out('writtenBy').as('b'), __.as('a').values('name').as('c'), __.as('b').values('name').as('d')). select('c','d') SELECT ?c ?d WHERE { ?a e:writtenBy ?b . ?a e:sungBy ?b . ?a v:name ?c . ?b v:name ?d } Gremlin-Groovy "Gremlin supports the 'match'-style found in most pattern match languages such as SPARQL."
  • 133. g.V().match( __.as('a').out('sungBy').as('b'), __.as('a').out('writtenBy').as('b'), __.as('a').values('name').as('c'), __.as('b').values('name').as('d')). select('c','d') SELECT ?c ?d WHERE { ?a e:writtenBy ?b . ?a e:sungBy ?b . ?a v:name ?c . ?b v:name ?d } Gremlin-Groovy "Two different graph languages can be compiled to the Gremlin traversal machine." Uses Apache Jena's SPARQ L parser HOMEWORK: Use Apache Calcite and convert SQL to a Gremlin traversal.SPARQL to Gremlin Compiler Ted Wilmes was here.
  • 134. gremlin> :install com.datastax sparql-gremlin 0.1 ==>Loaded: [com.datastax, sparql-gremlin, 0.1] gremlin> :plugin use datastax.sparql ==>datastax.sparql activated gremlin> "Install the SPARQL-Gremlin compiler developed by Daniel Kuppitz of DataStax."
  • 135. gremlin> :install com.datastax sparql-gremlin 0.1 ==>Loaded: [com.datastax, sparql-gremlin, 0.1] gremlin> :plugin use datastax.sparql ==>datastax.sparql activated gremlin> :remote connect datastax.sparql g ==>SPARQL[graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]] gremlin> "Given that SPARQL is NOT Groovy, it is passed to the cluster as a remote String."
  • 136. gremlin> :install com.datastax sparql-gremlin 0.1 ==>Loaded: [com.datastax, sparql-gremlin, 0.1] gremlin> :plugin use datastax.sparql ==>datastax.sparql activated gremlin> :remote connect datastax.sparql g ==>SPARQL[graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]] gremlin> :> SELECT ?c ?d WHERE { ?a e:writtenBy ?b . ?a e:sungBy ?b . ?a v:name ?c . ?b v:name ?d } ==>[c:ANY WONDER, d:Unknown] ==>[c:WALK DOWN THE STREET, d:Unknown] ==>[c:LEAVE YOUR LOVE AT HOME, d:Unknown] ==>[c:COWBOY SONG, d:Unknown] ==>[c:NEIGHBORHOOD GIRLS, d:Suzanne_Vega] ==>[c:MINDBENDER, d:Garcia_Lesh] ==>[c:EQUINOX, d:Lesh] ==>[c:NO LEFT TURN UNSTONED (CARDBOARD COWBOY), d:Lesh] ==>[c:CHILDHOODS END, d:Lesh] ==>[c:NEVER TRUST A WOMAN, d:Mydland] ... gremlin> "SPARQL was just executed over Apache Spark by way of the Gremlin traversal machine."
  • 137. gremlin> g = graph.traversal(computer(GiraphGraphComputer)) ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], giraphgraphcomputer] gremlin> "Use a different GraphComputer."
  • 138. gremlin> g = graph.traversal(computer(GiraphGraphComputer)) ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], giraphgraphcomputer] gremlin> g.V().match( __.as('a').out('sungBy').as('b'), __.as('a').out('writtenBy').as('b'), __.as('a').values('name').as('c'), __.as('b').values('name').as('d')). select('c','d') INFO org.apache.hadoop.mapreduce.Job - Running job: job_1445539638801_0011 INFO org.apache.hadoop.mapreduce.Job - map 50% reduce 0% INFO org.apache.hadoop.mapreduce.Job - map 100% reduce 0% INFO org.apache.hadoop.mapreduce.Job - Counters: 55 ... Giraph Stats Superstep 1 GiraphComputation (ms)=2022 Superstep 2 GiraphComputation (ms)=1994 Superstep 3 GiraphComputation (ms)=999 Superstep 4 GiraphComputation (ms)=1001 Superstep 5 GiraphComputation (ms)=1254 Total (ms)=21425 ... ==>[c:IT MUST HAVE BEEN THE ROSES, d:Hunter] ==>[c:EASY WIND, d:Hunter] ==>[c:WHATLL YOU RAISE, d:Hunter] ==>[c:CRYPTICAL ENVELOPMENT, d:Garcia] ==>[c:CREAM PUFF WAR, d:Garcia] ==>[c:DRUMS, d:Grateful_Dead] ... gremlin> "Gremlin-Groovy just executed over Apache Giraph by way of the Gremlin traversal machine." "I'm a Gingerbremlin."
  • 139. Gremlin-Java8 SPARQL Cypher GraphQL G rem lin-Scala O rientDB Neo4j Titan Spark Giraph Hadoop Sqlg IBM BlueMix Any Graph System Any Graph Language Stardog Ripple SQ L * Many system providers are still on TinkerPop2 and thus, haven't migrated to TinkerPop3. (JO INs are walks!) (easy-parser, its JSON!) Works over PostgreSQL and MySQL Pieter Martin works on this. CassandraandHBase RED = "No known compiler." RDFstore
  • 140. Thank you…. Rodriguez, M.A., Kuppitz, D., Yim, K., “Tales From the TinkerPop," DataStax Engineering Blog, 2015. http://www.datastax.com/dev/blog/tales-from-the-tinkerpop