Faunus: Graph Analytics Engine

FAUNUS
MARKO A. RODRIGUEZ
http://THINKAURELIUS.COM
GRAPH ANALYTICS ENGINE

ABSTRACT
Faunus is a graph analytics engine built atop the Hadoop
distributed computing platform. The graph is represented as
a distributed adjacency list, in which a vertex and its
incident edges are co-located on the same machine.
A Faunus graph is queried with a MapReduce variant of the
Gremlin graph traversal language. A Gremlin expression
compiles down to a series of MapReduce steps whose sequence
is optimized and then executed by Hadoop. Results are stored
either as transformations of the input graph (graph
derivations) or as computational side-effects such as
aggregates (graph statistics). Beyond querying, a collection
of input/output formats enables Faunus to load and store
graphs in the distributed graph database Titan, in various
graph formats stored in HDFS, and via arbitrary user-defined
functions. This presentation focuses primarily on Faunus,
but also reviews the satellite technologies that enable it.
http://FAUNUS.THINKAURELIUS.COM

SPONSORED BY
ECCO, the Evolution, Complexity and Cognition group, is a multidisciplinary
research group directed by Francis Heylighen. It is based at the
Vrije Universiteit Brussel (VUB), although its members are distributed across
four continents. Researchers come from a wide variety of backgrounds,
from physical science and technology to the social sciences and humanities.
The philosophy is intrinsically transdisciplinary, transcending the traditional
boundaries between "hard" and "soft" sciences, and between philosophical
foundations and practical applications.
The Big-Data Interest Group (BIGDIG) is a focus group at LANL that meets
monthly to explore big-data methods and architectures. One goal of the
group is to identify early adopters and learn from their experiences.
The group would also like to involve scientists who are looking for
big-data solutions and to foster collaboration with those who might provide
the needed technology. BIGDIG includes members from all
domains: science, security, sensing, computing, library, and more.
The EgoSystem project is creating an integrated social model of the Los Alamos National Laboratory
and its surroundings using numerous online services such as Twitter, LinkedIn, MS Academic,
Wikipedia, and more. The model is seeded with LANL postdocs and their created artifacts, and it
grows continuously to encompass their relations to other people and institutions. EgoSystem is a
Director-sponsored project engineered by the Digital Library Research and Prototyping Team using
Big Graph Data technology provided by Aurelius.

[Figure, built up over five slides: the property graph model. Vertex 0 (PROPERTIES name:faunus, born:2012) and vertex 1 (name:hadoop, born:2005) are joined by an EDGE with ID 5, LABEL dependsOn, and PROPERTIES since:2012.]

[Figure, built up over several slides: a graph with vertices 0-3 and edges 4-7. EDGE LABELS are A, B, A, C. ELEMENT PROPERTIES are a:b, c:d, e:f, g:h, i:j. Vertex 1 and its incident edges are flattened into a single row.]
1 e:f | 4 c:d A 2 | 5 B 0 | 6 g:h A 3 | 7 C 3
vertex: id props
edge: id props label id (props omitted when the edge has none)
The row holds the vertex first, then its incoming edges, then its outgoing edges.
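The row layout shown in the figure can be sketched in a few lines of Python. This is only an illustration of the flattening idea, not Faunus's binary serialization. The `Edge` and `Row` classes are hypothetical, and the split of the four edges into two incoming and two outgoing is an assumption read off the figure labels.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    id: int
    label: str
    other: int                                  # id of the vertex at the far end
    props: dict = field(default_factory=dict)

@dataclass
class Row:
    id: int
    props: dict
    in_edges: list
    out_edges: list

    def flatten(self):
        """Emit: vertex id, vertex props, then each edge as id [props] label other-id."""
        def fmt(props):
            return [f"{k}:{v}" for k, v in props.items()]
        tokens = [str(self.id), *fmt(self.props)]
        for e in self.in_edges + self.out_edges:
            tokens += [str(e.id), *fmt(e.props), e.label, str(e.other)]
        return " ".join(tokens)

# Vertex 1 from the figure (incoming/outgoing split is assumed).
row = Row(
    id=1, props={"e": "f"},
    in_edges=[Edge(4, "A", 2, {"c": "d"}), Edge(5, "B", 0)],
    out_edges=[Edge(6, "A", 3, {"g": "h"}), Edge(7, "C", 3)],
)
print(row.flatten())  # 1 e:f 4 c:d A 2 5 B 0 6 g:h A 3 7 C 3
```

Because everything a traversal step needs about vertex 1 sits in one row, a mapper can process the vertex without looking anything up elsewhere.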

[Figure: AN ADJACENCY LIST (rows 0-11) + CLUSTER (127.0.0.2, 127.0.0.3, 127.0.0.4) = A DISTRIBUTED ADJACENCY LIST, with the rows partitioned across the three machines.]

HADOOP
http://hadoop.apache.org
Hadoop is a distributed computing platform composed of two key components:
HDFS: A distributed file system that stores arbitrarily large files within a cluster.
MapReduce: A parallel functional computing model for key/value pair data.

FAUNUS AND HADOOP
Faunus provides graph input/output formats (structure)
and a traversal language for graphs (process).
[Figure: adjacency-list rows 0-11 distributed across 127.0.0.2, 127.0.0.3, and 127.0.0.4, annotated with Structure and Process.]

GRAPH OF THE GODS
* Toy graph distributed with Faunus.
[Figure: the graph of the gods.]
Vertices (id name:type): 0 saturn:titan, 1 jupiter:god, 2 neptune:god, 3 pluto:god,
4 sky:location, 5 sea:location, 6 tartarus:location, 7 hercules:demigod,
8 alcmene:human, 9 nemean:monster, 10 hydra:monster, 11 cerberus:monster
Edge labels: father, mother, brother, lives, pet, battled (the battled edges carry time:1, time:2, time:12).

faunus$ bin/gremlin.sh
http://gremlin.tinkerpop.com

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> hdfs.ls()
gremlin> hdfs.copyFromLocal('graph-of-the-gods.json','graph-of-the-gods.json')
==>null
gremlin> hdfs.ls()
==>rw-r--r-- marko supergroup 2028 graph-of-the-gods.json
gremlin>
[Figure: the graph of the gods stored in HDFS as adjacency-list rows 0-11 across 127.0.0.2, 127.0.0.3, and 127.0.0.4.]

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> g.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

gremlin> g.V
13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map]
13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Job data location: output/job-0
13/05/07 12:07:10 INFO input.FileInputFormat: Total input paths to process : 1
13/05/07 12:07:10 INFO mapred.JobClient: Running job: job_201304251105_0004
13/05/07 12:07:11 INFO mapred.JobClient: map 0% reduce 0%
...
[Figure: after g.V, each of the vertices 0-11 holds a traverser count of 1.]

gremlin> g.V.has('type','god')
MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map,
com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map]
...
[Figure: after has('type','god'), vertices 1, 2, and 3 (jupiter, neptune, pluto) hold a traverser count of 1; every other vertex holds 0.]

gremlin> g.V.has('type','god').in('father')
com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map,
com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Map,
com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Reduce]
...
[Figure: after in('father'), vertex 7 (hercules) holds a traverser count of 1; every other vertex holds 0.]

gremlin> g.V.has('type','god').in('father').out('mother').name
...
==>alcmene
gremlin>
[Figure: after out('mother'), vertex 8 (alcmene) holds a traverser count of 1; every other vertex holds 0.]
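The walkthrough above can be replayed with plain counters. The following Python sketch is an in-memory illustration only (no Hadoop, no Faunus API). The helper names `has_type` and `step` are hypothetical, and the graph is cut down to the vertices and edges this query touches.

```python
from collections import Counter

# Only the vertices and edges this query touches; ids follow the slides.
vertex_type = {0: "titan", 1: "god", 2: "god", 3: "god", 7: "demigod", 8: "human"}
vertex_name = {0: "saturn", 1: "jupiter", 2: "neptune", 3: "pluto", 7: "hercules", 8: "alcmene"}
edges = [(1, "father", 0), (7, "father", 1), (7, "mother", 8)]  # (out vertex, label, in vertex)

def has_type(counters, wanted):
    """Filter: keep only the counters of vertices whose type matches."""
    return Counter({v: n for v, n in counters.items() if vertex_type[v] == wanted})

def step(counters, label, direction):
    """Transform: move every counter across the edges carrying the given label."""
    moved = Counter()
    for out_v, edge_label, in_v in edges:
        if edge_label != label:
            continue
        src, dst = (out_v, in_v) if direction == "out" else (in_v, out_v)
        moved[dst] += counters.get(src, 0)
    return +moved  # unary plus drops entries whose count is zero

counters = Counter({v: 1 for v in vertex_type})   # g.V
counters = has_type(counters, "god")              # .has('type','god')
counters = step(counters, "father", "in")         # .in('father')
counters = step(counters, "mother", "out")        # .out('mother')
names = [vertex_name[v] for v in counters]        # .name
print(names)  # ['alcmene']
```

Each step only rewrites one integer per vertex, which is why the counter representation stays cheap however many traversers pile up on a vertex.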

TRAVERSAL DATA
1. A long counter denoting how many traversers exist at the element.
-OR-
2. A list of lists denoting the path history of individual traversers at the element.
counter = cheap
enumerative = expensive
* Each element in a row maintains traversal data as well.
[Figure: an adjacency-list row holding a vertex (id 1, k1:v1, k2:v2), its incoming edges, and its outgoing edges, with traversal data attached to every element.]

gremlin> g.V.has('type','god').in('father').out('mother').path
13/05/07 14:37:59 WARN mapreduce.FaunusCompiler: Path calculations are enabled for this Faunus job (space and time expensive)
...
==>[v[1], v[7], v[8]]
gremlin>
[Figure: vertex 8 (alcmene) holds the path [1,7,8].]
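To see why path tracking costs more than counting, the same query can be replayed while carrying full path histories. This Python sketch is an in-memory illustration under the same reduced graph; `extend` is a hypothetical helper, not a Faunus call.

```python
# (out vertex, label, in vertex) for the edges this query touches; ids follow the slides.
edges = [(1, "father", 0), (7, "father", 1), (7, "mother", 8)]
gods = [1, 2, 3]  # vertices that pass has('type','god')

def extend(paths, label, direction):
    """Append the next vertex to every path that can cross an edge with this label."""
    extended = []
    for path in paths:
        for out_v, edge_label, in_v in edges:
            src, dst = (out_v, in_v) if direction == "out" else (in_v, out_v)
            if edge_label == label and src == path[-1]:
                extended.append(path + [dst])
    return extended

paths = [[v] for v in gods]               # one path per starting traverser
paths = extend(paths, "father", "in")     # .in('father')
paths = extend(paths, "mother", "out")    # .out('mother')
print(paths)  # [[1, 7, 8]]
```

Here the state is one list per traverser rather than one integer per element, so memory grows with the number and length of the paths.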

GREMLIN
GRAPH TRAVERSAL LANGUAGE
http://gremlin.tinkerpop.com
Gremlin is a functional graph language where traversals are
defined using function composition. A set of useful predefined
functions is provided with the language, and generic
lambdas/closures are possible for arbitrary mappings.
[Figure: a traversal drawn as a chain of functions f1, f2, f3, ..., f4.]
TRANSFORM
t : (V ∪ E) → P(V ∪ E)
transform{}, V, id, label, out, in, outE, inE, inV, map, order, ...
FILTER
f : (V ∪ E) → (V ∪ E ∪ ∅)
filter{}, has, hasNot, [0..10], random, simplePath, back, ...
SIDE-EFFECT
s : (V ∪ E) ↛ (V ∪ E)
sideEffect{}, groupCount, groupBy, aggregate, table, store, linkIn, linkOut, count, ...
BRANCH
loop, copySplit, fairMerge, exhaustMerge, ...
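The idea of building a traversal by composing functions can be illustrated outside Gremlin. The Python sketch below is an analogy, not Gremlin's implementation: `compose`, `transform`, `filter_`, and `side_effect` are hypothetical helpers, and the adjacency map is made up for the example.

```python
from functools import reduce

def compose(*steps):
    """Chain steps left to right; each step maps an iterable of elements to an iterable."""
    return lambda elements: reduce(lambda acc, step: step(acc), steps, elements)

def transform(fn):
    """t: one element in, zero or more elements out."""
    return lambda xs: (y for x in xs for y in fn(x))

def filter_(pred):
    """f: an element in, the same element or nothing out."""
    return lambda xs: (x for x in xs if pred(x))

def side_effect(fn):
    """s: the element passes through unchanged while fn runs for its effect."""
    def run(xs):
        for x in xs:
            fn(x)
            yield x
    return run

out_edges = {1: [2, 3], 2: [3], 3: []}   # hypothetical adjacency: vertex id -> out-neighbours
seen = []
traversal = compose(
    transform(lambda v: out_edges[v]),   # comparable to an out step
    filter_(lambda v: v != 2),           # comparable to a filter{} closure
    side_effect(seen.append),            # comparable to a sideEffect{} closure
)
result = list(traversal([1, 2, 3]))
print(result, seen)  # [3, 3] [3, 3]
```

Each helper returns a function over a stream of elements, so a traversal is just the composition of its steps, evaluated lazily from left to right.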

EXAMPLE TRAVERSALS
"How many vertices are in the graph?"
g.V.count()
"How many 3-step acyclic paths exist in the graph?"
g.V.out.out.out.simplePath.count()
"How many people attend each academy?"
g.V.has('type','person').out('attends')
.has('type','academy').name.groupCount
"What is the in-degree distribution of the friendship subgraph?"
g.V.sideEffect{it.degree = it.inE('friend').count()}
.degree.groupCount
* The only memory structure is the graph,
thus all data must be in the graph.
"Derive all implicit grandfather relations in the graph."
g.V.as('x').out('father').out('father')
.linkIn('grandfather','x')
* Mutates the graph.

FAUNUS DATA FLOW
g.V.out.out.count()
Each pair below is <key, value>.
MAP ONLY STEPS (NO REDUCE NEEDED)
map: <NullWritable, FaunusVertex> to <NullWritable, FaunusVertex>
MAP/REDUCE STEPS
map: <NullWritable, FaunusVertex> to <LongWritable, Holder<FaunusElement>>
reduce: <LongWritable, Iterable<Holder<FaunusElement>>> to <NullWritable, FaunusVertex>
Job output under hdfs://user/ubuntu/:
output/job-0/
output/job-1/
output/job-2/ { graph*, sideeffect* }

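GREMLIN IN MAP/REDUCE
FILTER
f : (V ∪ E) → (V ∪ E ∪ ∅)
g.V.has('type','god')
map(null, vertex, context) {
  // the key and value come from the step arguments, here 'type' and 'god'
  key = context.getConf().get('provided.key')
  value = context.getConf().get('provided.value')
  // a vertex that fails the predicate stays in the graph but loses its traversers
  // (value.equals(...) also covers a vertex that lacks the property)
  if(!value.equals(vertex.getProperty(key))) {
    vertex.clearPaths();
  }
  context.write(null, vertex);
}
* Most filters are map-only steps.
If the predicate returns false,
then all the path metadata is cleared from the element.
[Figure: f(v) applied to each vertex with 'type' and 'god'.]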
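The filter step can be mimicked in a few lines of Python. This is a sketch of the idea, not Faunus code: vertices are plain dictionaries, `paths` stands in for the traverser counter, and `filter_map` is a hypothetical name.

```python
def filter_map(vertex, key, value):
    """Map-only filter: every vertex is written out, but failing ones lose their traversers."""
    if vertex["props"].get(key) != value:
        vertex = {**vertex, "paths": 0}   # analogue of vertex.clearPaths()
    return vertex

vertices = [
    {"id": 0, "props": {"name": "saturn", "type": "titan"}, "paths": 1},
    {"id": 1, "props": {"name": "jupiter", "type": "god"}, "paths": 1},
    {"id": 7, "props": {"name": "hercules", "type": "demigod"}, "paths": 1},
]
filtered = [filter_map(v, "type", "god") for v in vertices]
print([(v["id"], v["paths"]) for v in filtered])  # [(0, 0), (1, 1), (7, 0)]
```

No vertex is dropped from the output, so later steps still see the whole graph. Only the traversal data differs, and since each vertex is handled on its own no reduce phase is needed.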

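TRANSFORM
t : (V ∪ E) → P(V ∪ E)
g.V.out
map(null, vertex, context) {
  // send this vertex's traversers to the vertex at the far end of each outgoing edge
  for(e : vertex.getEdges(OUT)) {
    context.write(e.getVertex(IN).id, holder('p',vertex.pathsOnly()))
  }
  // send the vertex itself, keyed by its own id
  context.write(vertex.id, holder('v',vertex))
}
reduce(long, iterable<holder> holders, context) {
  vertex = new FaunusVertex(long)
  for(h : holders) {
    if(h.getTag() == 'v')
      vertex.addAll(h.getVertex())     // rebuild the vertex structure
    else
      vertex.addPaths(h.getVertex())   // attach the incoming traversers
  }
  context.write(null, vertex)
}
* Traversals implement a reduce-side join.
[Figure: the machines 127.0.0.2, 127.0.0.3, and 127.0.0.4.]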
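The reduce-side join behind g.V.out can be simulated in memory. The Python sketch below is an illustration under simplifying assumptions: a made-up three-vertex graph, counters instead of path lists, and a dictionary standing in for Hadoop's shuffle. It also assumes that a vertex keeps only the traversers that arrive in this step.

```python
from collections import defaultdict

# vertex id -> (out-neighbour ids, traverser count); a hypothetical 3-vertex graph after g.V
rows = {0: ([1, 2], 1), 1: ([2], 1), 2: ([], 1)}

def map_step(vertex_id, out_ids, count):
    for target in out_ids:
        yield target, ("p", count)        # traversers travel along each outgoing edge
    yield vertex_id, ("v", out_ids)       # the vertex structure is sent to itself

def reduce_step(vertex_id, holders):
    out_ids, count = [], 0
    for tag, payload in holders:
        if tag == "v":
            out_ids = payload             # rebuild the vertex
        else:
            count += payload              # accumulate the incoming traversers
    return vertex_id, (out_ids, count)

shuffled = defaultdict(list)              # stands in for the shuffle: group holders by key
for vid, (outs, count) in rows.items():
    for key, holder in map_step(vid, outs, count):
        shuffled[key].append(holder)

after_out = dict(reduce_step(k, hs) for k, hs in sorted(shuffled.items()))
print({k: c for k, (_, c) in after_out.items()})  # {0: 0, 1: 1, 2: 2}
```

The join key is the vertex id: the reducer for an id receives both the vertex's own structure (tag 'v') and the traversers sent to it by its neighbours (tag 'p').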

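SIDE-EFFECT
s : (V ∪ E) ↛ (V ∪ E)
g.V.type.groupCount()
map(null, vertex, context) {
  key = context.getConf().get('provided.key')
  // the vertex passes through unchanged on the 'graph' output
  context.write('graph',null,vertex)
  // the side-effect output receives (property value, traverser count)
  context.write('sideeffect',
    vertex.getProperty(key),vertex.getPathCount())
}
reduce(object, iterable<long> longs, context) {
  sum = 0
  for(l : longs) { sum += l }
  context.write('sideeffect',object,sum)
}
* Leverages MultipleInputs/Outputs
[Figure: s(v) applied to each vertex with 'type'.]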
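A small Python sketch of the groupCount side-effect on the graph of the gods follows. It is an in-memory illustration, not Faunus code. It assumes every vertex holds one traverser, as it does after g.V, and uses two plain containers in place of the 'graph' and 'sideeffect' outputs.

```python
from collections import defaultdict

# Vertex types of the graph of the gods, indexed by vertex id 0-11.
types = ["titan", "god", "god", "god", "location", "location", "location",
         "demigod", "human", "monster", "monster", "monster"]

graph_out = []                  # stands in for the 'graph' output
emitted = defaultdict(list)     # stands in for the shuffled 'sideeffect' pairs

# Map: pass the vertex through and emit (type, traverser count).
for vertex_id, vertex_type in enumerate(types):
    path_count = 1              # assumed: one traverser per vertex after g.V
    graph_out.append(vertex_id)
    emitted[vertex_type].append(path_count)

# Reduce: sum the counts for each key.
group_count = {key: sum(counts) for key, counts in sorted(emitted.items())}
print(group_count)
```

The graph output is untouched by the step, while the aggregate lands in a separate output, which matches the distinction between graph derivations and graph statistics.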

STRUCTURING GRAPHS
WITH FAUNUS

INPUT/OUTPUT FORMATS
SequenceFileInputFormat / SequenceFileOutputFormat
<NullWritable,FaunusVertex>
A list of serialized vertex objects in a compressed binary format.
The intermediate data format between MapReduce jobs
within a Faunus pipeline.
Fastest available format for both reading and writing.
Compressed using variable-width and prefix encodings.
gremlin> g
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> g.setGraphOutputFormat(SequenceFileOutputFormat)
==>null
gremlin> g
==>faunusgraph[graphsoninputformat->sequencefileoutputformat]
gremlin>

GraphSONInputFormat / GraphSONOutputFormat
A verbose JSON-based text format. Each vertex is a single JSON document.
Easy for developers to generate. Useful for testing and examples.
Limited to JSON-supported datatypes for element property values.
{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}
{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},
{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},
{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},
{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
{"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},
{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":
[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
...
* JSON specification is available at http://json.org

RDFInputFormat
Maps popular RDF text formats to a property graph.
Configurations allow for different mappings of RDF to the property graph model.
Utilizes a MapReduce step to convert an edge-list into an adjacency list.
faunus.graph.input.format=com.thinkaurelius.faunus.formats.edgelist.rdf.RDFInputFormat
faunus.input.location=graph-example-1.ntriple
faunus.graph.input.rdf.format=n-triples
faunus.graph.input.rdf.as-properties=http://www.w3.org/1999/02/22-rdf-syntax-ns#type
faunus.graph.input.rdf.use-localname=true
faunus.graph.input.rdf.literal-as-property=true
[Figure: the triple ex:marko foaf:age "33"^^xsd:int becomes vertex 0 with properties uri:ex:marko and age:33.]
* RDF parsers provided by http://openrdf.org
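The mapping in the figure, where a literal-valued triple becomes a vertex property and the predicate's local name becomes the key, can be sketched as follows. This Python fragment illustrates the idea only and is not how RDFInputFormat is implemented. The function `add_triple`, the prefixed-name handling, and the second triple (foaf:knows ex:peter) are assumptions made for the example.

```python
def add_triple(graph, subject, predicate, obj, literal_type=None):
    """Fold one triple into a dict-based property graph keyed by subject URI."""
    vertex = graph.setdefault(subject, {"uri": subject, "edges": []})
    local_name = predicate.split(":")[-1]            # analogue of use-localname=true
    if literal_type is not None:                     # analogue of literal-as-property=true
        vertex[local_name] = int(obj) if literal_type == "xsd:int" else obj
    else:                                            # a resource object becomes an edge
        graph.setdefault(obj, {"uri": obj, "edges": []})
        vertex["edges"].append((local_name, obj))

graph = {}
add_triple(graph, "ex:marko", "foaf:age", "33", literal_type="xsd:int")
add_triple(graph, "ex:marko", "foaf:knows", "ex:peter")   # hypothetical second triple
print(graph["ex:marko"])
```

Triples arrive as an edge list in arbitrary order, which is why grouping them by subject (here a dictionary, in Faunus a MapReduce step) is needed to obtain adjacency-list rows.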

RexsterInputFormat
Rexster is a graph server that is accessed via REST and a Gremlin binary protocol (RexPro).
Rexster supports any Blueprints-enabled graph database.
http://rexster.tinkerpop.com
HTTP
http://.../vertices/1
{
  "results": {
    "_type":"vertex",
    "_id":1,
    "name":"tiberius",
    "age":29
  },
  "queryTime":0.123
}
REXPRO
g.v(1).out('mother')
.out('mother').name
==>aurelia

ScriptInputFormat
A Gremlin script stored in HDFS (distributed cache) allows for an arbitrary parse.
0:1,2,3,4
1:2,3
2:0,3,5,6
3:1,2
...
def boolean read(FaunusVertex v, String line) {
  // line format: <vertex id>:<comma-separated ids of adjacent vertices>
  parts = line.split(':');
  v.reuse(Long.valueOf(parts[0]))
  parts[1].split(',').each {
    v.addEdge(OUT, 'linkedTo', Long.valueOf(it));
  }
  return true;
}
ScriptOutputFormat
def void write(FaunusVertex vertex, DataOutput output) {
  output.writeUTF(vertex.getId().toString() + ':');
  Iterator<Edge> itty = vertex.getEdges(OUT).iterator()
  while (itty.hasNext()) {
    output.writeUTF(itty.next().getVertex(IN).getId().toString());
    // separate ids with commas, without a trailing comma
    if (itty.hasNext()) output.writeUTF(',');
  }
  output.writeUTF('\n');
}
0:1,2,3,4
1:2,3
2:0,3,5,6
3:1,2
...
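The line format handled by the two scripts is simple enough to round-trip in a few lines. The Python sketch below mirrors the parsing and writing logic outside Faunus. The functions `read` and `write` here are stand-ins that return plain values rather than populating a FaunusVertex or a DataOutput.

```python
def read(line):
    """Parse 'id:n1,n2,...' into (vertex id, list of out-neighbour ids)."""
    vertex_id, _, neighbours = line.strip().partition(":")
    return int(vertex_id), [int(n) for n in neighbours.split(",") if n]

def write(vertex_id, neighbours):
    """Serialize a vertex back into the same 'id:n1,n2,...' form."""
    return f"{vertex_id}:{','.join(map(str, neighbours))}"

lines = ["0:1,2,3,4", "1:2,3", "2:0,3,5,6", "3:1,2"]
graph = dict(read(line) for line in lines)
print(graph[2])                                   # [0, 3, 5, 6]
print([write(v, ns) for v, ns in graph.items()])  # the original lines again
```

Filtering out empty tokens in `read` lets a vertex with no outgoing edges (a line such as "5:") parse to an empty list instead of raising an error.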

Adam Jacobs. 2009. The Pathologies of Big Data. Communications of the ACM 52, 8 (August 2009), 36-44.
doi:10.1145/1536616.1536632 http://doi.acm.org/10.1145/1536616.1536632

GLOBAL VS. LOCAL GRAPH ANALYSIS
[Figure: adjacency-list rows 0-11 shown twice, once as a serial key/value data structure and once as an indexed key/indexed value data structure.]

TITAN
DISTRIBUTED GRAPH DATABASE
Application Servers Reading/Writing Graph Data
Titan Cluster Processing Gremlin Traversals and Writes
The biggest known Titan/Cassandra cluster to date:
A graph of ~120 billion edges stored in a cluster of 16 hi1.4xlarge machines.
Ego-centric graph traversals are requested by 80 m1.large machines.
The cluster serves ~10,000 transactions per second with ~200 ms return times.
http://titan.thinkaurelius.com
http://thinkaurelius.com/2013/05/13/educating-the-planet-with-pearson/

FAUNUS AND TITAN
SUPPORTED TITAN INPUT/OUTPUT FORMATS
TitanCassandraInputFormat
TitanCassandraOutputFormat
TitanHBaseInputFormat
TitanHBaseOutputFormat

FAUNUS AND TITAN
Faunus/Hadoop, Titan/Cassandra
INTRA-CLUSTER CONFIGURATION
Data is processed on the machine where it is located.
Limited network communication.

FAUNUS AND TITAN
INTER-CLUSTER CONFIGURATION
Graph data is offloaded to another cluster.
Repeated analysis does not interfere with production graph database.

FAUNUS AND TITAN
VERTEX-CENTRIC COMPUTING WITH GREMLIN
A Gremlin script is stored in HDFS (distributed cache).
Vertex long ids are pulled out of Titan (FaunusVertex with id only).
The Gremlin script is evaluated concurrently for every vertex long id.
Guaranteed co-location of Gremlin script JVM and Titan vertex.
* Provided by the Gremlin script()-step
Graph g
long counter = 0
def setup(args) {
  // open one Titan connection for the whole task
  g = TitanFactory.open('cassandra:localhost')
}
def map(vertex, args) {
  // look the vertex up in Titan by id and link it to its grandfather
  g.v(vertex.id).as('x').out('father')
    .out('father').linkIn('grandfather','x')
  // commit in batches rather than once per vertex
  if(counter++ % 1000 == 0) g.commit()
}

CREDITS
PRESENTED BY
MARKO A. RODRIGUEZ
SUPPORTED BY
LOS ALAMOS NATIONAL LABORATORY
LANL RESEARCH LIBRARY
VRIJE UNIVERSITEIT BRUSSEL
MANY THANKS TO
MATTHIAS BRÖCHELER
STEPHEN MALLETTE
PAVEL YASKEVICH
DAN LAROCQUE
AURELIUS COMMUNITY
TINKERPOP COMMUNITY
KETRINA YIM
