Graph x pregel
1. Spark GraphX & Pregel
Challenges and Best Practices
Ashutosh Trivedi (IIIT Bangalore)
Kaushik Ranjan (IIIT Bangalore)
Sigmoid-Meetup Bangalore
https://github.com/anantasty/SparkAlgorithms
2. Agenda
• Introduction to GraphX
– How to describe a graph
– RDDs to store Graph
– Algorithms available
• Application in graph algorithms
– Feedback Vertex Set of a Graph
– Identifying parallel parts of the solution.
• Challenges we faced
• Best practices
Ashutosh & Kaushik, Sigmoid-Meetup
Bangalore Dec-2014
3. Graph Representation
class Graph[V, E] {
  def Graph(vertices: Table[(Id, V)],
            edges: Table[(Id, Id, E)])
}
• VertexRDD[A] extends RDD[(VertexId, A)] and adds the constraint
that each VertexId occurs only once.
• A VertexRDD[A] therefore represents a set of vertices, each with an
attribute of type A.
• EdgeRDD[ED] extends RDD[Edge[ED]].
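As a rough illustration in plain Python (not the GraphX API), the two-table view above can be mimicked with a dictionary of vertices and a list of edges. The `Edge` tuple, `build_graph`, and the sample names are all invented for this sketch; keying the vertex table by id mirrors the VertexRDD constraint that each id occurs only once.

```python
from collections import namedtuple

# Hypothetical stand-in for an edge record: source id, destination id, attribute.
Edge = namedtuple("Edge", ["src_id", "dst_id", "attr"])

def build_graph(vertex_rows, edge_rows):
    """Build (vertices, edges) from rows of (id, attr) and (src, dst, attr)."""
    vertices = {}
    for vid, attr in vertex_rows:
        if vid in vertices:
            # Same rule as the vertex table: an id may appear only once.
            raise ValueError(f"duplicate vertex id: {vid}")
        vertices[vid] = attr
    edges = [Edge(s, d, a) for s, d, a in edge_rows]
    return vertices, edges

# Invented sample data.
vertices, edges = build_graph(
    [(1, "alice"), (2, "bob"), (3, "carol")],
    [(1, 2, "follows"), (2, 3, "follows")],
)
```

Edges may repeat or refer to the same pair of ids more than once; only the vertex table enforces uniqueness in this sketch.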
6. Vertex and Edges
[Figure: a single vertex (A) and a single edge (between A and B)]
7. Triplets Join Vertices and Edges
• The triplets operator joins vertices and edges:
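[Figure: the vertex and edge tables joined into triplets]
Vertices: A, B, C, D
Edges: (A, B), (A, C), (B, C), (C, D)
Triplets: one per edge, carrying the edge together with both of its endpoint vertices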
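The join can be sketched in a few lines of plain Python (again, not the GraphX API). The vertex and edge lists follow the slide; the edge directions, the attribute values, and the `triplets` helper are assumptions made for the example.

```python
def triplets(vertices, edges):
    """Join each edge with the attributes of its source and destination vertex."""
    return [
        (src, vertices[src], dst, vertices[dst], attr)
        for src, dst, attr in edges
    ]

# Vertices and edges as on the slide; attribute values are invented.
vertices = {"A": 1, "B": 2, "C": 3, "D": 4}
edges = [("A", "B", "ab"), ("A", "C", "ac"), ("B", "C", "bc"), ("C", "D", "cd")]

for t in triplets(vertices, edges):
    print(t)
```

Each resulting tuple holds everything a per-edge computation needs, which is why the triplet view is the natural input for message-passing operators.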
10. Feedback Vertex Set
• A feedback vertex set of a graph is a set of vertices whose removal
leaves a graph without cycles.
• Each feedback vertex set contains at least one vertex of any cycle in the
graph.
• Finding a minimum feedback vertex set is an NP-complete problem.
• Enumerate each simple cycle.
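The definition can be checked directly: remove the candidate set and test whether the remaining graph is acyclic. The sketch below is plain Python, assumes a directed graph, and uses Kahn's topological-sort test for acyclicity; the function names and the sample graph are invented for illustration.

```python
from collections import deque

def is_acyclic(vertices, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every vertex
    can be peeled off in topological order."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for s, d in edges:
        out[s].append(d)
        indeg[d] += 1
    queue = deque(v for v in indeg if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(indeg)

def is_feedback_vertex_set(vertices, edges, candidate):
    """True if deleting `candidate` leaves a graph without cycles."""
    kept = {v for v in vertices if v not in candidate}
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return is_acyclic(kept, kept_edges)

# Invented example: cycles 1->2->3->1 and 3->4->3 share vertex 3.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
print(is_feedback_vertex_set(V, E, {3}))  # True: both cycles pass through 3
print(is_feedback_vertex_set(V, E, {1}))  # False: 3->4->3 survives
```

Verifying a candidate is cheap; the hard part is finding a small one, which is what the rest of the talk addresses.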
11. Strongly Connected Components
[Figure: directed graph on vertices 1 to 10]
Each strongly connected component can be processed in parallel,
since no cycle spans more than one component.
SC1 – (1) SC2 – (5) SC3 – (8) SC4 – (9)
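For readers who want to see how the components themselves can be found, here is a small sequential sketch in plain Python using Kosaraju's two-pass algorithm. It is not how GraphX computes them, and the sample graph is invented rather than taken from the slide's figure.

```python
def strongly_connected_components(vertices, edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    out = {v: [] for v in vertices}
    rev = {v: [] for v in vertices}
    for s, d in edges:
        out[s].append(d)
        rev[d].append(s)

    # Pass 1: record vertices in order of DFS finishing time.
    order, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(out[root]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(out[w])))
                    break
            else:
                # All neighbours explored: v is finished.
                order.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time.
    comps, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = [], [root]
        while todo:
            v = todo.pop()
            comp.append(v)
            for w in rev[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        comps.append(sorted(comp))
    return sorted(comps)

# Invented example: two 2-cycles joined by an edge, plus a tail vertex.
V = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (4, 5)]
print(strongly_connected_components(V, E))  # [[1, 2], [3, 4], [5]]
```

Because every cycle lies entirely inside one component, each component in the result can be handed to a separate worker.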
12. FVS Algorithm
# Greedy recursive solution
FVS(G)
  sccGraph = scc(G)
  for each graph in sccGraph
    for each vertex
      remove the vertex and recompute scc
    vertexV = vertex that gives the max number of scc  # i.e. it kills the most cycles
    add vertexV to the feedback vertex set
    subGraph = subgraph(graph without vertexV)
    FVS(subGraph)
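The pseudocode can be turned into a small runnable sketch. The version below is plain sequential Python, not the authors' Spark implementation; it assumes a directed graph without self-loops, uses a simple quadratic reachability-based `scc` helper to stay short, and breaks ties by picking the smallest vertex id. All names are illustrative.

```python
def reach(start, adj):
    """Vertices reachable from `start` by following `adj`."""
    seen, todo = {start}, [start]
    while todo:
        for w in adj[todo.pop()]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def scc(vertices, edges):
    """Strongly connected components of the subgraph induced by `vertices`.
    Two vertices share a component iff each reaches the other."""
    vs = set(vertices)
    out = {v: [] for v in vs}
    rev = {v: [] for v in vs}
    for s, d in edges:
        if s in vs and d in vs:
            out[s].append(d)
            rev[d].append(s)
    comps, assigned = [], set()
    for v in sorted(vs):
        if v not in assigned:
            comp = reach(v, out) & reach(v, rev)
            assigned |= comp
            comps.append(comp)
    return comps

def fvs(vertices, edges):
    """Greedy feedback vertex set, following the slide's pseudocode."""
    result = set()
    for comp in scc(vertices, edges):
        if len(comp) == 1:
            continue  # a lone vertex carries no cycle (self-loops are ignored)
        # Try removing each vertex and count the components that remain;
        # the vertex leaving the most components breaks the most cycles.
        best = max(sorted(comp), key=lambda v: len(scc(comp - {v}, edges)))
        result.add(best)
        result |= fvs(comp - {best}, edges)
    return result

# Invented example: cycles 1->2->3->1 and 3->4->3 share vertex 3.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
print(fvs(V, E))  # {3}
```

The result is always a valid feedback vertex set, because the recursion only stops once every component is a single vertex, but being greedy it is not guaranteed to be a minimum one. The outer loop over components and the inner loop over candidate vertices are the two places where the work is independent and could be distributed.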
15. FVS – Spark Implementation
sccGraph carries one extra property, sccID, on each vertex; extract it.
16. FVS – Spark Implementation
sccGraph = scc(G)
For each graph in sccGraph
18. FVS – Spark Implementation
for each vertex
  remove the vertex and recompute scc
# Z is the list of scc counts obtained after removing each vertex
19. FVS – Spark Implementation
vertexV = vertex that gives the max number of scc  # i.e. it kills the most cycles
21. Pregel
• Graph DB
– Data Storage
– Data Mining
• Advantages
– Large-scale distributed computations
– Parallel algorithms for graphs on multiple machines
– Fault tolerance and distributability
22. Oldest Follower
What is the age of the oldest follower of each user?
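// Assumes the vertex attribute has an `age` field and edges point follower -> followed.
val oldestFollowerAge = graph
  .aggregateMessages[Int](
    ctx => ctx.sendToDst(ctx.srcAttr.age),  // map: send each follower's age to the followed user
    (a, b) => math.max(a, b)                // reduce: keep the larger age
  )
// aggregateMessages already returns a VertexRDD, so no .vertices call is needed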
Note: mapReduceTriplets has been replaced by aggregateMessages.
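To make the map and reduce steps concrete without a Spark cluster, the same query can be mimicked in plain Python. The `aggregate_messages` helper below is a hypothetical stand-in, not the GraphX operator, and the users, ages, and follow edges are invented.

```python
def aggregate_messages(vertices, edges, send, merge):
    """Tiny sequential imitation of a send/merge aggregation.
    send(src, src_attr, dst, dst_attr) returns (target_id, message) pairs;
    merge combines two messages bound for the same vertex."""
    inbox = {}
    for src, dst in edges:
        for target, msg in send(src, vertices[src], dst, vertices[dst]):
            inbox[target] = merge(inbox[target], msg) if target in inbox else msg
    return inbox

# Invented data: vertex attribute is the user's age, edges are follower -> followed.
ages = {"alice": 28, "bob": 35, "carol": 19, "dave": 42}
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "bob")]

oldest_follower_age = aggregate_messages(
    ages, follows,
    send=lambda src, src_age, dst, dst_age: [(dst, src_age)],  # message to the followed user
    merge=max,                                                 # keep the oldest follower
)
print(oldest_follower_age)  # {'alice': 35, 'bob': 42}
```

Users with no followers (carol and dave here) receive no message and so do not appear in the result, which matches the behaviour of a message-based aggregation: only vertices that received at least one message are returned.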
23. New in GraphX
In aggregateMessages:
• An EdgeContext exposes the triplet fields.
• It provides functions to explicitly send messages to the source and
destination vertices.
• It requires the user to indicate which fields of the triplet are
actually required.
24. Theory – it’s Good
How it works – that’s awesome
Graphs are recursive data structures: the property of a vertex
depends on the properties of its neighbors, which in turn depend
on the properties of their neighbors.
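This recursive dependency is what a Pregel-style loop resolves: vertices repeatedly exchange messages with neighbors until nothing changes. The sketch below is a simplified sequential imitation in plain Python, not the GraphX Pregel operator; the parameter names `vprog`, `send_msg`, and `merge_msg` are borrowed by analogy, only edges whose source changed in the previous round are re-examined, and the sample graph is invented.

```python
def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iter=20):
    """Bulk-synchronous loop: deliver messages, update vertices, send again."""
    # Superstep 0: every vertex processes the initial message.
    state = {v: vprog(v, attr, initial_msg) for v, attr in vertices.items()}
    active = set(state)
    for _ in range(max_iter):
        inbox = {}
        for src, dst in edges:
            if src not in active:
                continue  # simplification: only recently updated sources send
            for target, msg in send_msg(src, state[src], dst, state[dst]):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
        active = set(inbox)
    return state

# Invented example: propagate the maximum value around a ring 1->2->3->4->1.
values = {1: 3, 2: 6, 3: 2, 4: 1}
ring = [(1, 2), (2, 3), (3, 4), (4, 1)]

result = pregel(
    values, ring,
    initial_msg=float("-inf"),
    vprog=lambda vid, attr, msg: max(attr, msg),
    send_msg=lambda s, s_val, d, d_val: [(d, s_val)] if s_val > d_val else [],
    merge_msg=max,
)
print(result)  # {1: 6, 2: 6, 3: 6, 4: 6}
```

Each pass of the loop corresponds to one superstep; the value of a vertex settles only after the values it depends on have settled, which is the recursion described above unrolled into iterations.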
28. Applications - GIS
• Algorithm: compute all vertices in a directed graph that can
reach a given vertex.
• Can be used for watershed delineation in Geographic Information
Systems
Vertices that can reach E are A and B
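A sequential way to answer this question is to walk the edges backwards from the target vertex. The plain-Python sketch below does that with a breadth-first search; the slide's actual graph is not recoverable from the text, so the edge list is invented and chosen only so that the answer for E matches the caption.

```python
from collections import deque

def can_reach(edges, target):
    """Return every vertex with a directed path to `target`
    (breadth-first search over reversed edges)."""
    rev = {}
    for s, d in edges:
        rev.setdefault(d, []).append(s)
    seen, queue = {target}, deque([target])
    while queue:
        v = queue.popleft()
        for w in rev.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen - {target}

# Hypothetical edges: A -> B -> E -> C -> D.
edges = [("A", "B"), ("B", "E"), ("E", "C"), ("C", "D")]
print(sorted(can_reach(edges, "E")))  # ['A', 'B']
```

In the watershed setting, vertices would stand for terrain cells and edges for downhill flow, so the returned set is the area that drains into the chosen outlet.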