Spark GraphX & Pregel
Challenges and Best Practices
Ashutosh Trivedi (IIIT Bangalore)
Kaushik Ranjan (IIIT Bangalore)
Sigmoid-Meetup Bangalore
https://github.com/anantasty/SparkAlgorithms
Agenda
• Introduction to GraphX
– How to describe a graph
– RDDs to store Graph
– Algorithms available
• Application in graph algorithms
– Feedback Vertex Set of a Graph
– Identifying parallel parts of the solution.
• Challenges we faced
• Best practices
Ashutosh & Kaushik, Sigmoid-Meetup
Bangalore Dec-2014
Graph Representation
class Graph[V, E] {
  def Graph(vertices: Table[(Id, V)],
            edges: Table[(Id, Id, E)])
}
• The VertexRDD[A] extends RDD[(VertexID, A)] and adds the additional
constraint that each VertexID occurs only once.
• Moreover, VertexRDD[A] represents a set of vertices each with an
attribute of type A
• The EdgeRDD[ED] extends RDD[Edge[ED]]
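The two-table representation above can be imitated in a few lines of plain Python. This is only a sketch, not GraphX code: `Edge`, `make_vertex_table` and the sample data are hypothetical names chosen for illustration. It shows the one constraint the vertex table adds, namely that each vertex id occurs only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One row of the edge table: source id, destination id, edge attribute."""
    src: int
    dst: int
    attr: str


def make_vertex_table(pairs):
    """Build an id -> attribute table, rejecting duplicate vertex ids."""
    table = {}
    for vid, attr in pairs:
        if vid in table:
            raise ValueError(f"duplicate vertex id: {vid}")
        table[vid] = attr
    return table


# Hypothetical sample graph: four vertices, four directed edges.
vertices = make_vertex_table([(1, "A"), (2, "B"), (3, "C"), (4, "D")])
edges = [
    Edge(1, 2, "follows"),
    Edge(1, 3, "follows"),
    Edge(2, 3, "follows"),
    Edge(3, 4, "follows"),
]
```

The edge table has no uniqueness constraint, so a plain list is enough for it.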
GraphX - Representation
Vertex and Edges
[Figure: a vertex and an edge, drawn with vertices A and B]
Triplets Join Vertices and Edges
• The triplets operator joins vertices and edges:
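Vertices: A, B, C, D
Edges: (A, B), (A, C), (B, C), (C, D)
Triplets: one per edge, each carrying the source vertex, the edge and the destination vertex:
(A, A→B, B), (A, A→C, C), (B, B→C, C), (C, C→D, D)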
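The join behind the triplets view can be sketched in plain Python. This is an illustration under assumptions, not the GraphX implementation: the attribute strings and the `triplets` helper are made up, and only the A to D vertices and the four edges come from the slide.

```python
# Vertex table: id -> attribute (attribute values are placeholders).
vertices = {"A": "attr-A", "B": "attr-B", "C": "attr-C", "D": "attr-D"}

# Edge table: (source id, destination id).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def triplets(vertices, edges):
    """Join every edge with the attributes of its two endpoints."""
    return [(src, vertices[src], dst, vertices[dst]) for src, dst in edges]


for triplet in triplets(vertices, edges):
    print(triplet)
```

Each edge yields exactly one triplet, so the triplets view has as many rows as the edge table.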
Triplets elements
Subgraphs
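• subgraph takes a vertex predicate (vpred) and an edge predicate (epred) and keeps only the vertices and edges that satisfy them.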
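A rough Python imitation of predicate-based subgraph selection is given here. It is a simplification: in this sketch `epred` only sees the two endpoint ids, the function name and sample data are hypothetical, and an edge survives only if both of its endpoints survive `vpred`.

```python
def subgraph(vertices, edges,
             vpred=lambda vid, attr: True,
             epred=lambda src, dst: True):
    """Keep vertices passing vpred, and edges passing epred whose
    endpoints were both kept."""
    kept_vertices = {vid: attr for vid, attr in vertices.items()
                     if vpred(vid, attr)}
    kept_edges = [(src, dst) for src, dst in edges
                  if src in kept_vertices and dst in kept_vertices
                  and epred(src, dst)]
    return kept_vertices, kept_edges


# Hypothetical data: vertex 2 has no attribute and should be dropped.
vertices = {1: 25, 2: None, 3: 40}
edges = [(1, 2), (1, 3), (3, 1)]

sub_vertices, sub_edges = subgraph(
    vertices, edges,
    vpred=lambda vid, attr: attr is not None,
    epred=lambda src, dst: src < dst,
)
```

Dropping a vertex implicitly drops every edge that touches it, which is why the edge (1, 2) disappears even though it passes `epred`.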
Feedback Vertex Set
• A feedback vertex set of a graph is a set of vertices whose removal
leaves a graph without cycles.
• Each feedback vertex set contains at least one vertex of any cycle in the
graph.
• The feedback vertex set problem is an NP-complete problem
in computational complexity theory
• Enumerate each simple cycle.
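The definition can be checked mechanically: remove the candidate set and test whether the remaining graph is acyclic. The sketch below uses Kahn's topological-sort test for that; the function names and the example graph are hypothetical and are not taken from the talk's repository.

```python
def is_acyclic(vertices, edges):
    """Kahn's algorithm: a directed graph is acyclic exactly when every
    vertex can be peeled off once its in-degree drops to zero."""
    indegree = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for src, dst in edges:
        out[src].append(dst)
        indegree[dst] += 1
    ready = [v for v in indegree if indegree[v] == 0]
    peeled = 0
    while ready:
        v = ready.pop()
        peeled += 1
        for w in out[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return peeled == len(indegree)


def is_feedback_vertex_set(vertices, edges, candidate):
    """True if deleting `candidate` leaves a graph without cycles."""
    remaining = [v for v in vertices if v not in candidate]
    kept = [(s, d) for s, d in edges
            if s not in candidate and d not in candidate]
    return is_acyclic(remaining, kept)


# Hypothetical graph with two cycles sharing vertex 3: 1->2->3->1 and 3<->4.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
```

Here `{3}` is a feedback vertex set because both cycles pass through vertex 3, while `{1}` is not because the cycle between 3 and 4 survives.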
Strongly Connected Components
[Figure: example directed graph with vertices 1 to 10]
Each strongly connected component can be processed in parallel, since no cycle spans two components.
SC1 – (1) SC2 – (5) SC3 – (8) SC4 – (9)
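For readers who want to see the decomposition itself, here is a small single-machine sketch of Kosaraju's algorithm in Python. It stands in for a distributed SCC routine and is not how GraphX computes components; the example graph is hypothetical and is not the one in the figure.

```python
def strongly_connected_components(vertices, edges):
    """Kosaraju's algorithm: return the components as a list of sets."""
    out = {v: [] for v in vertices}
    rev = {v: [] for v in vertices}
    for src, dst in edges:
        out[src].append(dst)
        rev[dst].append(src)

    def dfs(v, graph, seen, finished):
        # Recursive depth-first search; appends vertices in finishing order.
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                dfs(w, graph, seen, finished)
        finished.append(v)

    # Pass 1: finishing order on the original graph.
    seen, order = set(), []
    for v in vertices:
        if v not in seen:
            dfs(v, out, seen, order)

    # Pass 2: sweep the reversed graph in decreasing finishing order.
    seen, components = set(), []
    for v in reversed(order):
        if v not in seen:
            component = []
            dfs(v, rev, seen, component)
            components.append(set(component))
    return components


# Hypothetical graph: cycle 1<->2, cycle 3->4->5->3, one-way bridge 2->3.
vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 5), (5, 3)]
components = strongly_connected_components(vertices, edges)
```

The bridge edge 2->3 lies on no cycle, so the two components can be handed to independent workers.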
FVS Algorithm
#Greedy recursive solution
FVS(G)
    sccGraph = scc(G)
    for each graph in sccGraph
        for each vertex
            remove the vertex and recompute scc
        vertexV = the vertex whose removal gives the max number of SCCs  #which means it kills the most cycles
        add vertexV to the feedback vertex set
        subGraph = subgraph(graph without vertexV)
        FVS(subGraph)
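The greedy recursion can be tried out on one machine with the Python sketch below. It is a reading of the pseudocode under assumptions, not the authors' Spark code: `scc`, `induced` and `greedy_fvs` are hypothetical helper names, ties are broken by the smallest vertex id, and components that contain no cycle are skipped so that the recursion terminates.

```python
def scc(vertices, edges):
    """Kosaraju's algorithm: strongly connected components as sets."""
    out = {v: [] for v in vertices}
    rev = {v: [] for v in vertices}
    for src, dst in edges:
        out[src].append(dst)
        rev[dst].append(src)

    def dfs(v, graph, seen, finished):
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                dfs(w, graph, seen, finished)
        finished.append(v)

    seen, order = set(), []
    for v in vertices:
        if v not in seen:
            dfs(v, out, seen, order)
    seen, components = set(), []
    for v in reversed(order):
        if v not in seen:
            component = []
            dfs(v, rev, seen, component)
            components.append(set(component))
    return components


def induced(vertices, edges):
    """Edges whose endpoints both lie in `vertices`."""
    return [(s, d) for s, d in edges if s in vertices and d in vertices]


def greedy_fvs(vertices, edges):
    """Greedy heuristic: in each cyclic component, remove the vertex whose
    removal splits it into the most SCCs, then recurse on what is left."""
    chosen = set()
    for component in scc(vertices, edges):
        inner = induced(component, edges)
        if not inner:
            continue  # a lone vertex without a self-loop lies on no cycle

        def scc_count_without(v):
            rest = component - {v}
            return len(scc(rest, induced(rest, inner)))

        best = max(sorted(component), key=scc_count_without)
        chosen.add(best)
        rest = component - {best}
        chosen |= greedy_fvs(rest, induced(rest, inner))
    return chosen


# Hypothetical graph: cycles 1->2->3->1 and 3<->4 share vertex 3.
fvs = greedy_fvs({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)])
```

Being greedy, the result is a valid feedback vertex set but is not guaranteed to be a minimum one.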
[Figure: table with columns Graph, Iteration and SCC count, showing the four-vertex example graph (vertices 1 to 4) after removing vertex 1, vertex 2 and vertex 3]
Feedback Vertex Set: {1, 5, 8, 9}
FVS – Spark Implementation
sccGraph has one more property on each vertex, its sccID; extract it
FVS – Spark Implementation
sccGraph = scc(G)
For each graph in sccGraph
FVS – Spark Implementation
#Greedy recursive function
FVS – Spark Implementation
For each vertex
remove vertex and again calculate scc,
# Z is the list of SCC counts obtained after removing each vertex
FVS – Spark Implementation
vertexV = the vertex whose removal gives the max number of SCCs  #which means it kills the most cycles
FVS – Spark Implementation
subGraph = subgraph(graph without vertexV)
FVS(subGraph)
Pregel
• Graph DB
– Data Storage
– Data Mining
• Advantages
– Large-scale distributed computations
– Parallel-algorithms for graphs on multiple machines
– Fault tolerance and distributability
Oldest Follower
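What is the age of the oldest follower of each user?
val oldestFollowerAge = graph
  .aggregateMessages[Int](
    // map: each follower sends its age to the user it follows
    ctx => ctx.sendToDst(ctx.srcAttr.age),
    // reduce: keep the largest age received
    (a, b) => math.max(a, b)
  )
Note: mapReduceTriplets has been replaced by aggregateMessages, which returns the aggregated VertexRDD directly.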
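The send-then-merge pattern of the oldest-follower query can be imitated without Spark. The Python sketch below is only a model of the idea: `aggregate_messages`, its callback signature and the sample users are all hypothetical, and nothing here calls the real GraphX API.

```python
def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """For every edge, let send_msg emit messages; combine the messages
    that arrive at the same vertex with merge_msg."""
    inbox = {}

    def send(vid, msg):
        inbox[vid] = merge_msg(inbox[vid], msg) if vid in inbox else msg

    for src, dst in edges:
        send_msg(src, vertices[src], dst, vertices[dst], send)
    return inbox


# Hypothetical data: vertex attribute is the user's age,
# an edge (follower, followed) means "follower follows followed".
ages = {"alice": 28, "bob": 35, "carol": 41, "dave": 19}
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "bob")]

oldest_follower_age = aggregate_messages(
    ages, follows,
    # map: send the follower's age to the user being followed
    send_msg=lambda src, src_age, dst, dst_age, send: send(dst, src_age),
    # reduce: keep the largest age
    merge_msg=max,
)
```

Users without followers receive no message and therefore do not appear in the result.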
New in GraphX
In aggregateMessages:
• EdgeContext exposes the triplet fields.
• It provides functions to explicitly send messages to the source and destination vertices.
• The user can indicate which fields of the triplet are actually required.
Theory – it's good
How it works – that's awesome
Graphs are recursive data structures: the property of a vertex
depends on the properties of its neighbors, which in turn
depend on the properties of their neighbors.
Graph.Pregel ( initialMessage ) (
#message consumption
( vertexID, initialProperty, message ) → compute new property
,
#message generation
triplet → .. code ..
Iterator( vertexID, message )
Iterator.empty
,
#message aggregation
( existing message set, new message ) → NEW message set
)
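The three functions in the skeleton above (message consumption, message generation, message aggregation) can be wired into a small single-machine loop to see how supersteps proceed. The Python sketch below is a simplified model and not the GraphX implementation: the `pregel` signature, the activity rule and the max-propagation example are assumptions made for illustration.

```python
def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_iterations=100):
    """Simplified Pregel loop over an in-memory graph.

    vprog(vid, attr, msg)             -> new attribute   (consumption)
    send_msg(src, s_attr, dst, d_attr) -> [(vid, msg)]    (generation)
    merge_msg(a, b)                    -> combined message (aggregation)
    """
    # Superstep 0: every vertex consumes the initial message.
    state = {vid: vprog(vid, attr, initial_msg)
             for vid, attr in vertices.items()}
    active = set(state)
    for _ in range(max_iterations):
        inbox = {}
        for src, dst in edges:
            # Only edges touching a vertex that changed last round may send.
            if src not in active and dst not in active:
                continue
            for target, msg in send_msg(src, state[src], dst, state[dst]):
                inbox[target] = (merge_msg(inbox[target], msg)
                                 if target in inbox else msg)
        if not inbox:
            break  # no messages in flight: the computation has converged
        for vid, msg in inbox.items():
            state[vid] = vprog(vid, state[vid], msg)
        active = set(inbox)
    return state


# Hypothetical example: push the largest value downstream along a chain.
values = {1: 10, 2: 30, 3: 20, 4: 5}
chain = [(1, 2), (2, 3), (3, 4)]

result = pregel(
    values, chain,
    initial_msg=float("-inf"),
    vprog=lambda vid, value, msg: max(value, msg),
    send_msg=lambda s, s_val, d, d_val: [(d, s_val)] if s_val > d_val else [],
    merge_msg=max,
)
```

The loop stops as soon as a superstep produces no messages, which is the usual termination condition in this model.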
Architecture
[Figure: a four-vertex graph (vertices 1 to 4) stepped through Pregel iterations; at each step a vertex takes the max of the messages it receives]
Example - output
[Figure: the same four-vertex graph with its final vertex values]
Applications - GIS
• Algorithm: compute all vertices in a directed graph that can reach a
given vertex.
• Can be used for watershed delineation in Geographic Information
Systems
Vertices that can reach E are A and B
Algorithm
Graph.Pregel( Seq[vertexIDs] ) (
    #message consumption
    if vertex.state == 1
        vertex.state → 2
    else if vertex.state == 0
        if ( vertex.adjacentVertices ∩ Seq[vertexIDs] ) isNotEmpty
            vertex.state → 1
    #message aggregator
    Seq[existing vertex IDs] U Seq[new vertex ID]
)
#message generation
for each triplet
if destinationVertex.state == 1
message( sourceVertexID, Seq[destinationVertexID] )
message( destinationVertexID, Seq[destinationVertexID] )
else if sourceVertex.state == 1 and destinationVertex.state == 2
message( sourceVertexID, Seq[destinationVertexID] )
else message( empty )
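The state machine in the pseudocode can be exercised with a small Python simulation. This is our reading of the slides and rests on assumptions: state 0 means not reached, 1 means newly reached, 2 means reached and processed; a state-0 vertex moves to state 1 when a message names one of its out-neighbours; the target starts in state 1. The function name and the sample graph are hypothetical.

```python
def vertices_that_reach(target, vertex_ids, edges):
    """Return every vertex (other than target) with a directed path to target."""
    adjacent = {v: set() for v in vertex_ids}
    for src, dst in edges:
        adjacent[src].add(dst)

    def consume(vid, state, msg):
        # message consumption
        if state == 1:
            return 2
        if state == 0 and adjacent[vid] & msg:
            return 1
        return state

    def generate(src, dst, state):
        # message generation, one call per triplet
        if state[dst] == 1:
            return [(src, {dst}), (dst, {dst})]
        if state[src] == 1 and state[dst] == 2:
            return [(src, {dst})]
        return []

    # Assumed initial states: only the target is marked as newly reached.
    state = {v: (1 if v == target else 0) for v in vertex_ids}
    # Superstep 0: every vertex consumes the initial message {target}.
    state = {v: consume(v, s, {target}) for v, s in state.items()}

    while True:
        inbox = {}
        for src, dst in edges:
            for vid, msg in generate(src, dst, state):
                # message aggregation: set union
                inbox[vid] = inbox.get(vid, set()) | msg
        if not inbox:
            break
        for vid, msg in inbox.items():
            state[vid] = consume(vid, state[vid], msg)

    return {v for v, s in state.items() if s == 2 and v != target}


# Hypothetical graph: A -> B -> E, plus C and D which cannot reach E.
reach = vertices_that_reach(
    "E", ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "E"), ("E", "C"), ("D", "C")],
)
```

In a watershed setting the edges would point downhill, so the returned set is the area draining into the chosen outlet.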
References
• Fork our repository at
• https://github.com/anantasty/SparkAlgorithms
• Follow us at
• https://github.com/codeAshu
• https://github.com/kaushikranjan
• https://spark.apache.org/docs/latest/graphx-programming-guide.html
Breaking the Kubernetes Kill Chain: Host Path Mount
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Graph x pregel

  • 1. Spark GraphX & Pregel: Challenges and Best Practices. Ashutosh Trivedi (IIIT Bangalore), Kaushik Ranjan (IIIT Bangalore). Sigmoid-Meetup Bangalore. https://github.com/anantasty/SparkAlgorithms
  • 2. Agenda
      • Introduction to GraphX
        – How to describe a graph
        – RDDs to store a graph
        – Algorithms available
      • Application in graph algorithms
        – Feedback Vertex Set of a graph
        – Identifying parallel parts of the solution
      • Challenges we faced
      • Best practices
      Ashutosh & Kaushik, Sigmoid-Meetup Bangalore, Dec-2014
  • 3. Graph Representation
      class Graph [ V, E ] {
        def Graph(vertices: Table[ (Id, V) ],
                  edges: Table[ (Id, Id, E) ])
      }
      • VertexRDD[A] extends RDD[(VertexID, A)] and adds the constraint that each VertexID occurs only once.
      • VertexRDD[A] therefore represents a set of vertices, each with an attribute of type A.
      • EdgeRDD[ED] extends RDD[Edge[ED]].
  • 4. GraphX - Representation
  • 5. GraphX - Representation
  • 6. Vertex and Edges (figure: a vertex and an edge, labeled Vertex and Edge)
  • 7. Triplets Join Vertices and Edges
      • The triplets operator joins vertices and edges.
      Vertices: A, B, C, D
      Edges: A → B, A → C, B → C, C → D
      Triplets: (A, B), (A, C), (B, C), (C, D), one per edge, each carrying the attributes of both endpoints
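The join on this slide can be sketched in plain Python. This is only a model of the idea, not the GraphX API: the attribute strings, the edge labels and the `triplets` helper are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Vertex table: id -> attribute. A dict keeps each vertex ID unique.
vertices: Dict[str, str] = {"A": "attr-A", "B": "attr-B", "C": "attr-C", "D": "attr-D"}

# Edge table: (source id, destination id, edge attribute).
edges: List[Tuple[str, str, str]] = [
    ("A", "B", "e1"),
    ("A", "C", "e2"),
    ("B", "C", "e3"),
    ("C", "D", "e4"),
]

def triplets(vertices, edges):
    """Join every edge with the attributes of its source and destination."""
    return [
        ((src, vertices[src]), (dst, vertices[dst]), attr)
        for src, dst, attr in edges
    ]

result = triplets(vertices, edges)
for (src, dst, attr) in result:
    print(src, dst, attr)
```

Each output row holds what a triplet exposes: the source vertex with its attribute, the destination vertex with its attribute, and the edge attribute.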
  • 8. Triplets elements
  • 9. Subgraphs: predicates vpred and epred
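A rough plain-Python sketch of what a subgraph filtered by the two predicates looks like. It is not GraphX code: here `epred` receives `(src, dst, attr)` rather than a triplet, and the sample graph is invented for the example.

```python
def subgraph(vertices, edges,
             vpred=lambda vid, attr: True,
             epred=lambda src, dst, attr: True):
    """Keep vertices passing vpred, and edges passing epred whose
    endpoints both survived the vertex filter."""
    kept_vertices = {vid: attr for vid, attr in vertices.items() if vpred(vid, attr)}
    kept_edges = [
        (src, dst, attr)
        for src, dst, attr in edges
        if src in kept_vertices and dst in kept_vertices and epred(src, dst, attr)
    ]
    return kept_vertices, kept_edges

# Hypothetical sample data.
vertices = {"A": 1, "B": 2, "C": 3, "D": 4}
edges = [("A", "B", 1), ("A", "C", 1), ("B", "C", 1), ("C", "D", 1)]

# Drop vertex C; every edge touching C disappears with it.
sub_vertices, sub_edges = subgraph(vertices, edges, vpred=lambda vid, attr: vid != "C")
print(sub_vertices, sub_edges)
```

Removing a vertex this way is the same operation the feedback vertex set algorithm later relies on when it tests candidate vertices.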
  • 10. Feedback Vertex Set
      • A feedback vertex set of a graph is a set of vertices whose removal leaves a graph without cycles.
      • Every feedback vertex set contains at least one vertex of every cycle in the graph.
      • The feedback vertex set problem is NP-complete.
      • Enumerate each simple cycle.
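The definition can be checked mechanically: remove the candidate vertices and test whether the remaining directed graph is acyclic. The sketch below does this with a topological sort (Kahn's algorithm); the function name and the sample edges are illustrative only.

```python
from collections import deque

def is_feedback_vertex_set(edges, candidate):
    """True if deleting the `candidate` vertices leaves no directed cycle."""
    remaining = [(s, d) for s, d in edges if s not in candidate and d not in candidate]
    nodes = {v for edge in remaining for v in edge}
    indegree = {v: 0 for v in nodes}
    for _, d in remaining:
        indegree[d] += 1
    # Repeatedly peel off vertices that have no incoming edges left.
    queue = deque(v for v in nodes if indegree[v] == 0)
    peeled = 0
    while queue:
        v = queue.popleft()
        peeled += 1
        for s, d in remaining:
            if s == v:
                indegree[d] -= 1
                if indegree[d] == 0:
                    queue.append(d)
    # Vertices that could not be peeled lie on, or behind, a cycle.
    return peeled == len(nodes)

edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(is_feedback_vertex_set(edges, {2}))    # the cycle 1 -> 2 -> 3 -> 1 is broken
print(is_feedback_vertex_set(edges, {4}))    # the cycle is untouched
```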
  • 11. Strongly Connected Components (figure: a directed graph on vertices 1 to 10)
      Each strongly connected component can be considered in parallel, since no two components share a cycle.
      SC1 – (1), SC2 – (5), SC3 – (8), SC4 – (9)
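For readers who want to see how the components are found, here is a small single-machine sketch using Kosaraju's two-pass algorithm. It is not the GraphX implementation, and the sample graph is hypothetical rather than the one in the figure.

```python
from collections import defaultdict

def strongly_connected_components(vertices, edges):
    forward, backward = defaultdict(list), defaultdict(list)
    for s, d in edges:
        forward[s].append(d)
        backward[d].append(s)

    # Pass 1: record vertices in order of DFS completion on the original graph.
    seen, order = set(), []
    def visit(v):
        seen.add(v)
        for w in forward[v]:
            if w not in seen:
                visit(w)
        order.append(v)
    for v in vertices:
        if v not in seen:
            visit(v)

    # Pass 2: sweep the reversed graph in reverse completion order;
    # each sweep collects exactly one component.
    assigned, components = set(), []
    def collect(v, component):
        assigned.add(v)
        component.add(v)
        for w in backward[v]:
            if w not in assigned:
                collect(w, component)
    for v in reversed(order):
        if v not in assigned:
            component = set()
            collect(v, component)
            components.append(component)
    return components

components = strongly_connected_components(
    [1, 2, 3, 4, 5],
    [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)],
)
print(components)
```

Because every cycle lies entirely inside one component, each component can be handed to a separate worker. The recursive helpers are fine for small graphs but would hit Python's recursion limit on very deep ones.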
  • 12. FVS Algorithm
      # Greedy recursive solution
      FVS(G)
        sccGraph = scc(G)
        for each graph in sccGraph
          for each vertex
            remove the vertex and recompute scc
          vertexV = vertex which gives the max number of scc   # which means it kills the maximum number of cycles
          subGraph = subgraph(removeV)
          FVS(subGraph)
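The greedy recursion can be tried out on a single machine with the sketch below. It is an illustration under assumptions, not the authors' Spark code: `scc` is a simple reachability-based helper written for this example, ties between vertices are broken by smallest ID, and components without any internal edge are skipped because they contain no cycle.

```python
def reachable(start, adj):
    """All vertices reachable from `start` (including itself)."""
    seen, stack = {start}, [start]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def scc(vertices, edges):
    """Components by mutual reachability; simple, not efficient."""
    adj = {}
    for s, d in edges:
        if s in vertices and d in vertices:
            adj.setdefault(s, []).append(d)
    reach = {v: reachable(v, adj) for v in vertices}
    components, placed = [], set()
    for v in sorted(vertices):
        if v not in placed:
            comp = {w for w in reach[v] if v in reach[w]}
            components.append(comp)
            placed |= comp
    return components

def fvs(vertices, edges):
    result = set()
    for comp in scc(vertices, edges):
        if not any(s in comp and d in comp for s, d in edges):
            continue  # no edge inside the component, hence no cycle
        # Score each vertex by the number of components left after removing it.
        best = max(sorted(comp), key=lambda v: len(scc(comp - {v}, edges)))
        result.add(best)
        result |= fvs(comp - {best}, edges)
    return result

edges = [(1, 2), (2, 3), (3, 1), (2, 4), (4, 2)]
print(fvs({1, 2, 3, 4}, edges))
```

In this sample graph both cycles pass through vertex 2, so removing it splits the graph into three single-vertex components and the recursion stops. The result is a valid feedback vertex set, though a greedy choice is not guaranteed to be the smallest one.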
  • 13. (figure: table with columns Graph, Iteration and SCC count, showing a graph on vertices 1 to 4 and the result of Remove 1, Remove 2 and Remove 3)
  • 14. Feedback Vertex Set (figure: vertices 1, 5, 8 and 9 highlighted)
  • 15. FVS – Spark Implementation: sccGraph has one more property, sccID, on each vertex; extract it.
  • 16. FVS – Spark Implementation: sccGraph = scc(G); for each graph in sccGraph
  • 17. FVS – Spark Implementation: # Greedy recursive function
  • 18. FVS – Spark Implementation: for each vertex, remove the vertex and recompute scc. # Z is a list of scc counts after removing each vertex
  • 19. FVS – Spark Implementation: vertexV = vertex which gives the max number of scc # which means it kills the maximum number of cycles
  • 20. FVS – Spark Implementation: subGraph = subgraph(removeV); FVS(subGraph)
  • 21. Pregel
      • Graph DB
        – Data storage
        – Data mining
      • Advantages
        – Large-scale distributed computations
        – Parallel algorithms for graphs on multiple machines
        – Fault tolerance and distributability
  • 22. Oldest Follower
      What is the age of the oldest follower of each user?
      val oldestFollowerAge = graph.aggregateMessages[Int](
        ctx => ctx.sendToDst(ctx.srcAttr.age),  // map: send the follower's age to the followed user
        (a, b) => math.max(a, b)                // reduce: keep the largest age
      )
      mapReduceTriplets is now aggregateMessages.
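The map and reduce steps can be followed on a toy dataset with this plain-Python model. The `aggregate_messages` helper, the names and the ages are all invented for illustration; it mirrors the shape of the operation, not the Spark API.

```python
def aggregate_messages(edges, send, merge):
    """send(src, dst) returns (target vertex, message) pairs;
    merge combines two messages addressed to the same vertex."""
    inbox = {}
    for src, dst in edges:
        for target, msg in send(src, dst):
            inbox[target] = merge(inbox[target], msg) if target in inbox else msg
    return inbox

ages = {"alice": 28, "bob": 35, "carol": 61, "dave": 19}
# Each edge is (follower, followed).
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "bob"), ("alice", "bob")]

oldest_follower_age = aggregate_messages(
    follows,
    send=lambda src, dst: [(dst, ages[src])],  # map: follower's age goes to the followed user
    merge=max,                                 # reduce: keep the largest age
)
print(oldest_follower_age)
```

Users with no followers (carol and dave here) receive no message and therefore do not appear in the result.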
  • 23. New in GraphX. In aggregateMessages:
      • An EdgeContext exposes the triplet fields.
      • It provides functions to explicitly send messages to the source and destination vertex (sendToSrc, sendToDst).
      • It lets the user indicate which fields of the triplet are actually required.
  • 24. Theory – it's good. How it works – that's awesome.
      Graphs are recursive data structures: the property of a vertex depends on the properties of its neighbors, which in turn depend on the properties of their neighbors.
  • 25. Architecture
      Graph.Pregel ( initialMessage ) (
        # message consumption
        ( vertexID, initialProperty, message ) → compute new property ,
        # message generation
        triplet → .. code ..
          Iterator( vertexID, message )
          Iterator.empty ,
        # message aggregation
        ( existing message set, new message ) → NEW message set
      )
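The three functions fit together roughly as in the loop below. This is a simplified single-machine sketch, not the GraphX implementation: it evaluates message generation on every edge in every superstep, and the parameter names and sample data are chosen for the example.

```python
def pregel(values, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    # Superstep 0: every vertex consumes the initial message.
    values = {v: vprog(v, val, initial_msg) for v, val in values.items()}
    for _ in range(max_iterations):
        inbox = {}
        # Message generation over every edge, then aggregation per target.
        for src, dst in edges:
            for target, msg in send_msg(src, values[src], dst, values[dst]):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        # Message consumption: only vertices that received something update.
        for v, msg in inbox.items():
            values[v] = vprog(v, values[v], msg)
    return values

# Push the largest value seen so far along the direction of each edge.
final = pregel(
    values={1: 10, 2: 30, 3: 20, 4: 5},
    edges=[(1, 2), (2, 3), (3, 4)],
    initial_msg=float("-inf"),
    vprog=lambda vid, value, msg: max(value, msg),
    send_msg=lambda s, sv, d, dv: [(d, sv)] if sv > dv else [],
    merge_msg=max,
)
print(final)
```

Returning an empty list from `send_msg` plays the role of `Iterator.empty`: once no edge produces a message, the loop ends.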
  • 26. (figure: Pregel example on a graph with vertices 1 to 4; in each superstep a vertex takes the max of its incoming messages, e.g. max [30, 10, 20], max [20], max [10])
  • 27. Example - output (figure: vertices 1 to 4 with their final values)
  • 28. Applications - GIS
      • Algorithm: compute all vertices in a directed graph that can reach a given vertex.
      • Can be used for watershed delineation in Geographic Information Systems.
      (figure: the vertices that can reach E are A and B)
  • 29. Algorithm
      Graph.Pregel( Seq[vertexIDs] ) (
        # message consumption
        if vertex.state == 1
          vertex.state → 2
        else if vertex.state == 0
          if ( vertex.adjacentVertices ∩ Seq[vertexIDs] ) isNotEmpty
            vertex.state → 1
        # message aggregator
        Seq[existing vertex IDs] U Seq[new vertex ID]
      )
  • 30. Algorithm
      # message generation
      for each triplet
        if destinationVertex.state == 1
          message( sourceVertexID, Seq[destinationVertexID] )
          message( destinationVertexID, Seq[destinationVertexID] )
        else if sourceVertex.state == 1 and destinationVertex.state == 2
          message( sourceVertexID, Seq[destinationVertexID] )
        else
          message( empty )
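The backward propagation can be sketched on one machine as follows. The reading of the three states as 0 = not yet reached, 1 = frontier and 2 = finished is an interpretation of the slides, and the function name and sample edges are hypothetical.

```python
def can_reach(edges, target):
    """Return every other vertex that has a directed path to `target`."""
    vertices = {v for edge in edges for v in edge} | {target}
    state = {v: 0 for v in vertices}  # 0 = not reached, 1 = frontier, 2 = finished
    state[target] = 1
    while any(s == 1 for s in state.values()):
        # Message generation: each frontier destination notifies its sources.
        notified = {src for src, dst in edges if state[dst] == 1}
        # Message consumption: the frontier finishes, notified vertices join it.
        for v in vertices:
            if state[v] == 1:
                state[v] = 2
            elif state[v] == 0 and v in notified:
                state[v] = 1
    return {v for v, s in state.items() if s == 2} - {target}

# Hypothetical drainage network: water flows along the edge direction.
edges = [("A", "B"), ("B", "E"), ("E", "C"), ("C", "D")]
print(can_reach(edges, "E"))
```

Each pass of the loop corresponds to one superstep: the frontier moves one hop upstream, and the loop stops when no vertex is left in state 1. In the watershed reading, the returned set is the area that drains into the chosen point.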
  • 31. References
      • Fork our repository at https://github.com/anantasty/SparkAlgorithms
      • Follow us at https://github.com/codeAshu and https://github.com/kaushikranjan
      • https://spark.apache.org/docs/latest/graphx-programming-guide.html