1. Pregel: A System for Large-Scale
Graph Processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik,
James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski
In Proceedings of the 2010 ACM SIGMOD International
Conference on Management of Data (pp. 135-146). ACM
2. Source: SIGMETRICS ’09 Tutorial – MapReduce: The Programming Model and Practice, by Jerry Zhao
3. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
4. The Problem
• Many practical computing problems concern large
graphs.
Large graph data: Web graph, transportation routes,
citation relationships, social networks.
Graph algorithms: PageRank, shortest path,
connected components, clustering techniques.
• Efficient processing of large graphs is challenging:
Poor locality of memory access
Very little work per vertex
Changing degree of parallelism
Running over many machines makes the problem worse
5. Want to Process a Large Scale Graph? The Options:
1. Crafting a custom distributed infrastructure.
Substantial engineering effort.
2. Relying on an existing distributed platform, e.g.,
MapReduce.
Inefficient: the graph state must be stored and reloaded at
each stage, causing too much communication between stages.
3. Using a single-computer graph algorithm library.
Not scalable.
4. Using an existing parallel graph system.
Not fault tolerant.
6. Pregel
• To overcome these challenges, Google came up with
Pregel.
Provides scalability
Fault-tolerance
Flexibility to express arbitrary algorithms
• The high level organization of Pregel programs is
inspired by Valiant’s Bulk Synchronous Parallel
model [45].
[45] Leslie G. Valiant, A Bridging Model for Parallel Computation. Comm. ACM 33(8), 1990
7. Bulk Synchronous Parallel
(Diagram: Input → a series of supersteps → all vote to halt → Output)
• A series of iterations (supersteps).
• Each vertex V invokes a function in parallel.
• It can read messages sent in the previous superstep (S-1).
• It can send messages, to be read in the next superstep (S+1).
• It can modify the state of its outgoing edges.
8. Advantages of the Vertex-Centric Approach
• Users focus on a local action,
processing each item independently.
• This ensures that Pregel programs are inherently free of
deadlocks and data races common in asynchronous
systems.
9. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
10. Model of Computation
(Diagram: Input → supersteps run at every vertex → all vote to halt → Output)
• A directed graph is given to Pregel.
• It runs the computation at each vertex,
• until all vertices vote to halt.
• Then Pregel gives you a directed graph back.
11. Vertex State Machine
• Algorithm termination is based on every vertex
voting to halt.
• In superstep 0, every vertex is in the active state.
• A vertex deactivates itself by voting to halt.
• It can be reactivated by receiving an (external)
message.
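This active/halted life cycle can be sketched as a toy state machine (an illustrative sketch with made-up names, not the actual Pregel implementation):

```cpp
#include <cassert>

// Toy model of the vertex state machine described above: a vertex
// starts active (superstep 0), deactivates by voting to halt, and is
// reactivated when an external message arrives.
class ToyVertexState {
 public:
  ToyVertexState() : active_(true) {}           // superstep 0: active
  void VoteToHalt() { active_ = false; }        // vertex deactivates itself
  void OnMessageReceived() { active_ = true; }  // external message reactivates
  bool IsActive() const { return active_; }

 private:
  bool active_;
};
```

The algorithm as a whole terminates once every vertex is simultaneously in the halted state with no messages in flight.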
13. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
14. The C++ API
• Users subclass the predefined Vertex class and override
its Compute() method.
Compute(): executed at each active vertex in every superstep.
• Can get/set the vertex value:
GetValue() / MutableValue()
• Can get/set outgoing edge values:
GetOutEdgeIterator()
• Can send/receive messages:
SendMessageTo() / incoming messages arrive via Compute()
15. The C++ API – Vertex Class
(Figure: the Vertex class template takes three value types: the vertex
value, the edge value, and the message value. Compute() is the method
to override; incoming messages arrive via an iterator, and outgoing
messages are MessageValue instances.)
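A simplified sketch of the shape the figure shows (the member layout and the vector-based message delivery here are stand-ins; the real API passes a MessageIterator and is distributed, not in-memory):

```cpp
#include <cassert>
#include <vector>

// Sketch of the three-way templated Vertex base class: VertexValue,
// EdgeValue, and MessageValue are the three user-chosen value types,
// and Compute() is the method users override.
template <typename VertexValue, typename EdgeValue, typename MessageValue>
class SketchVertex {
 public:
  virtual ~SketchVertex() {}
  // Called at each active vertex in every superstep, with the messages
  // sent to this vertex in the previous superstep.
  virtual void Compute(const std::vector<MessageValue>& in_msgs) = 0;

  const VertexValue& GetValue() const { return value_; }
  VertexValue* MutableValue() { return &value_; }
  void VoteToHalt() { active_ = false; }
  bool IsActive() const { return active_; }

 protected:
  VertexValue value_ = VertexValue();
  bool active_ = true;
};

// Example subclass: sums incoming int messages into the vertex value,
// and votes to halt once no messages arrive.
class SumVertex : public SketchVertex<int, int, int> {
 public:
  void Compute(const std::vector<int>& in_msgs) override {
    for (int m : in_msgs) *MutableValue() += m;
    if (in_msgs.empty()) VoteToHalt();
  }
};
```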
16. The C++ API
Message passing:
• No guaranteed message delivery order.
• Messages are delivered exactly once.
• Can send messages to any node.
• If dest_vertex doesn’t exist, a user-defined handler is
called.
void SendMessageTo(const string& dest_vertex,
const MessageValue& message);
17. The C++ API
Combiners (not active by default):
• Sending a message to another vertex that exists on a
different machine has some overhead.
• The user specifies a way to reduce many messages into
one value (like Reduce in MapReduce)
by overriding the Combine() method.
Must be commutative and associative.
• Exceedingly useful in certain contexts (e.g., 4x
speedup on shortest-path computation).
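In a shortest-path computation, for instance, only the minimum of the incoming distances matters, so all messages bound for one vertex can be collapsed into a single value. A toy sketch of such a commutative, associative combine step (the real API overrides Combine() on a Combiner class; this standalone function is only illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reduce many distance messages headed for the same vertex to their
// minimum before sending, cutting network traffic. min is commutative
// and associative, so messages can be combined in any order or grouping.
int CombineMin(const std::vector<int>& msgs) {
  int combined = msgs.front();
  for (int m : msgs) combined = std::min(combined, m);
  return combined;
}
```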
18. The C++ API
Aggregators:
• A mechanism for global communication, monitoring,
and data.
Each vertex can produce a value in a superstep S for the
Aggregator to use.
The Aggregated value is available to all the vertices in
superstep S+1.
• Aggregators can be used for statistics and for global
communication.
E.g., Sum applied to the out-edge count of each vertex
generates the total number of edges in the graph and
communicates it to all the vertices.
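The out-edge example can be sketched like this: vertices supply values during superstep S, and the published sum becomes readable by every vertex in superstep S+1 (an illustrative toy, not the Pregel Aggregator API):

```cpp
#include <cassert>

// Toy sum aggregator: values supplied during superstep S are only
// published at the superstep barrier, so every vertex reads the same
// aggregate in superstep S+1.
class ToySumAggregator {
 public:
  void Supply(long v) { pending_ += v; }   // called during superstep S
  void EndSuperstep() {                    // the barrier publishes the sum
    published_ = pending_;
    pending_ = 0;
  }
  long Read() const { return published_; } // visible in superstep S+1

 private:
  long pending_ = 0;
  long published_ = 0;
};
```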
19. The C++ API
Topology mutations:
• Some graph algorithms need to change the graph's
topology.
E.g., a clustering algorithm may need to replace a cluster
with a single vertex.
• Vertices can create / destroy vertices at will.
• Resolving conflicting requests:
Partial ordering:
edge removal, then vertex removal, then vertex addition,
then edge addition.
User-defined handlers:
you resolve the conflicts yourself.
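The partial ordering amounts to applying a superstep's pending mutation requests so that edge removals run first and edge additions last (a sketch of the ordering rule only; the names and the sort-based application are illustrative, not Pregel's implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The slide's partial order: E Remove < V Remove < V Add < E Add,
// encoded as ascending enum values.
enum Mutation { kEdgeRemove = 0, kVertexRemove = 1, kVertexAdd = 2, kEdgeAdd = 3 };

// Order pending mutations for application; stable_sort preserves the
// arrival order of requests of the same kind.
std::vector<Mutation> ApplyOrder(std::vector<Mutation> pending) {
  std::stable_sort(pending.begin(), pending.end());
  return pending;
}
```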
20. The C++ API
Input and output:
• Readers/Writers are provided for common formats:
Text files
Vertices in a relational database
Rows in BigTable
• Users can customize Readers/Writers for new
inputs/outputs
by subclassing the Reader/Writer classes.
21. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
22. Implementation
• Pregel was designed for the Google cluster
architecture.
• Persistent data is stored as files on a distributed
storage system like GFS or BigTable.
• Temporary data is stored on local disk.
• Vertices are assigned to machines based on their
vertex ID ( hash(ID) ), so every machine can determine
which worker holds any given vertex.
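Because the assignment is purely a function of the vertex ID, no lookup table is needed; a sketch might look like this (taking the hash modulo the number of partitions is an assumption about the unspecified hash scheme):

```cpp
#include <cassert>
#include <functional>
#include <string>

// Default-style partitioning: partition = hash(ID) mod N. The mapping
// depends only on the ID, so any machine can locate any vertex locally.
int PartitionFor(const std::string& vertex_id, int num_partitions) {
  return static_cast<int>(std::hash<std::string>{}(vertex_id) %
                          static_cast<size_t>(num_partitions));
}
```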
23. System Architecture
• Executable is copied to many machines.
• One machine becomes the Master.
Maintains the list of workers.
Recovers from worker faults.
Provides a Web-UI monitoring tool for job progress.
• The other machines become Workers.
Each processes its assigned partitions.
Each communicates with the other workers.
24. Pregel Execution
1. User programs are copied on machines.
2. One machine becomes the master.
The other machines find the master via a name service and
register themselves with it.
The master determines how many partitions the graph will have.
3. The master assigns one or more partitions and a
portion of the user input to each worker.
4. The workers run the Compute() function for active
vertices and send messages asynchronously.
There is one thread per partition in each worker.
When a superstep finishes, each worker tells the master how
many of its vertices will be active in the next superstep.
26. Fault Tolerance
• Checkpointing
The master periodically instructs the workers to save the
state of their partitions to persistent storage.
e.g., Vertex values, edge values, incoming messages.
• Failure detection
Using regular “ping” messages.
• Recovery
The master reassigns graph partitions to the currently
available workers.
All workers reload their partition state from the most
recent available checkpoint.
27. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
28. Application – Page Rank
• A = A given page
• T1 …. Tn = Pages that point to page A (citations)
• d = Damping factor between 0 and 1 (usually set to
0.85)
• C(T) = number of links going out of T
• PR(A) = the PageRank of page A
PR(A) = (1 - d) + d ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )
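As a worked check of the formula, take a two-page cycle where A links only to B and B links only to A: then C(A) = C(B) = 1, and the fixed point is PR(A) = PR(B) = 1. A small sketch iterating the formula (illustrative only, not Pregel code):

```cpp
#include <cassert>
#include <cmath>

// Iterate PR(A) = (1 - d) + d * PR(B)/C(B) for the two-page cycle
// A -> B -> A, where C(A) = C(B) = 1 and d = 0.85.
void IteratePageRank(double* pr_a, double* pr_b, int iters, double d = 0.85) {
  for (int i = 0; i < iters; ++i) {
    double next_a = (1.0 - d) + d * (*pr_b);  // B's only link feeds A
    double next_b = (1.0 - d) + d * (*pr_a);  // A's only link feeds B
    *pr_a = next_a;
    *pr_b = next_b;
  }
}
```

Since d < 1, the error shrinks by a factor of d each iteration, so the iteration converges to the fixed point from any starting values.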
30. Application – Page Rank
class PageRankVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      // Store and carry the PageRank in the vertex value.
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();
    }
  }
};

For convergence, either there is a limit on the number of supersteps
or aggregators are used to detect convergence.
31. Application – Shortest Path
class ShortestPathVertex
    : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // INF: a constant larger than any feasible distance.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    // In the 1st superstep, only the source vertex will update its
    // value (from INF to zero).
    if (mindist < GetValue()) {
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();
  }
};
41. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
42. Experiments
• 300 multicore commodity PCs used.
• Only running time is counted.
Checkpointing disabled.
• Measures scalability of Worker tasks.
• Measures scalability w.r.t. the number of vertices
in binary trees and log-normal random graphs.
• Naïve single-source shortest paths (SSSP)
implementation.
The weight of all edges = 1
43. SSSP - 1 billion vertex binary tree:
# of Pregel workers varies from 50 to 800
Runtime drops from 174 s with 50 workers to 17.3 s with 800 workers:
16× the workers gives a speedup of about 10.
44. SSSP – binary trees:
varying graph sizes on 800 worker tasks
Runtimes range from 17.3 s to 702 s.
For a graph with a low average outdegree, the runtime
increases linearly in the graph size.
45. SSSP – log-normal random graphs (mean outdegree = 127.1):
varying graph sizes on 800 worker tasks
The runtime increases linearly in the graph size, too.
46. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
47. Related Work
• MapReduce
Pregel is similar in concept to MapReduce, but with a
natural graph API and much more efficient support for
iterative computations over the graph.
• Bulk Synchronous Parallel model
the Oxford BSP Library[38], Green BSP library[21], BSPlib[26]
and Paderborn University BSP library.
Their scalability and fault-tolerance implementations have not
been evaluated beyond several dozen machines,
and none of them provides a graph-specific API.
48. Related Work
• The closest matches to Pregel are:
Parallel Boost Graph Library [22],[23]
unlike it, Pregel provides fault tolerance.
CGMgraph [8]
unlike it, Pregel keeps an object-oriented programming style,
at some performance cost.
• Few systems have reported experimental results for graphs
at the scale of billions of vertices.
49. Outline
• Introduction
• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiments
• Related Work
• Conclusion & Future Work
50. Conclusion & Future Work
• Pregel is a scalable and fault-tolerant platform with
an API that is sufficiently flexible to express arbitrary
graph algorithms.
• Future work
Relaxing the synchronicity of the model,
so as not to wait for slower workers at inter-superstep barriers.
Assigning vertices to machines to minimize inter-machine
communication.
Handling dense graphs in which most vertices send messages
to most other vertices.
51. Comment
• No comparison with other systems.
• Users have to modify Pregel substantially in order to
tailor it to their needs.
• No failure detection is mentioned for the master,
making it a single point of failure.
Each cluster consists of thousands of commodity PCs organized into racks with high intra-rack bandwidth. Clusters are interconnected but distributed geographically.