We study how Google's PageRank method relates to min-cut and a particular type of electrical flow in a network. We also explain how the "push method" for computing PageRank accelerates it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Anti-differentiating Approximation Algorithms: PageRank and MinCut
1. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow
David F. Gleich, Purdue University
Joint work with Michael Mahoney.
Supported by NSF CAREER award CCF-1149756 and the Simons Institute.
2. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow
- A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows.
- A new understanding of the "push" method for computing personalized PageRank.
- An empirical improvement to methods for semi-supervised learning.
3. The PageRank problem
The PageRank random surfer:
1. With probability $\beta$, follow a random-walk step.
2. With probability $(1-\beta)$, jump randomly according to the distribution $v$.
Goal: find the stationary distribution $x$.
Alg: solve the linear system
$(I - \beta AD^{-1})x = (1-\beta)v$, i.e., $x = \beta AD^{-1}x + (1-\beta)v$,
where $A$ is the symmetric adjacency matrix, $D$ is the diagonal degree matrix, $x$ is the solution, and $v$ is the jump vector.
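To make the linear-system view concrete, here is a minimal numpy sketch, not from the talk, that solves $(I - \beta AD^{-1})x = (1-\beta)v$ directly; the 4-cycle graph and $\beta = 0.85$ are illustrative assumptions.

```python
import numpy as np

# Illustrative 4-cycle graph; beta = 0.85 is an assumed teleportation value.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))       # diagonal degree matrix
beta, n = 0.85, A.shape[0]
v = np.full(n, 1.0 / n)          # uniform jump vector

# Solve (I - beta * A * D^{-1}) x = (1 - beta) v for the stationary dist.
x = np.linalg.solve(np.eye(n) - beta * A @ np.linalg.inv(D), (1 - beta) * v)
print(x, x.sum())                # x sums to 1: it is a probability distribution
```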
4. The PageRank problem & the Laplacian
1. $(I - \beta AD^{-1})x = (1-\beta)v$;
2. $(I - \beta \mathcal{A})y = (1-\beta)D^{-1/2}v$, where $\mathcal{A} = D^{-1/2}AD^{-1/2}$ and $x = D^{1/2}y$; and
3. $[\alpha D + L]z = \alpha v$, where $\beta = 1/(1+\alpha)$ and $x = Dz$.
Here $L = D - A$ is the combinatorial Laplacian.
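As a sanity check on these three formulations, a small numpy sketch (toy triangle graph, $\beta = 0.85$, and the jump vector are all assumed for illustration) confirming that they return the same vector $x$:

```python
import numpy as np

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])  # toy triangle
d = A.sum(axis=1)
D, L = np.diag(d), np.diag(d) - A
beta = 0.85
alpha = (1 - beta) / beta          # so that beta = 1/(1 + alpha)
n, v = 3, np.array([1., 0., 0.])   # jump vector concentrated on node 0

# Form 1: (I - beta A D^{-1}) x = (1 - beta) v
x1 = np.linalg.solve(np.eye(n) - beta * A @ np.linalg.inv(D), (1 - beta) * v)

# Form 2: (I - beta A_norm) y = (1 - beta) D^{-1/2} v, with x = D^{1/2} y
Anorm = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
y = np.linalg.solve(np.eye(n) - beta * Anorm, (1 - beta) * d ** -0.5 * v)
x2 = d ** 0.5 * y

# Form 3: (alpha D + L) z = alpha v, with x = D z
z = np.linalg.solve(alpha * D + L, alpha * v)
x3 = D @ z

print(np.allclose(x1, x2), np.allclose(x1, x3))  # True True
```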
5. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, and Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel (see my talk at Simons for this). Derived to show improved runtime for balanced solvers.
The push method, with parameters $\tau$ and $\rho$:
1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)e_i$, $k = 1$
2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$)
3. $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)e_j$
4. $r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta(r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
5. $k \leftarrow k + 1$
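A runnable Python sketch of the loop above; the queue management and the default parameter values are my own illustrative choices, not from the talk.

```python
import numpy as np

def push_ppr(adj, seed, beta=0.85, tau=1e-4, rho=0.5):
    """Push method for personalized PageRank, following the loop above.
    adj: neighbor lists of an undirected graph (no isolated vertices);
    beta, tau, rho: assumed illustrative defaults."""
    n = len(adj)
    deg = np.array([len(nbrs) for nbrs in adj], dtype=float)
    x, r = np.zeros(n), np.zeros(n)
    r[seed] = 1.0 - beta                  # r = (1 - beta) e_seed
    queue = [seed]                        # nodes whose residual may exceed tau*d
    while queue:
        j = queue.pop()
        if r[j] <= tau * deg[j]:
            continue                      # already below its threshold
        amount = r[j] - tau * deg[j] * rho
        x[j] += amount                    # move mass into the solution
        r[j] = tau * deg[j] * rho         # residual retained at j
        for i in adj[j]:                  # spread beta*amount along j's edges
            old = r[i]
            r[i] += beta * amount / deg[j]
            if old <= tau * deg[i] < r[i]:
                queue.append(i)           # i just crossed its threshold
    return x

adj = [[1, 3], [0, 2], [1, 3], [0, 2]]    # toy 4-cycle
print(push_ppr(adj, seed=0))
```

With $\rho = 1$ this matches the variant analyzed later in the talk; $\rho = 1/2$ matches the common implementations mentioned in the conclusions.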
7. Why do we care about push?
1. It is used for empirical studies of "communities".
2. It is used for "fast PageRank" approximation.
It produces sparse approximations to PageRank!
[Figure: personalized PageRank on Newman's netscience graph (379 vertices, 1828 nonzeros); $v$ has a single one on one node, and the solution is "zero" on most of the nodes.]
9. The O(correct) answer
1. PageRank is related to the Laplacian.
2. The Laplacian is related to cuts.
3. Andersen, Chung & Lang provides the "right" bounds and "localization".
This talk: the $\Theta$(correct) answer? A deeper insight into the relationship.
11. The s-t min-cut problem
minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j}|x_i - x_j|$
subject to $x_s = 1$, $x_t = 0$, $x \geq 0$,
where $B$ is the unweighted incidence matrix and $C$ is the diagonal capacity matrix.
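Because this LP has an integral optimum, a tiny brute-force sketch (the graph and capacities are illustrative assumptions) can evaluate the objective over 0/1 labelings and recover the s-t min-cut:

```python
import itertools
import numpy as np

# Illustrative 4-node graph: s = node 0, t = node 3.
C = np.array([[0., 3., 1., 0.],
              [3., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])          # symmetric capacities C_{ij}
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if C[i, j] > 0]

def cut_value(x):
    # the 1-norm objective: sum over edges of C_ij * |x_i - x_j|
    return sum(C[i, j] * abs(x[i] - x[j]) for i, j in edges)

# Enforce x_s = 1, x_t = 0 and enumerate 0/1 values for the interior nodes.
best = min((cut_value((1,) + inner + (0,)), inner)
           for inner in itertools.product((0, 1), repeat=2))
print(best)   # (min-cut value, labels of the two interior nodes)
```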
12. The localized cut graph
Related to a construction used in "FlowImprove" by Andersen & Lang (2007); and Orecchia & Zhu (2014).
$A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}$
Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree. Connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree.
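A numpy sketch of this construction; the toy graph, seed set $S$, and $\alpha$ are all assumed for illustration.

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])             # toy graph (assumed)
d = A.sum(axis=1)
n = A.shape[0]
inS = np.array([True, True, False, False])   # seed set S = {0, 1} (assumed)
alpha = 0.2                                  # assumed localization weight

dS = np.where(inS, d, 0.0)       # degrees on S, zero elsewhere
dSbar = d - dS                   # degrees on S-bar
AS = np.zeros((n + 2, n + 2))    # vertex order: [s, original vertices, t]
AS[1:n+1, 1:n+1] = A
AS[0, 1:n+1] = AS[1:n+1, 0] = alpha * dS         # s -- S edges, alpha*degree
AS[n+1, 1:n+1] = AS[1:n+1, n+1] = alpha * dSbar  # t -- S-bar edges
print(AS)
```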
13. The localized cut graph
Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree.
$B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & I_{\bar{S}} & -e \end{bmatrix}$
Solve the s-t min-cut:
minimize $\|B_S x\|_{C(\alpha),1}$
subject to $x_s = 1$, $x_t = 0$, $x \geq 0$.
14. The localized cut graph
With the same construction and $B_S$ as on the previous slide, solve the "electrical flow" s-t min-cut:
minimize $\|B_S x\|_{C(\alpha),2}$
subject to $x_s = 1$, $x_t = 0$.
15. s-t min-cut → PageRank
The PageRank vector $z$ that solves $(\alpha D + L)z = \alpha v$ with $v = d_S/\mathrm{vol}(S)$ is a renormalized solution of the electrical cut computation:
minimize $\|B_S x\|_{C(\alpha),2}$
subject to $x_s = 1$, $x_t = 0$.
Specifically, if $x$ is the solution, then
$x = \begin{bmatrix} 1 \\ \mathrm{vol}(S)\,z \\ 0 \end{bmatrix}$.
Proof: square and expand the objective into a Laplacian, then apply the constraints.
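A quick numerical check of this statement (toy graph, $S$, and $\alpha$ are assumptions): squaring the 2-norm objective gives $x^\top L(A_S)\,x$, and the boundary conditions reduce the stationarity condition on the interior to $(\alpha D + L)x_{\mathrm{int}} = \alpha d_S$, so the interior of the electrical solution should equal $\mathrm{vol}(S)\,z$.

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])            # toy graph (assumed)
d = A.sum(axis=1)
D, L = np.diag(d), np.diag(d) - A
alpha = 0.2                                 # assumed
S = [0, 1]
volS = d[S].sum()
dS = np.zeros(4); dS[S] = d[S]
v = dS / volS                               # degree-weighted jump vector on S

z = np.linalg.solve(alpha * D + L, alpha * v)        # PageRank system
x_int = np.linalg.solve(alpha * D + L, alpha * dS)   # electrical interior solve
print(np.allclose(x_int, volS * z))         # True: x = [1; vol(S) z; 0]
```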
16. PageRank → s-t min-cut
That equivalence works if $v$ is degree-weighted. What if $v$ is the uniform vector? Then use the cut graph with a general seed-weight vector $s$:
$A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d - s) \\ 0 & \alpha(d - s)^T & 0 \end{bmatrix}$.
17. And beyond …
It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion that we study soon:
$\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar{S}} \\ 0 & e_{\bar{S}}^T & 0 \end{bmatrix}$, which corresponds to $(I + \theta L)x = e_S$.
18. Back to the push method
Let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$ and $\kappa = \tau\,\mathrm{vol}(S)/\beta$, and let $z_G$ solve:
minimize $\tfrac{1}{2}\|B_S z\|_{C(\alpha),2}^2 + \kappa\|Dz\|_1$
subject to $z_s = 1$, $z_t = 0$, $z \geq 0$,
where $z = [1;\ z_G;\ 0]$. Then $x = D z_G/\mathrm{vol}(S)$.
The $\kappa\|Dz\|_1$ term is regularization for sparsity, and the $\mathrm{vol}(S)$ scaling reflects the need for normalization.
Proof: write out the KKT conditions and show that the push method solves them. Slackness was "tricky".
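One consequence of the KKT conditions is that push terminates with a nonnegative residual bounded entrywise by $\tau d$. A small check of that consequence, reusing the `push_ppr` sketch from slide 5 with $\rho = 1$ (toy graph and parameter values assumed):

```python
import numpy as np
# Reuses push_ppr from the sketch after slide 5.

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]     # toy graph (assumed)
n, beta, tau = 4, 0.85, 1e-3
A = np.zeros((n, n))
for i, nbrs in enumerate(adj):
    A[i, nbrs] = 1.0
d = A.sum(axis=1)

x = push_ppr(adj, seed=0, beta=beta, tau=tau, rho=1.0)
# Residual of (I - beta A D^{-1}) x = (1 - beta) e_0 at termination.
r = (1 - beta) * np.eye(n)[0] - (x - beta * A @ (x / d))
print(np.all(r >= -1e-12), np.all(r <= tau * d + 1e-12))  # True True
```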
20. This is a case of Algorithmic Anti-differentiation!
21. The ideal world
Given problem P, derive a solution characterization C, show that algorithm A finds a solution where C holds. Profit?!
Instance: given "min-cut", derive "max-flow is equivalent to min-cut", show that push-relabel solves max-flow. Profit!!
22. (The ideal world)'
Given problem P, derive an approximate solution characterization C', show that algorithm A' quickly finds a solution where C' holds. Profit?!
Instance: given "sparsest-cut", derive the Rayleigh-quotient approximation, show that the power method finds a good Rayleigh quotient. Profit?!
23. The real world?
Given task P, hack around until you find something useful, write a paper presenting "novel heuristic" H for P, and … Profit!!
Instance: given "find-communities", hack around, ??? (hidden) ???, write a paper presenting "three matvecs finds real-world communities". Profit!!
24. Understand why H works!
The real world: given "find-communities", hack around, write a paper presenting "three matvecs finds real-world communities". Profit!!
Algorithmic anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? Show that heuristic H solves P': guess and check until you find something H solves, and derive a characterization of heuristic H (e.g., Mahoney & Orecchia).
25. If your algorithm is related to optimization, this is:
Given a procedure X, what objective does it optimize?
Algorithmic anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? In an unconstrained case, this is just "anti-differentiation"!
26. Algorithmic Anti-differentiation in the literature
Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means.
Saunders (1995): LSQR & Craig iterative methods.
27. Why does it matter?!
These details matter in many empirical studies and can dramatically impact performance (speed or quality).
30. Semi-supervised Learning on Graphs
$K_1 = (I - \beta\mathcal{A})^{-1}$
$K_2 = (D - \beta A)^{-1}$
$K_3 = (\mathrm{Diag}(Ae) - \beta A)^{-1}$ (our new "kernel")
Predictions: $Y = K_i L$, where $L$ holds indicators on the revealed labels, and $y_i = \arg\max_j Y_{ij}$.
Experiment: vary the number of labeled images and track performance.
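To make the pipeline concrete, a sketch of forming $Y = K_i L$ and predicting with argmax; the toy two-cluster graph, $\beta$, and the choice of $K_2$ are all illustrative assumptions, and the kernel scalars follow the reconstruction above.

```python
import numpy as np

# Toy graph: two triangles joined by one edge (assumed).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
beta = 0.85                                  # assumed

K = np.linalg.inv(D - beta * A)              # K_2 from the list above
Lbl = np.zeros((6, 2))                       # indicators on revealed labels
Lbl[0, 0] = 1.0                              # node 0 revealed as class 0
Lbl[5, 1] = 1.0                              # node 5 revealed as class 1
Y = K @ Lbl                                  # diffuse the labels
y = Y.argmax(axis=1)                         # predicted class per node
print(y)                                     # expect [0 0 0 1 1 1] here
```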
31. Semi-supervised Learning on Graphs
Same kernels and experiment as above; setup follows Zhou et al. NIPS (2004).
[Plot: error rate vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with parameter 1.25.]
32. Semi-supervised Learning on Graphs
Same kernels and experiment as above.
[Plot: the regularized $K_3$ with our new parameter value 2.5, with random guessing as a baseline.]
33. Semi-supervised Learning on Graphs
Same kernels and experiment as above.
[Plot: error rate vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with our new parameter value 2.5; random guessing shown as a baseline.]
35. The results of our regularized estimate
[Plot: results of the regularized estimate; horizontal axis 500 to 3500, vertical axis 0.05 to 0.35.]
36. Why does it matter?!
Theory has the answer: we "sweep" over cuts from approximate eigenvectors, so it's the order of the entries, not their values, that matters.
37. Improved performance
[Plot: error rate (0 to 0.4) vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with our new value 2.5.]
$Y = K_i L$, with predictions via $y = \arg\min_j \mathrm{SortedRank}(Y)$. We have spent no time tuning the regularization parameter.
38. Anti-differentiating Approximation Algorithms: Recap & Conclusions
[Figure: four example solutions with 16, 15, 284, and 24 nonzeros.]
New relationships between localized cuts & PageRank. New understanding of the PPR push procedure. Improvements to semi-supervised learning on graphs!
Open issues:
Better treatment of directed graphs?
An algorithm for $\rho < 1$? ($\rho$ is set to 1/2 in most "uses"; this needs a new analysis.)