We study how Google's PageRank method relates to min-cut and a particular type of electrical flow in a network. We also explain how the "push method" for computing PageRank accelerates it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Anti-differentiating Approximation Algorithms: PageRank and MinCut
1. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow
David F. Gleich, Purdue University
Joint work with Michael Mahoney.
Supported by NSF CAREER award CCF-1149756 and the Simons Institute.
2. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow
- A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows.
- A new understanding of the "push" method for computing personalized PageRank.
- An empirical improvement to methods for semi-supervised learning.
3. The PageRank problem
The PageRank random surfer:
1. With probability $\beta$, follow a random-walk step.
2. With probability $(1-\beta)$, jump randomly according to the distribution $v$.
Goal: find the stationary distribution $x$.
Alg: solve the linear system
$(I - \beta AD^{-1})x = (1-\beta)v$, i.e., $x = \beta AD^{-1}x + (1-\beta)v$,
where $A$ is the symmetric adjacency matrix, $D$ is the diagonal degree matrix, $x$ is the solution, and $v$ is the jump vector.
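To make the linear-system view concrete, here is a minimal numpy sketch, not from the talk, that solves $(I - \beta AD^{-1})x = (1-\beta)v$ directly; the 4-cycle graph and $\beta = 0.85$ are illustrative assumptions.

```python
import numpy as np

# Illustrative 4-cycle graph; beta = 0.85 is an assumed teleportation value.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))       # diagonal degree matrix
beta, n = 0.85, A.shape[0]
v = np.full(n, 1.0 / n)          # uniform jump vector

# Solve (I - beta * A * D^{-1}) x = (1 - beta) v for the stationary dist.
x = np.linalg.solve(np.eye(n) - beta * A @ np.linalg.inv(D), (1 - beta) * v)
print(x, x.sum())                # x sums to 1: it is a probability distribution
```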
4. The PageRank problem & the Laplacian
1. $(I - \beta AD^{-1})x = (1-\beta)v$;
2. $(I - \beta \mathcal{A})y = (1-\beta)D^{-1/2}v$, where $\mathcal{A} = D^{-1/2}AD^{-1/2}$ and $x = D^{1/2}y$; and
3. $[\alpha D + L]z = \alpha v$, where $\beta = 1/(1+\alpha)$ and $x = Dz$.
Here $L = D - A$ is the combinatorial Laplacian.
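As a sanity check on these three formulations, a small numpy sketch (toy triangle graph, $\beta = 0.85$, and the jump vector are all assumed for illustration) confirming that they return the same vector $x$:

```python
import numpy as np

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])  # toy triangle
d = A.sum(axis=1)
D, L = np.diag(d), np.diag(d) - A
beta = 0.85
alpha = (1 - beta) / beta          # so that beta = 1/(1 + alpha)
n, v = 3, np.array([1., 0., 0.])   # jump vector concentrated on node 0

# Form 1: (I - beta A D^{-1}) x = (1 - beta) v
x1 = np.linalg.solve(np.eye(n) - beta * A @ np.linalg.inv(D), (1 - beta) * v)

# Form 2: (I - beta A_norm) y = (1 - beta) D^{-1/2} v, with x = D^{1/2} y
Anorm = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
y = np.linalg.solve(np.eye(n) - beta * Anorm, (1 - beta) * d ** -0.5 * v)
x2 = d ** 0.5 * y

# Form 3: (alpha D + L) z = alpha v, with x = D z
z = np.linalg.solve(alpha * D + L, alpha * v)
x3 = D @ z

print(np.allclose(x1, x2), np.allclose(x1, x3))  # True True
```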
5. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, and Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel (see my talk at Simons for this). Derived to show improved runtime for balanced solvers.
The push method, with parameters $\tau$ and $\rho$:
1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)e_i$, $k = 1$
2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$)
3. $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)e_j$
4. $r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta(r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
5. $k \leftarrow k + 1$
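A runnable Python sketch of the loop above; the queue management and the default parameter values are my own illustrative choices, not from the talk.

```python
import numpy as np

def push_ppr(adj, seed, beta=0.85, tau=1e-4, rho=0.5):
    """Push method for personalized PageRank, following the loop above.
    adj: neighbor lists of an undirected graph (no isolated vertices);
    beta, tau, rho: assumed illustrative defaults."""
    n = len(adj)
    deg = np.array([len(nbrs) for nbrs in adj], dtype=float)
    x, r = np.zeros(n), np.zeros(n)
    r[seed] = 1.0 - beta                  # r = (1 - beta) e_seed
    queue = [seed]                        # nodes whose residual may exceed tau*d
    while queue:
        j = queue.pop()
        if r[j] <= tau * deg[j]:
            continue                      # already below its threshold
        amount = r[j] - tau * deg[j] * rho
        x[j] += amount                    # move mass into the solution
        r[j] = tau * deg[j] * rho         # residual retained at j
        for i in adj[j]:                  # spread beta*amount along j's edges
            old = r[i]
            r[i] += beta * amount / deg[j]
            if old <= tau * deg[i] < r[i]:
                queue.append(i)           # i just crossed its threshold
    return x

adj = [[1, 3], [0, 2], [1, 3], [0, 2]]    # toy 4-cycle
print(push_ppr(adj, seed=0))
```

With $\rho = 1$ this matches the variant analyzed later in the talk; $\rho = 1/2$ matches the common implementations mentioned in the conclusions.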
7. Why do we care about push?
1. It is used for empirical studies of "communities".
2. It is used for "fast PageRank" approximation.
It produces sparse approximations to PageRank!
[Figure: personalized PageRank on Newman's netscience graph (379 vertices, 1828 nonzeros); $v$ has a single one on one node, and the solution is "zero" on most of the nodes.]
9. The O(correct) answer
1. PageRank is related to the Laplacian.
2. The Laplacian is related to cuts.
3. Andersen, Chung & Lang provides the "right" bounds and "localization".
This talk: the $\Theta$(correct) answer? A deeper insight into the relationship.
11. The s-t min-cut problem
minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j}|x_i - x_j|$
subject to $x_s = 1$, $x_t = 0$, $x \geq 0$,
where $B$ is the unweighted incidence matrix and $C$ is the diagonal capacity matrix.
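Because this LP has an integral optimum, a tiny brute-force sketch (the graph and capacities are illustrative assumptions) can evaluate the objective over 0/1 labelings and recover the s-t min-cut:

```python
import itertools
import numpy as np

# Illustrative 4-node graph: s = node 0, t = node 3.
C = np.array([[0., 3., 1., 0.],
              [3., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])          # symmetric capacities C_{ij}
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if C[i, j] > 0]

def cut_value(x):
    # the 1-norm objective: sum over edges of C_ij * |x_i - x_j|
    return sum(C[i, j] * abs(x[i] - x[j]) for i, j in edges)

# Enforce x_s = 1, x_t = 0 and enumerate 0/1 values for the interior nodes.
best = min((cut_value((1,) + inner + (0,)), inner)
           for inner in itertools.product((0, 1), repeat=2))
print(best)   # (min-cut value, labels of the two interior nodes)
```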
12. The localized cut graph
Related to a construction used in "FlowImprove" by Andersen & Lang (2007); and Orecchia & Zhu (2014).
$A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}$
Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree. Connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree.
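A numpy sketch of this construction; the toy graph, seed set $S$, and $\alpha$ are all assumed for illustration.

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])             # toy graph (assumed)
d = A.sum(axis=1)
n = A.shape[0]
inS = np.array([True, True, False, False])   # seed set S = {0, 1} (assumed)
alpha = 0.2                                  # assumed localization weight

dS = np.where(inS, d, 0.0)       # degrees on S, zero elsewhere
dSbar = d - dS                   # degrees on S-bar
AS = np.zeros((n + 2, n + 2))    # vertex order: [s, original vertices, t]
AS[1:n+1, 1:n+1] = A
AS[0, 1:n+1] = AS[1:n+1, 0] = alpha * dS         # s -- S edges, alpha*degree
AS[n+1, 1:n+1] = AS[1:n+1, n+1] = alpha * dSbar  # t -- S-bar edges
print(AS)
```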
13. The localized cut graph
Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree.
$B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & I_{\bar{S}} & -e \end{bmatrix}$
Solve the s-t min-cut:
minimize $\|B_S x\|_{C(\alpha),1}$
subject to $x_s = 1$, $x_t = 0$, $x \geq 0$.
14. The localized cut graph
With the same construction and $B_S$ as on the previous slide, solve the "electrical flow" s-t min-cut:
minimize $\|B_S x\|_{C(\alpha),2}$
subject to $x_s = 1$, $x_t = 0$.
15. s-t min-cut → PageRank
The PageRank vector $z$ that solves $(\alpha D + L)z = \alpha v$ with $v = d_S/\mathrm{vol}(S)$ is a renormalized solution of the electrical cut computation:
minimize $\|B_S x\|_{C(\alpha),2}$
subject to $x_s = 1$, $x_t = 0$.
Specifically, if $x$ is the solution, then
$x = \begin{bmatrix} 1 \\ \mathrm{vol}(S)\,z \\ 0 \end{bmatrix}$.
Proof: square and expand the objective into a Laplacian, then apply the constraints.
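A quick numerical check of this statement (toy graph, $S$, and $\alpha$ are assumptions): squaring the 2-norm objective gives $x^\top L(A_S)\,x$, and the boundary conditions reduce the stationarity condition on the interior to $(\alpha D + L)x_{\mathrm{int}} = \alpha d_S$, so the interior of the electrical solution should equal $\mathrm{vol}(S)\,z$.

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])            # toy graph (assumed)
d = A.sum(axis=1)
D, L = np.diag(d), np.diag(d) - A
alpha = 0.2                                 # assumed
S = [0, 1]
volS = d[S].sum()
dS = np.zeros(4); dS[S] = d[S]
v = dS / volS                               # degree-weighted jump vector on S

z = np.linalg.solve(alpha * D + L, alpha * v)        # PageRank system
x_int = np.linalg.solve(alpha * D + L, alpha * dS)   # electrical interior solve
print(np.allclose(x_int, volS * z))         # True: x = [1; vol(S) z; 0]
```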
16. PageRank → s-t min-cut
That equivalence works if $v$ is degree-weighted. What if $v$ is the uniform vector? Then use the cut graph with a general seed-weight vector $s$:
$A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d - s) \\ 0 & \alpha(d - s)^T & 0 \end{bmatrix}$.
17. And beyond …
It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion that we study soon:
$\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar{S}} \\ 0 & e_{\bar{S}}^T & 0 \end{bmatrix}$, which corresponds to $(I + \theta L)x = e_S$.
18. Back to the push method
Let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$ and $\kappa = \tau\,\mathrm{vol}(S)/\beta$, and let $z_G$ solve:
minimize $\tfrac{1}{2}\|B_S z\|_{C(\alpha),2}^2 + \kappa\|Dz\|_1$
subject to $z_s = 1$, $z_t = 0$, $z \geq 0$,
where $z = [1;\ z_G;\ 0]$. Then $x = D z_G/\mathrm{vol}(S)$.
The $\kappa\|Dz\|_1$ term is regularization for sparsity, and the $\mathrm{vol}(S)$ scaling reflects the need for normalization.
Proof: write out the KKT conditions and show that the push method solves them. Slackness was "tricky".
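One consequence of the KKT conditions is that push terminates with a nonnegative residual bounded entrywise by $\tau d$. A small check of that consequence, reusing the `push_ppr` sketch from slide 5 with $\rho = 1$ (toy graph and parameter values assumed):

```python
import numpy as np
# Reuses push_ppr from the sketch after slide 5.

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]     # toy graph (assumed)
n, beta, tau = 4, 0.85, 1e-3
A = np.zeros((n, n))
for i, nbrs in enumerate(adj):
    A[i, nbrs] = 1.0
d = A.sum(axis=1)

x = push_ppr(adj, seed=0, beta=beta, tau=tau, rho=1.0)
# Residual of (I - beta A D^{-1}) x = (1 - beta) e_0 at termination.
r = (1 - beta) * np.eye(n)[0] - (x - beta * A @ (x / d))
print(np.all(r >= -1e-12), np.all(r <= tau * d + 1e-12))  # True True
```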
20. This is a case of Algorithmic Anti-differentiation!
21. The ideal world
Given problem P, derive a solution characterization C, show that algorithm A finds a solution where C holds. Profit?!
Instance: given "min-cut", derive "max-flow is equivalent to min-cut", show that push-relabel solves max-flow. Profit!!
22. (The ideal world)'
Given problem P, derive an approximate solution characterization C', show that algorithm A' quickly finds a solution where C' holds. Profit?!
Instance: given "sparsest-cut", derive the Rayleigh-quotient approximation, show that the power method finds a good Rayleigh quotient. Profit?!
23. The real world?
Given task P, hack around until you find something useful, write a paper presenting "novel heuristic" H for P, and … Profit!!
Instance: given "find-communities", hack around, ??? (hidden) ???, write a paper presenting "three matvecs finds real-world communities". Profit!!
24. Understand why H works!
The real world: given "find-communities", hack around, write a paper presenting "three matvecs finds real-world communities". Profit!!
Algorithmic anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? Show that heuristic H solves P': guess and check until you find something H solves, and derive a characterization of heuristic H (e.g., Mahoney & Orecchia).
25. If your algorithm is related to optimization, this is:
Given a procedure X, what objective does it optimize?
Algorithmic anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? In an unconstrained case, this is just "anti-differentiation"!
26. Algorithmic Anti-differentiation in the literature
Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means.
Saunders (1995): LSQR & Craig iterative methods.
27. Why does it matter?!
These details matter in many empirical studies and can dramatically impact performance (speed or quality).
30. Semi-supervised Learning on Graphs
$K_1 = (I - \beta\mathcal{A})^{-1}$
$K_2 = (D - \beta A)^{-1}$
$K_3 = (\mathrm{Diag}(Ae) - \beta A)^{-1}$ (our new "kernel")
Predictions: $Y = K_i L$, where $L$ holds indicators on the revealed labels, and $y_i = \arg\max_j Y_{ij}$.
Experiment: vary the number of labeled images and track performance.
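To make the pipeline concrete, a sketch of forming $Y = K_i L$ and predicting with argmax; the toy two-cluster graph, $\beta$, and the choice of $K_2$ are all illustrative assumptions, and the kernel scalars follow the reconstruction above.

```python
import numpy as np

# Toy graph: two triangles joined by one edge (assumed).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
beta = 0.85                                  # assumed

K = np.linalg.inv(D - beta * A)              # K_2 from the list above
Lbl = np.zeros((6, 2))                       # indicators on revealed labels
Lbl[0, 0] = 1.0                              # node 0 revealed as class 0
Lbl[5, 1] = 1.0                              # node 5 revealed as class 1
Y = K @ Lbl                                  # diffuse the labels
y = Y.argmax(axis=1)                         # predicted class per node
print(y)                                     # expect [0 0 0 1 1 1] here
```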
31. Semi-supervised Learning on Graphs
Same kernels and experiment as above; setup follows Zhou et al. NIPS (2004).
[Plot: error rate vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with parameter 1.25.]
32. Semi-supervised Learning on Graphs
Same kernels and experiment as above.
[Plot: the regularized $K_3$ with our new parameter value 2.5, with random guessing as a baseline.]
33. Semi-supervised Learning on Graphs
Same kernels and experiment as above.
[Plot: error rate vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with our new parameter value 2.5; random guessing shown as a baseline.]
35. The results of our regularized estimate
[Plot: results of the regularized estimate; horizontal axis 500 to 3500, vertical axis 0.05 to 0.35.]
36. Why does it matter?!
Theory has the answer: we "sweep" over cuts from approximate eigenvectors, so it's the order of the entries, not their values, that matters.
37. Improved performance
[Plot: error rate (0 to 0.4) vs. number of labels (0 to 40) for $K_1$, $K_2$, $K_3$, and RK3, the regularized $K_3$ with our new value 2.5.]
$Y = K_i L$, with predictions via $y = \arg\min_j \mathrm{SortedRank}(Y)$. We have spent no time tuning the regularization parameter.
38. Anti-differentiating Approximation Algorithms: Recap & Conclusions
[Figure: four example solutions with 16, 15, 284, and 24 nonzeros.]
New relationships between localized cuts & PageRank. New understanding of the PPR push procedure. Improvements to semi-supervised learning on graphs!
Open issues:
Better treatment of directed graphs?
An algorithm for $\rho < 1$? ($\rho$ is set to 1/2 in most "uses"; this needs a new analysis.)