Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. They have been widely successful on problems including community detection, label propagation, and ranking.
1. Localized methods in graph mining
David F. Gleich, Purdue University
Joint work with Kyle Kloster (Purdue) and Michael Mahoney (Berkeley).
Supported by NSF CAREER CCF-1149756.
David Gleich · Purdue
4. Localized methods in graph mining use the local structure of a network (and not the global structure).
5. Point 1: Localized methods are the right thing to use for large graph mining.
Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
6. Some graphs have global structure.
Image by R. Rossi from our paper on clique detection for temporal strong components.
10. At large scales, real networks look random (or slightly better).

11. Localized methods only operate on meaningful local structures in the data.
12. CAVEATS
There are large-scale global structures, BUT they don't look like what your small-scale intuition would predict. Continents exist in Facebook, but they don't look like small-scale structures.
Leskovec, Lang, Dasgupta, Mahoney. Internet Math, 2009.
Ugander, Backstrom. WSDM 2013.
Jeub, Balachandran, Porter, Mucha, Mahoney. Phys. Rev. E, 2015.
13. Point 1: Localized methods are the right thing to use for large graph mining.
Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
14. Local algorithms give fast answers to global queries (for small-source diffusions).

16. Pictures from the Sparse Matrix Repository (Davis & Hu):
www.cise.ufl.edu/research/sparse/matrices/
17. Graph diffusions
[Figure: a network (a mesh from a typical problem in scientific computing) colored from high to low diffusion values.]
Diffusions show how {importance, rank, information, status, …} flows from a source to target nodes via edges.
18. Graph diffusions

f = \sum_{k=0}^{\infty} \alpha_k P^k s

A – adjacency matrix
D – degree matrix
P = AD^{-1} – column-stochastic operator, with (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j
s – the "seed" (a sparse vector)
f – the diffusion result
\alpha_k – the path weights

Graph diffusions help with:
1. Attribute prediction
2. Community detection
3. "Ranking"
4. Finding small-conductance sets
5. Graph label propagation
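The definition above can be sketched directly in code. This is a minimal illustration (not the talk's own code): it evaluates a truncated diffusion f ≈ Σ_{k=0}^{K} α_k P^k s with P = AD^{-1}, using the same dictionary-of-sets graph format as the hk-relax pseudo-code later in the deck; the 4-cycle graph and PageRank-style weights are my own toy choices.

```python
def apply_P(G, x):
    """Apply the column-stochastic operator: (Px)_i = sum_{j -> i} x_j / d_j."""
    y = {}
    for j, xj in x.items():
        share = xj / len(G[j])      # node j splits its value among neighbors
        for i in G[j]:
            y[i] = y.get(i, 0.0) + share
    return y

def truncated_diffusion(G, s, weights):
    """f = sum_k weights[k] * P^k s, truncated at len(weights) terms."""
    f, pks = {}, dict(s)            # pks holds P^k s as k advances
    for wk in weights:
        for i, v in pks.items():
            f[i] = f.get(i, 0.0) + wk * v
        pks = apply_P(G, pks)
    return f

# A 4-cycle graph and PageRank path weights (1 - alpha) * alpha^k.
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
alpha, K = 0.85, 50
weights = [(1 - alpha) * alpha ** k for k in range(K + 1)]
f = truncated_diffusion(G, {0: 1.0}, weights)
```

Because P is column stochastic, the entries of f sum to exactly the sum of the truncated weights, which is a handy sanity check.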
19. Graph diffusions

PageRank: x = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k s, equivalently (I - \alpha P)x = (1-\alpha)s

Heat kernel: h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

with P = AD^{-1} and (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j.
20. Graph diffusions
[Plot: weight versus path length (0 to 100, log scale) for the heat kernel at t = 1, 5, 15 and for PageRank at \alpha = 0.85, 0.99. The heat-kernel weights e^{-t} t^k / k! fall off much faster than PageRank's (1-\alpha)\alpha^k.]
22. Diffusion-based community detection
1. Given a seed, approximate the diffusion.
2. Extract the community.
Both are local operations.
23. Conductance communities
Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of the edges leaving the set to the total edge endpoints (volume) in the set:

\phi(S) = \frac{\text{cut}(S)}{\min(\text{vol}(S), \text{vol}(\bar{S}))}

where cut(S) counts the edges leaving the set and vol(S) sums the degrees inside the set. Equivalently, it is the probability that a random edge incident to the smaller side leaves the set. Small conductance ⟺ good community.
Example: cut(S) = 7, vol(S) = 33, vol(\bar{S}) = 11, so \phi(S) = 7/11.
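The score above takes only a few lines to compute for a graph stored as a dictionary-of-sets. A minimal sketch; the barbell graph below is my own example, not the one drawn on the slide:

```python
def conductance(G, S):
    """phi(S) = cut(S) / min(vol(S), vol(S-bar)) for an undirected graph."""
    S = set(S)
    cut = sum(1 for u in S for v in G[u] if v not in S)   # edges leaving S
    vol_S = sum(len(G[u]) for u in S)                     # degrees inside S
    vol_Sbar = sum(len(G[u]) for u in G if u not in S)
    return cut / min(vol_S, vol_Sbar)

# Two triangles joined by a single edge: each triangle is a good community.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Here the left triangle {0, 1, 2} has cut 1 and volume 7, so its conductance is 1/7, much better than any sub-triangle prefix.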
25. Sweep cuts find small-conductance sets
[Figure 4: the adjacency structure of a sample with three unbalanced classes, and a study of the paradoxical effects of value-based rounding on diffusions (Zhou and Andersen-Lang, with 3 and 15 labels).]
Check the conductance of all "prefixes" of the diffusion vector sorted by value; a fast incremental update makes the whole sweep cost O(sum of degrees) work. The prefixes that achieve local minima of conductance (Good Set 1, 2, 3 in the figure) are the candidate communities.
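The prefix scan just described can be sketched as follows. This is my own minimal version of a sweep cut (not the talk's code): sort vertices by diffusion value, then grow the prefix set while updating the cut and volume incrementally, so the whole scan costs O(sum of degrees).

```python
def sweep_cut(G, f):
    """Return the best-conductance prefix of vertices sorted by f."""
    order = sorted(f, key=f.get, reverse=True)
    vol_all = sum(len(G[u]) for u in G)
    S, cut, vol = set(), 0, 0
    best_phi, best_set = float("inf"), set()
    for u in order[:-1]:                 # the full vertex set is not a valid cut
        vol += len(G[u])
        for v in G[u]:
            cut += -1 if v in S else 1   # edge becomes internal, or newly cut
        S.add(u)
        phi = cut / min(vol, vol_all - vol)
        if phi < best_phi:
            best_phi, best_set = phi, set(S)
    return best_set, best_phi

# Two triangles joined by one edge, with a made-up diffusion peaked at node 0.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
f = {0: 6, 1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
best_set, best_phi = sweep_cut(G, f)
```

The sweep recovers the left triangle with conductance 1/7; each step touches only the degree of the vertex being added, which is exactly the fast-update property the slide mentions.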
26. Diffusions are localized, and we have algorithms to find their local regions.

28. Our mission: find the solution with work roughly proportional to the localization, not the matrix.

29. Our point: the push procedure gives localized algorithms for diffusions in a pleasingly wide variety of settings.
Our results: new empirical and theoretical insights into why and how "push" is so effective.
30. The push algorithm for PageRank
Proposed (in closest form) by Andersen, Chung, and Lang (and also by McSherry; Jeh and Widom; Berkhin) for fast approximate PageRank. Derived to show improved runtime for balanced solvers.
1. Used for empirical studies of "communities".
2. Local Cheeger inequality.
3. Used for "fast PageRank approximation".
4. Works on massive graphs: about 1 second for a 4-billion-edge graph on a laptop.
5. It yields weakly localized PageRank approximations! Produces an \varepsilon-accurate entrywise localized PageRank vector in work \frac{1}{\varepsilon(1-\alpha)}.
[Figure: Newman's netscience graph, 379 vertices, 1828 nonzeros.]
31. Gauss-Seidel and Gauss-Southwell
Methods to solve Ax = b.
Update x^{(k+1)} = x^{(k)} + \rho_j e_j such that [A x^{(k+1)}]_j = [b]_j.
In words: "relax" or "free" the j-th coordinate of your solution vector in order to satisfy the j-th equation of your linear system.
Gauss-Seidel: repeatedly cycle through j = 1 to n.
Gauss-Southwell: use the value of j with the highest-magnitude residual r^{(k)} = b - A x^{(k)}.
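The coordinate-relaxation rule above is short enough to demonstrate end to end. A small self-contained sketch of the standard Gauss-Southwell iteration (my own illustration, not the talk's code): relax the coordinate whose residual has the largest magnitude, which zeroes that entry of r = b - Ax.

```python
def gauss_southwell(A, b, tol=1e-12, maxit=100000):
    """Solve A x = b by Gauss-Southwell coordinate relaxation."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                        # residual of x = 0 is b itself
    for _ in range(maxit):
        j = max(range(n), key=lambda i: abs(r[i]))
        if abs(r[j]) < tol:
            break
        delta = r[j] / A[j][j]         # choose rho_j so that [A x]_j = b_j
        x[j] += delta
        for i in range(n):
            r[i] -= A[i][j] * delta    # maintain r = b - A x
    return x

# A diagonally dominant 2x2 system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = gauss_southwell(A, b)
```

Each relaxation touches only one column of A, which is the property the push method inherits: on a graph, one column is one vertex neighborhood.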
32. Almost "the push" method
The push method for PageRank, with parameters \varepsilon, \rho:
1. x^{(1)} = 0, r^{(1)} = (1-\alpha)e_i, k = 1
2. while any r_j > \varepsilon d_j (d_j is the degree of node j):
3.   x^{(k+1)} = x^{(k)} + (r_j - \varepsilon d_j \rho) e_j
4.   r^{(k+1)}_i = \begin{cases} \varepsilon d_j \rho & i = j \\ r^{(k)}_i + \alpha (r_j - \varepsilon d_j \rho)/d_j & i \sim j \\ r^{(k)}_i & \text{otherwise} \end{cases}
5.   k \leftarrow k + 1
Only push "some" of the residual: if we want tolerance \varepsilon, then push to tolerance \varepsilon and no further.
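A runnable sketch of this procedure, in its common Andersen-Chung-Lang form (slightly simpler than the slide's ρ-parameterized variant: it pushes all of r_j whenever r_j > ε d_j). The graph and seed below are my own toy example.

```python
def pagerank_push(G, seed, alpha=0.85, eps=1e-4):
    """Localized PageRank via push on a dictionary-of-sets graph."""
    x, r = {}, {seed: 1.0}
    queue = [seed]
    while queue:
        j = queue.pop()
        rj, dj = r.get(j, 0.0), len(G[j])
        if rj <= eps * dj:
            continue                       # stale queue entry, skip it
        x[j] = x.get(j, 0.0) + (1 - alpha) * rj
        r[j] = 0.0
        for i in G[j]:
            old = r.get(i, 0.0)
            r[i] = old + alpha * rj / dj   # spread alpha * r_j to neighbors
            if old <= eps * len(G[i]) < r[i]:
                queue.append(i)            # i just crossed its threshold
    return x, r

# Two triangles joined by one edge, seeded at node 0.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x, r = pagerank_push(G, seed=0)
```

Each push conserves mass, so x and r always sum to 1 together, and at termination every residual entry is below its degree-scaled threshold; the work never depends on parts of the graph the diffusion does not reach.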
33. Push is fast!
For the PageRank diffusion, push gives constant work (entry-wise). Andersen, Chung, Lang, FOCS 2006.
1. For the Katz diffusion, push works empirically fast. Bonchi, Gleich, et al., Internet Math, 2012.
2. For the exponential, push gives uniform localization on power-law graphs and fast runtimes. Gleich and Kloster, Internet Math, 2014.
3. For the heat-kernel diffusion, push gives constant work (entry-wise). Kloster and Gleich, KDD 2014.
4. For the PageRank diffusion, push yields sparsity regularization. Gleich and Mahoney, ICML 2014.
5. For a general class of diffusions, there is a Cheeger inequality like before. Ghosh, Teng, et al., KDD 2014.
6. For the PageRank diffusion, push gives the solution path in constant work (entry-wise). Kloster and Gleich, arXiv:1503.00322.
The diffusions: heat kernel x = \exp(tP)e_i; exponential x = \exp(P)e_i; PageRank (I - \alpha P)x = (1-\alpha)e_i; Katz (I - \alpha A)x = (1-\alpha)e_i.
34. Push is useful!
1. Push implicitly regularizes semi-supervised learning. Gleich and Mahoney, submitted.
2. Push gives state-of-the-art results for overlapping community detection. Whang, Gleich, Dhillon, CIKM 2013; Whang, Gleich, Dhillon, in preparation.
3. Push for overlapping clusters decreases communication in parallel solutions. Andersen, Gleich, Mirrokni, WSDM 2012.
[Figure 3 from Whang, Gleich, Dhillon (CIKM 2013): F1 and F2 measures and run times on DBLP comparing seedings (graclus centers, spread hubs, random, egonet) against demon and bigclam; higher F-measures indicate better communities. Their pipeline: a seeding phase, a seed-set expansion phase, and a propagation phase.]
35. Heat Kernel Based Community Detection, KDD 2014. Kyle Kloster and David F. Gleich, Purdue University.

                       HK (us)    PPR (prev. best)
This set      F1        0.87       0.34
              set size  14         67
Amazon (avg)  F1        0.33       0.14
              set size  192        15293

f = \exp(tP)s = \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s

Convert to a linear system, and solve in constant time.
36. Heat kernel localization
General recipe:
1. Take problem X, convert it into a linear system.
2. Apply "push" to that linear system.
3. Analyze and bound the total work.
Heat kernel recipe:
1. Convert x = \exp(tP)e_i into the block system

\begin{bmatrix} I & & & & \\ -tP/1 & I & & & \\ & -tP/2 & I & & \\ & & \ddots & \ddots & \\ & & & -tP/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_i \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}

2. Apply "push".
3. Analyze the work bound.
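To see what that block system encodes: forward substitution gives v_0 = e_i and v_k = (t/k) P v_{k-1}, which are exactly the Taylor terms of exp(tP)e_i, and summing them gives the answer x. A matrix-free sketch of that substitution (my own illustration of the structure, not the talk's push-based solver), again with a dictionary-of-sets graph:

```python
import math

def apply_P(G, x):
    """(Px)_i = sum over edges j -> i of x_j / d_j, with P = A D^{-1}."""
    y = {}
    for j, xj in x.items():
        share = xj / len(G[j])
        for i in G[j]:
            y[i] = y.get(i, 0.0) + share
    return y

def taylor_exp_tP(G, seed, t, N):
    """Forward substitution on the block system: x = v_0 + v_1 + ... + v_N."""
    x = {seed: 1.0}            # v_0 = e_seed
    v = {seed: 1.0}
    for k in range(1, N + 1):
        v = {i: (t / k) * vi for i, vi in apply_P(G, v).items()}  # v_k = (t/k) P v_{k-1}
        for i, vi in v.items():
            x[i] = x.get(i, 0.0) + vi
    return x

# A 4-cycle graph; N = 30 Taylor terms at t = 2.
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
x = taylor_exp_tP(G, 0, t=2.0, N=30)
```

Since P is column stochastic, the entries of exp(tP)e_i sum to e^t, which the truncated sum reproduces to high accuracy; the push method then exploits that most of these entries are negligibly small.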
37. There is a fast deterministic adaptation of the push method. Kloster & Gleich, KDD 2014.

    import collections
    import math

    # G is the graph as a dictionary-of-sets,
    # seed is an array of seeds,
    # t, eps, N, psis are precomputed
    x = {}  # store x and r as dictionaries
    r = {}  # initialize residual
    Q = collections.deque()  # initialize queue
    for s in seed:
        r[(s, 0)] = 1. / len(seed)
        Q.append((s, 0))
    while len(Q) > 0:
        (v, j) = Q.popleft()  # v has r[(v, j)] over its threshold
        rvj = r[(v, j)]
        # perform the hk-relax step
        if v not in x:
            x[v] = 0.
        x[v] += rvj
        r[(v, j)] = 0.
        mass = (t * rvj / (float(j) + 1.)) / len(G[v])
        for u in G[v]:  # for neighbors of v
            next = (u, j + 1)  # in the next block
            if j + 1 == N:  # last step, add to solution
                if u not in x:
                    x[u] = 0.
                x[u] += rvj / len(G[v])
                continue
            if next not in r:
                r[next] = 0.
            thresh = math.exp(t) * eps * len(G[u])
            thresh = thresh / (N * psis[j + 1]) / 2.
            if r[next] < thresh and r[next] + mass >= thresh:
                Q.append(next)  # add u to the queue
            r[next] = r[next] + mass

Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dictionary-of-sets.

THEOREM (MMDS 2014). Let h = e^{-t} \exp(tP)s and let x be the output of hk-push(\varepsilon). Then \|D^{-1}(x - h)\|_\infty \le \varepsilon after looking at at most \frac{2Ne^t}{\varepsilon} edges. We believe that the bound N \le 2t \log(1/\varepsilon) suffices.
38. Analysis, three pages to one slide. Kloster & Gleich, KDD 2014.
1. State the approximation error that results from approximating using the linear system. A "standard" matrix-approximation result.
2. Bound the work involved in doing push. The iterate satisfies y ≥ 0 and the residual satisfies r ≥ 0. Each step moves "mass" from r to y, keeping both nonnegative and y nondecreasing. Each step moves at least \deg(i) \cdot \varepsilon mass in \deg(i) work, so in T steps we push \sum_{i \in \text{steps}} \varepsilon \deg(i). But we can only push so much mass in total:

\sum_{i \in \text{steps}} \varepsilon \deg(i) \le e^t

so we can bound this sum from above and invert to get a total work bound.
41. PageRank solution paths
These take about a second to compute with our "new" push-based algorithm on graphs with millions of nodes and edges. Related to the LARS method for 1-norm regularized problems.
42. Use "centers" of graph partitions to seed overlapping communities.
[Plots: coverage (percentage) versus maximum conductance for (a) AstroPh and (d) Flickr, comparing egonet, graclus centers, spread hubs, random, and bigclam seedings.]
Flickr social network: 2M vertices, 22M edges. We can cover 95% of the network with communities of conductance ≈ 0.15.
43. References and ongoing work
Gleich and Kloster – Relaxation methods for the matrix exponential, Internet Math.
Kloster and Gleich – Heat kernel based community detection, KDD 2014.
Gleich and Mahoney – Algorithmic anti-differentiation, ICML 2014.
Gleich and Mahoney – Regularized diffusions, submitted.
Whang, Gleich, Dhillon – Seeds for overlapping communities, CIKM 2013.
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
Ongoing work:
• Improved localization bounds for functions of matrices
• Asynchronous and parallel "push"-style methods
• Localized methods beyond conductance
Supported by NSF CAREER CCF-1149756.
www.cs.purdue.edu/homes/dgleich