Spectral graph clustering
with motifs and
higher-order structures
David F. Gleich
Purdue University
Code & Data github.com/arbenson/higher-order-organization-julia
github.com/dgleich/motif-ssbm
Austin Benson (Stanford -> Cornell)
Jure Leskovec (Stanford)
NAConf'17 · David Gleich · Purdue
Graphs and matrices have a long and
intertwined history.
Matrices and graphs represent
relationships among a group of
objects.
To study the relationships
• centrality
• reachability
• clustering
• … and more …
often use matrix computations
• e.g. Estrada & Higham, SIREV
• e.g. Network analysis, Brandes & Erlebach
Helen Bott, Observation of play
activities in a nursery school, 1928
$Ax = b$    $Ax = \lambda x$
… a suggestion based on our work …
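given a graph G = (V, E)
and its adjacency matrix A
consider using the weighted matrix $W = A^2 \odot A$
($\odot$ is the Hadamard, or element-wise, product)
given a directed graph G = (V, E)
and its non-symmetric adjacency matrix A
consider using a symmetric weighted matrix from
Motif | Matrix computations | W =
M1 | $C = (U \cdot U) \odot U^T$ | $C + C^T$
M2 | $C = (B \cdot U) \odot U^T + (U \cdot B) \odot U^T + (U \cdot U) \odot B$ | $C + C^T$
M3 | $C = (B \cdot B) \odot U + (B \cdot U) \odot B + (U \cdot B) \odot B$ | $C + C^T$
M4 | $C = (B \cdot B) \odot B$ | $C$
M5 | $C = (U \cdot U) \odot U + (U \cdot U^T) \odot U + (U^T \cdot U) \odot U$ | $C + C^T$
M6 | $C = (U \cdot B) \odot U + (B \cdot U^T) \odot U^T + (U^T \cdot U) \odot B$ | $C$
M7 | $C = (U^T \cdot B) \odot U^T + (B \cdot U) \odot U + (U \cdot U^T) \odot B$ | $C$
M8 | $C = (U \cdot N) \odot U + (N \cdot U^T) \odot U^T + (U^T \cdot U) \odot N$ | $C$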
Triangles in social
relationships.
Simmel, 1908
Rapoport, 1953
Granovetter, 1973
what
why
how
(A*A).*A Matlab
(A*A).*A Julia
np.dot(A,A)*A Python (A is array)
When clustering based on triangles, we often
have better numerical properties (e.g. eigenvalue
gaps) in model partitioning problems (stochastic
block models) and better real-world results.
The matrix $W = A^2 \odot A$ arises from our motif and
higher-order clustering framework when using
triangles as the motif.
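A minimal pure-Python sketch of the one-liners above, assuming A is a symmetric 0/1 adjacency matrix stored as a list of lists. The helper names `matmul` and `triangle_weights` and the 4-node example graph are illustrative and not from the talk. Entry W[i][j] ends up as the number of triangles that contain the edge (i, j).

```python
def matmul(X, Y):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangle_weights(A):
    """Return W = (A*A) .* A, i.e. A squared, masked element-wise by A."""
    A2 = matmul(A, A)
    n = len(A)
    return [[A2[i][j] * A[i][j] for j in range(n)] for i in range(n)]


# Hypothetical example: triangle 0-1-2 plus a pendant edge 2-3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
W = triangle_weights(A)
for row in W:
    print(row)
```

The pendant edge 2-3 lies in no triangle, so it gets weight zero and node 3 becomes isolated in W. Any method that divides by degrees has to handle such nodes.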
where
when
… a little story …
Networks are sets of nodes and edges (graphs)
that model real world systems
Key insight. [Flake et al., Newman et al., and hundreds more!]
Networks—for real-world systems—have modules, communities, clusters
This structure has traditionally been exposed with node and edge based
clustering metrics. Density, modularity, conductance, cut, ratio cuts, etc.
Background: network clustering is a fundamental network analysis task for finding coherent groups of nodes based on edges
• Real-world networks have modular organization [Newman 2004, Newman 2006].
• We want to automatically find the modules in the system.
• Old idea: find groups of nodes with high internal edge density and low external edge density [Newman 2004, Danon 2005, Leskovec+ 2009].
Co-author network
Brain network, de Reus et al., RSTB, 2014.
Similar tools are used to partition
computations for parallelism
Comanche mesh from Alex Pothen,
from Sparse Matrix Collection
There is abundant evidence that higher-order
connectivity patterns drive complex systems.
Thinking beyond nodes and edges
There is abundant evidence that higher-order
connectivity patterns, or network motifs, drive
complex systems [Milo+02, Yaveroğlu+14].
• Signed feed-forward loops in genetic transcription. Mangan et al., 2003; Alon, 2007
• Triangles in social relationships. Simmel, 1908; Rapoport, 1953; Granovetter, 1973
• Bi-directed length-2 paths in brain networks. Sporns-Kötter, 2004; Sporns et al., 2007; Honey et al., 2007
Key Insight. [Milo et al. (Science 2002)]
Certain subgraphs were far more
common than expected.
We call any small subgraph a motif.
Nodes and edges may not be
the basis elements of these networks.
Why should we look for module structure
in terms of nodes and edges?
Idea: find clusters of motifs
Higher-order organization of
complex networks
We generalize spectral clustering, a classic
technique to find clusters or communities in a
graph, to use motifs to cluster the graph.
• Uses motif conductance instead of node & edge conductance
• We also bound the conductance in terms of the optimal solution
Outline
1. So we’ll briefly review how spectral clustering works
2. Then see how to adapt it to work with network motifs
3. Then see this procedure on real-world & model data
We can do motif-based clustering by
generalizing spectral clustering
Spectral clustering is a classic technique to partition
graphs by looking at eigenvectors.
M. Fiedler, 1973,
Algebraic
connectivity of
graphs
Graph: $A$
Laplacian: $L = D^{-1/2} (D - A) D^{-1/2}$
Eigenvector: $(D - A) x = \lambda D x$
Earlier work by Simon, Ando, Courtois
dealt with a related decomposability idea
Spectral clustering works based on
conductance with node and edge cuts
Conductance is one of the most important quality scores used to
identify network modules, clusters, or communities [Schaeffer 2007].
It is used in Markov chain theory, bioinformatics, vision, etc.
$$\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar S))}$$ (conductance)
cut(S) = # edges between S and $\bar S$ (edges leaving the set)
vol(S) = $\sum_{i \in S}$ degree of i (total edges in the set)
Example:
cut(S) = 7, cut($\bar S$) = 7
|S| = 15, |$\bar S$| = 20
vol(S) = 85, vol($\bar S$) = 151
$\phi(S) = 7/85 = 0.082$
Small conductance ⇔ Good set
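The definition above can be sketched in a few lines of Python. This is an illustration only: the `conductance` helper, the neighbour-set representation, and the six-node example graph are assumptions made for the example, and the function assumes both sides of the split have non-zero volume.

```python
def conductance(adj, S):
    """Edge conductance cut(S) / min(vol(S), vol(S_bar)) of a node set S.

    adj maps each node to the set of its neighbours (undirected, unweighted).
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_rest)


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})      # cut = 1, vol(S) = 7

# The numbers on the slide: cut = 7, vol(S) = 85, vol(S_bar) = 151.
phi_slide = 7 / min(85, 151)
print(phi, round(phi_slide, 3))
```

Because the denominator takes the smaller of the two volumes, S and its complement always get the same score.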
Spectral clustering has theoretical
guarantees
Cheeger Inequality
Finding the best conductance set
is NP-hard.
• Cheeger realized the eigenvalues of the
Laplacian provided a bound in manifolds
• Alon and Milman independently realized
the same thing for a graph!
$$\phi_*^2 / 2 \le \lambda_2 \le 2 \phi_*$$
$0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n \le 2$ are the eigenvalues of the Laplacian
$\phi_*$ = conductance of the set of smallest conductance
J. Cheeger, 1970, A lower bound on the smallest eigenvalue of the Laplacian
N. Alon, V. Milman, 1985, λ1, isoperimetric inequalities for graphs, and superconcentrators
The sweep cut algorithm realizes the
guarantee
We can find a set S that achieves
the Cheeger bound.
1. Compute the eigenvector associated with λ2 (e.g. ARPACK)
2. Sort the vertices by their values in the eigenvector: σ1, σ2, …, σn
3. Let Sk = {σ1, …, σk} and compute the conductance of each Sk: φk = φ(Sk)
4. Pick the minimum φm of φk.
$$\phi_m \le 2 \sqrt{\phi_*}$$
M. Mihail, 1989, Conductance and convergence of Markov chains
F. C. Graham, 1992, Spectral Graph Theory.
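Steps 2 to 4 can be sketched as follows. The `sweep_cut` function is a hypothetical helper, not the talk's code: it takes any score vector (for instance the entries of the eigenvector from step 1), assumes W is a symmetric non-negative weight matrix in which every node has positive degree, and updates the cut incrementally instead of recomputing it for every prefix.

```python
def sweep_cut(W, scores):
    """Return (best_set, best_conductance) over prefixes of the sorted scores."""
    n = len(W)
    order = sorted(range(n), key=lambda i: scores[i])
    degree = [sum(row) for row in W]
    total = sum(degree)
    in_set = [False] * n
    cut = vol = 0.0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order[:-1], start=1):   # the full set is skipped
        # Moving u into S: its edges into S stop being cut, the rest start.
        to_S = sum(W[u][v] for v in range(n) if in_set[v])
        cut += degree[u] - W[u][u] - 2 * to_S
        vol += degree[u]
        in_set[u] = True
        phi = cut / min(vol, total - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
scores = [-1.0, -0.9, -0.5, 0.5, 0.9, 1.0]   # made-up stand-in for an eigenvector
best_set, best_phi = sweep_cut(W, scores)
print(best_set, best_phi)
```

On this example the sweep profile is 1, 1/2, 1/7, 1/2, 1, so the minimum picks out one of the two triangles.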
The sweep cut visualized
[Figure: sweep profile, conductance φi (0 to 1) of the set Si against i]
$$\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar S))}$$
But current problems are much more rich
than where spectral is justified
Spectral clustering is theoretically justified for undirected graphs
• Various extensions to multiple clusters [Dhillon et al.; Gharan et al.; Jordan et al.]
• Weighted graphs are okay
• Approximate eigenvectors are okay [Mihail]
Current network models are more richly annotated
• directed, signed, colored, layered, multiplex, etc.
R. Milo, 2002, Science: X causes Y to be expressed (+), Z represses Y (–)
Nice recent work by [Fairbanks et al. arXiv] on better
numerical stopping criteria!
There is a literature on directed spectral
graph partitioning, but it is hard to interpret
Markov chains
• Stewart (numerical solution to Markov chains)
• Chung (Random walks and cuts in dir graphs )
Nonlinear Laplacian
• Yoshida WSDM2016
Asymmetric Laplacian
• Boley et al. LAA2011 (commute times)
Gleich, Klymko, Kolda ASE BigData 2014
$D^{-1} A x = \lambda x$
$(D - A) x = \lambda x$
$\tfrac{1}{2} \Pi (D^{-1} A) + \tfrac{1}{2} (A^T D^{-1}) \Pi$
$\sum_{(u,v) \in E} \begin{cases} (x_u - x_v)^2 & x_u - x_v \ge 0 \\ 0 & \text{otherwise} \end{cases}$
Our contributions
1. A generalized conductance metric
for motifs
2. A “new” spectral clustering algorithm to
minimize the generalized conductance.
3. AND an associated Cheeger inequality.
(which handles directed graphs)
4. Aquatic layers in food webs
5. Hub structure in transportation
New! Some studies in stochastic block models
(this talk, still preliminary!)
Motif-based conductance generalizes
edge-based conductance
Need notions of cut and volume
Edge-based:
cut(S) = #(edges cut by S)
vol(S) = #(edge end points in S) = $\sum_{i \in S}$ degree of i
$$\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar S))}$$
Motif-based:
$\mathrm{cut}_M(S)$ = #(motifs cut by S)
$\mathrm{vol}_M(S)$ = #(motif end points in S)
$$\phi_M(S) = \frac{\mathrm{cut}_M(S)}{\min(\mathrm{vol}_M(S), \mathrm{vol}_M(\bar S))}$$
An example of motif-conductance
[Figure: a 12-node example graph (nodes 0 to 11), the motif, and a split into S and $\bar S$]
$$\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$$
How can we optimize motif conductance?
We thought that motif conductance would spark new tensor and
hypermatrix methods based on the motif adjacency tensor.
$$\mathcal{A}(i, j, k) = \begin{cases} 1 & \text{if motif involves nodes } i, j, k \\ 0 & \text{otherwise} \end{cases}$$
We were wrong!
Benson, Gleich, Leskovec, SDM 2016
There is a symmetric matrix that serves as the
appropriate tool to study motif conductance
[Figure: the 12-node example graph next to its motif-weighted graph $W^{(M)}$]
$W^{(M)}_{ij}$ = counts co-occurrences of motif pattern between i, j
Going from motifs back to a matrix for
spectral clustering
[Figure: the motif-weighted graph $W^{(M)}$ of the 12-node example]
$W^{(M)}_{ij}$ = counts co-occurrences of motif pattern between i, j
KEY INSIGHT
Spectral clustering on $W^{(M)}$ yields results on
the new motif notion of conductance
$$\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$$
Here is a quick illustration of how this works.
[Figure: the 12-node example graph and its motif-weighted graph, split into S and $\bar S$]
On the motifs: $\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$
On the weighted graph: cut(S) = 2, vol(S) = 6 + 8 + 2 + 2 + 2, so $\phi(S) = \frac{2}{20} = \frac{1}{10}$
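The agreement between the two views can be checked on a small case. The sketch below uses triangles as the motif on a made-up six-node graph; the function names and the graph are assumptions for illustration. For a three-node motif each cut instance contributes 2 to the weighted cut and each motif end point contributes 2 to the weighted volume, so the ratios coincide.

```python
from itertools import combinations


def triangles(adj):
    """All triangles (i, j, k) of an undirected graph given as neighbour sets."""
    return [t for t in combinations(sorted(adj), 3)
            if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]


def motif_conductance(adj, S):
    """(cut_M, vol_M(S), vol_M(S_bar), phi_M) with triangles as the motif."""
    S = set(S)
    tris = triangles(adj)
    cut = sum(1 for t in tris if 0 < len(S & set(t)) < 3)
    vol_S = sum(len(S & set(t)) for t in tris)
    vol_rest = 3 * len(tris) - vol_S
    return cut, vol_S, vol_rest, cut / min(vol_S, vol_rest)


def weighted_conductance(adj, S):
    """Ordinary conductance on W, where W[i, j] = # triangles containing i and j."""
    S = set(S)
    W = {}
    for t in triangles(adj):
        for i, j in combinations(t, 2):
            W[i, j] = W.get((i, j), 0) + 1
    cut = sum(w for (i, j), w in W.items() if (i in S) != (j in S))
    vol_S = sum(w * ((i in S) + (j in S)) for (i, j), w in W.items())
    vol_rest = 2 * sum(W.values()) - vol_S
    return cut, vol_S, vol_rest, cut / min(vol_S, vol_rest)


# Hypothetical graph with triangles {0,1,2}, {1,2,3}, {2,3,4}, {3,4,5}.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3, 4},
       3: {1, 2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
mc = motif_conductance(adj, {0, 1, 2})
wc = weighted_conductance(adj, {0, 1, 2})
print(mc, wc)
```

Here two triangles are cut with motif volume 6 on each side, while the weighted graph has cut 4 and volume 12, the same factor of two as in the slide's 1/10 = 2/20.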
A motif-based clustering algorithm
1. Form weighted graph $W^{(M)}$
2. Compute the Fiedler vector associated with λ2 of the
motif-normalized Laplacian
3. Run a sweep cut on f
$D = \mathrm{diag}(W^{(M)} e)$
$L^{(M)} = D^{-1/2} (D - W^{(M)}) D^{-1/2}$
$L^{(M)} z = \lambda_2 z$
$f^{(M)} = D^{-1/2} z$
The sweep cut results
[Figure: sweep profile, motif conductance (0 to 1) against the size of the sweep set, with nodes in the order from the Fiedler vector; the best and 2nd best higher-order clusters are marked]
There are nice matrix computations
for three-node motifs
$W = A^2 \odot A$
There are nice matrix computations
for three-node directed motifs
Given a (directed) adjacency matrix A, let $B = A \odot A^T$ (bidirectional)
and $U = A - B$ (unidirectional)
Motif | Matrix computations | W =
M1 | $C = (U \cdot U) \odot U^T$ | $C + C^T$
M2 | $C = (B \cdot U) \odot U^T + (U \cdot B) \odot U^T + (U \cdot U) \odot B$ | $C + C^T$
M3 | $C = (B \cdot B) \odot U + (B \cdot U) \odot B + (U \cdot B) \odot B$ | $C + C^T$
M4 | $C = (B \cdot B) \odot B$ | $C$
M5 | $C = (U \cdot U) \odot U + (U \cdot U^T) \odot U + (U^T \cdot U) \odot U$ | $C + C^T$
M6 | $C = (U \cdot B) \odot U + (B \cdot U^T) \odot U^T + (U^T \cdot U) \odot B$ | $C$
M7 | $C = (U^T \cdot B) \odot U^T + (B \cdot U) \odot U + (U \cdot U^T) \odot B$ | $C$
M8 | $C = (U \cdot N) \odot U + (N \cdot U^T) \odot U^T + (U^T \cdot U) \odot N$ | $C$
$N = ee^T - B - U - U^T$
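Two rows of the table, written out as a pure-Python sketch. The helper names and the 4-node example are illustrative assumptions; A is taken to be a 0/1 directed adjacency matrix with a zero diagonal, "·" is the matrix product, and the Hadamard product is applied entry by entry.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split_edges(A):
    """B holds the bidirectional edges, U = A - B the unidirectional ones."""
    B = hadamard(A, transpose(A))
    U = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return B, U


def motif_m1(A):
    """Row M1 of the table: C = (U.U) o U^T, W = C + C^T."""
    _, U = split_edges(A)
    C = hadamard(matmul(U, U), transpose(U))
    return add(C, transpose(C))


def motif_m4(A):
    """Row M4 of the table: W = C = (B.B) o B."""
    B, _ = split_edges(A)
    return hadamard(matmul(B, B), B)


# Hypothetical graph: directed cycle 0 -> 1 -> 2 -> 0 plus a reciprocal pair 2 <-> 3.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
W1 = motif_m1(A)
W4 = motif_m4(A)
print(W1, W4)
```

The directed cycle gives weight 1 to each pair among nodes 0, 1, 2 under the M1 formula, while the M4 formula returns all zeros because the graph has no triangle made only of bidirectional edges.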
The three-node
motif-based Cheeger inequality
THEOREM
If the motif has three nodes, then the
sweep procedure on the weighted graph
finds a set S of nodes for which
$$\phi_M(S) \le 2 \sqrt{\phi_M^*}$$
M(G) = {instances of M in G}
Key Proof Step
$$\mathrm{cut}_M(S, G) = \sum_{\{i,j,k\} \in M(G)} \mathrm{Indicator}[x_i, x_j, x_k \text{ not the same}] = \sum_{\{i,j,k\} \in M(G)} \tfrac{1}{4} (x_i^2 + x_j^2 + x_k^2 - x_i x_j - x_j x_k - x_i x_k) = \text{quadratic in } x$$
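The identity in the key proof step can be checked by brute force. The sketch assumes each $x_i$ is +1 or -1 depending on the side of the cut, which the slide does not state explicitly; the function names are made up for the check.

```python
from itertools import product


def not_all_same(xi, xj, xk):
    """1 if the three nodes are split by the cut, 0 if they sit on one side."""
    return int(not (xi == xj == xk))


def quadratic(xi, xj, xk):
    """The quadratic form from the slide."""
    return (xi * xi + xj * xj + xk * xk - xi * xj - xj * xk - xi * xk) / 4


# Compare the two expressions on all eight sign patterns.
checks = [(x, not_all_same(*x), quadratic(*x))
          for x in product((-1, 1), repeat=3)]
for x, indicator, value in checks:
    print(x, indicator, value)
```

The quadratic also equals one eighth of the sum of the three squared differences $(x_a - x_b)^2$ within the motif instance, which is why summing over instances gives a Laplacian-style quadratic form on the pairwise co-occurrence weights.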
IMPLICATION
Just run spectral clustering
on those weighted matrices.
Awesome advantages
Works for arbitrary non-neg. combos of motifs too
We inherit 40+ years of research!
• Fast algorithms (ARPACK, etc.)!
• Local methods! (Yin, Benson, Leskovec, Gleich, KDD2017)
• Overlapping!
• Easy to implement (20 lines of Matlab/Julia)
• Scalable (graphs with 1.4B edges are not a problem)
    elseif motif == "M5"
        C = (U * U) .* U + (U * U') .* U + (U' * U) .* U
        W = C + C'
    elseif motif == "M6"
        W = (U * B) .* U + (B * U') .* U' + (U' * U) .* B
    elseif motif == "M7"
        W = (U' * B) .* U' + (B * U) .* U + (U * U') .* B
    else
        error("Motif must be one of M1, M2, M3, M4, M5, M6, or M7.")
    end

    # Get Fiedler eigenvector of the motif-normalized Laplacian
    dinvsqrt = spdiagm(1.0 ./ sqrt.(vec(sum(W, 1))))
    LM = I - dinvsqrt * W * dinvsqrt
    lambdas, evecs = eigs(LM, nev=2, which=:SM)
    z = dinvsqrt * real(evecs[:, 2])

    # Sweep cut: order the nodes by z and score every prefix set
    sigma = sortperm(z)
    C = W[sigma, sigma]
    Csums = sum(C, 1)'
    motifvolS = cumsum(Csums)
    motifvolSbar = sum(W) * ones(length(sigma)) - motifvolS
    # The source line is cut off after "min.(motif"; the two volumes above are the assumed arguments.
    conductances = cumsum(Csums - 2 * sum(triu(C), 1)') ./ min.(motifvolS, motifvolSbar)
    split = indmin(conductances)
    # Return the smaller side of the split (the source wraps this bound in length(), which always gives 1)
    if split <= size(A, 1) / 2
        return sigma[1:split]
    else
        return sigma[(split + 1):end]
    end
end
Figure 2.3 – Julia implementation of the motif-based spectral clusteri
Case study 1
Motifs partition the food webs
Food webs model
energy exchange
in species of an
ecosystem.
i → j means i's energy
goes to j
(or j eats i)
Via Cheeger, motif
conductance is
better than edge
conductance.
Demo and reproducibility
https://github.com/arbenson/higher-order-organization-julia
# form W0 … W4
sc0 = spectral_cut(W0)
sc1 = spectral_cut(W1)
sc2 = spectral_cut(W2)
sc3 = spectral_cut(W3)
sc4 = spectral_cut(W4)
plt = x -> semilogx(x.sweepcut_profile.conductance)
plt(sc0)
plt(sc1)
plt(sc2)
plt(sc3)
plt(sc4)
Case study 1
Motifs partition the food webs
Motif M6 reveals
aquatic layers
• Micronutrient sources
• Pelagic fishes and benthic prey
• Benthic macroinvertebrates
• Benthic fishes
61% accuracy vs. 48% with edge-based methods
Case study 2
Hub structure in the air transportation network
North American air
transport network
Nodes are airports
Edges reflect
reachability, and
are unweighted.
(Based on Frey
et al., 2007)
The weighted adjacency matrix already
reveals hub-like structure
Accepted pending
Counts length-two walks
The motif embedding shows this structure
and splits into east-west
[Figure: MOTIF SPECTRAL EMBEDDING along the primary spectral coordinate, with the top 10 U.S. hubs, East coast non-hubs, and West coast non-hubs marked]
[Figure: EDGE SPECTRAL EMBEDDING, where Atlanta, the top hub, is next to Salina, a non-hub]
Case study 3: the stochastic block model
shows numerical advantages to motif-matrices
Model problems are useful because they are simple and we often
“know” everything about them. They may not reflect real-world issues.
Model problems by field: Biology (mouse, fly), Matrix Computations ($\nabla^2 u = f$), Clustering
Mouse picture from Wikipedia Мышь_2.jpg,
Fly from oregonstateuniversity/11179958483
The stochastic block model is extremely well
understood in theory
The symmetric stochastic block model (SSBM)
• k blocks, each of size m-by-m
• within-block edges exist with prob p
• between-block edges with prob q
Edge probabilities, with every block of size m-by-m:
$$\begin{bmatrix} p & q & \cdots & q \\ q & p & \ddots & \vdots \\ \vdots & \ddots & \ddots & q \\ q & \cdots & q & p \end{bmatrix}$$
[Figure: symmetric stochastic block model with m = 200, k = 5, p = 0.3, q = 0.13]
Reminiscent of Simon & Ando.
The task
Given a graph that is an SSBM and
given m, k, p, q
Find the k blocks.
Theory
E. Abbe, Community detection and the stochastic block model
(In prep, on webpage)
• Necessary p > q
• Exact recovery (get all correct)
• Detectability (find a non-trivial portion)
• Uses non-backtracking random walk.
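A minimal sketch of sampling such a model, assuming an undirected graph without self-loops and independent edges. The function name `sample_ssbm`, the `seed` argument, and the small parameter values are illustrative choices; the slide's own experiment uses m = 200, k = 5, p = 0.3, q = 0.13.

```python
import random


def sample_ssbm(m, k, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with k blocks of m nodes each."""
    rng = random.Random(seed)
    n = m * k
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Nodes i and j share a block when their indices fall in the same
            # run of m consecutive nodes.
            prob = p if i // m == j // m else q
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return A


A = sample_ssbm(m=20, k=3, p=0.3, q=0.13)

# Extreme case: p = 1 and q = 0 gives k disjoint cliques.
cliques = sample_ssbm(m=4, k=2, p=1.0, q=0.0)
print(len(A), cliques[0])
```

Keeping the nodes in block order makes the planted partition visible in the matrix itself. A real test would permute the node labels before clustering.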
Our study is to look at the SSBM using
our motif-weighting based on triangles.
$W = A^2 \odot A$
Figure 1: Higher-order network structures and the higher-order network clustering framework. A: Higher-order structures are captured by network motifs. For example, all 13 connected three-node directed motifs are shown here. B: Clustering of a network based on motif M7. For a given motif M, our framework aims to find a set of nodes S that minimizes motif conductance, $\phi_M(S)$, which we define as the ratio of the number of motifs cut (filled triangles cut) to the minimum number of nodes in instances of the motif in either S or $\bar S$ (13). In this case, there is one motif cut. C: The higher-order network clustering framework. Given a graph and a motif of interest (in this case, M7), the framework forms a motif adjacency matrix ($W_M$) by counting the number of times two nodes co-occur in an instance of the motif. An eigenvector of a Laplacian transformation of the motif adjacency matrix is then computed. The ordering σ of the nodes provided by the components of the eigenvector (15) produces nested sets $S_r = \{\sigma_1, \ldots, \sigma_r\}$ of increasing size r. We prove that the set $S_r$ with the smallest motif-based conductance, $\phi_M(S_r)$, is a near-optimal higher-order cluster (13).
Figure S8: Higher-order organization of the S. cerevisiae transcriptional regulation network. A: The four higher-order structures used by our higher-order clustering method, which can model signed motifs. These are coherent feedforward loop motifs, which act as sign-sensitive delay elements in transcriptional regulation networks (46). The edge signs refer to activation (positive) or repression (negative). B: Six higher-order clusters revealed by the motifs in (A). Clusters show functional modules consisting of several motifs (coherent feedforward loops), which were previously studied individually (45). The higher-order clustering framework identifies the functional modules with higher accuracy (97%) than existing methods (68–82%). C–D: Two higher-order clusters from (B). In these clusters, all edges have positive sign. The functionality of the motifs in the modules corresponds to drug resistance (C) or cell cycle and mating type match (D). The clustering suggests that coherent feedforward loops function together as a single processing unit rather than as independent elements.
Based also on Tsourakakis, Pachocki, & Mitzenmacher, WWW 2017,
who showed triangle-conductance < edge-conductance in a model.
Just using the motif-weighting highlights the
blocks for a range of parameters.
We introduce a mixing parameter μ to scale q.
$\mu = 0 \Leftrightarrow q = 0$, $\mu = \frac{k-1}{k} \Leftrightarrow q = p$
[Figure: $W = A^2 \odot A$ next to $A$]
The power method identifies a cluster using
the motif weighting better than the adjacency
[Figure: accuracy for $W = A^2 \odot A$ and for $A$, with the detectability and exact recovery thresholds marked]
Exp. details. We take the normalized Laplacian and shift to reverse the spectrum. Then we deflate given knowledge of the leading eigenvector. Accuracy is the most accurate block in the extremal m entries.
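One way to read the experimental details is sketched below. The specific shift 2I - L is an assumption (the slide says only that the spectrum is reversed), as are the function name, the iteration count, and the fixed start vector. With L the normalized Laplacian of W, the leading eigenvector of 2I - L is $D^{1/2} e$, so projecting it out leaves the eigenvector for λ2 as the dominant one. Every node is assumed to have positive degree.

```python
import math


def fiedler_by_power_method(W, iters=200):
    """Approximate f = D^{-1/2} z, with z the lambda_2 eigenvector of L."""
    n = len(W)
    d = [sum(row) for row in W]
    s = [1.0 / math.sqrt(di) for di in d]
    # Known leading eigenvector of 2I - L: D^{1/2} e, scaled to unit length.
    norm = math.sqrt(sum(d))
    v1 = [math.sqrt(di) / norm for di in d]

    def deflate_and_normalise(x):
        dot = sum(a * b for a, b in zip(x, v1))
        x = [a - dot * b for a, b in zip(x, v1)]
        length = math.sqrt(sum(a * a for a in x))
        return [a / length for a in x]

    # Deterministic, non-constant start vector (an arbitrary choice).
    x = deflate_and_normalise([float(i + 1) for i in range(n)])
    for _ in range(iters):
        # y = (2I - L) x = x + D^{-1/2} W D^{-1/2} x
        y = [x[i] + s[i] * sum(W[i][j] * s[j] * x[j] for j in range(n))
             for i in range(n)]
        x = deflate_and_normalise(y)
    return [s[i] * x[i] for i in range(n)]


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
f = fiedler_by_power_method(W)
side = [value > 0 for value in f]
print(side)
```

The overall sign of the vector is arbitrary; only the grouping of the nodes by sign, or their ordering for a sweep cut, carries information.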
We don’t converge faster for the usual
reasons that the power method converges
There is a bigger gap deeper in the spectrum,
that could explain what is going on
The motif weighting shifts all the eigenvalues
down, but the lowest drop the most.
[Figure: eigenvalue distributions for $A$ (semi-circle law) and for $W = A^2 \odot A$ (not a Marchenko-Pastur law!)]
We’d like a numerical understanding of why
we get better results faster with motifs.
Eigenvalues show that we’ll converge to the “cluster subspace” faster.
Conjecture. Higher accuracy for motifs because the eigenvectors are
more localized—or sharper—around the clusters.
• What remains is to understand why they are sharper!
[Figure: $W = A^2 \odot A$ next to $A$]
Related work.
• Laplacian we propose was originally proposed by Rodríguez [2004]
and again by Zhou et al. [2006]
Our new theory (motif Cheeger inequality) explains why these were good ideas.
• Falls under general strategy of encoding hypergraph partitioning
problem as graph clustering problem [Agarwal+ 06]
• Serrour, Arenas, & Gómez, Detecting communities of triangles in
complex networks using spectral optimization, 2011.
• Arenas et al., Motif-based communities in complex networks, 2008.
• Rohe & Qin, Blessing of transitivity …, arXiv, 2013.
• Klymko, Gleich, Kolda (Using triangles & cycles …, ASE BigData 2014)
• Benson, Gleich, Leskovec (Motifs & Tensors, SIAM Data Mining 2015)
Paper
Benson, Gleich, Leskovec
Science, 2016
1. A generalized conductance metric for motifs
2. A new spectral clustering algorithm to
minimize the generalized conductance.
3. AND an associated Cheeger inequality.
4. Aquatic layers in food webs
5. Hub structure in transportation networks
6. Eigenvalues & vectors of motifs in SSBMs.
7. Lots of cool stuff on signed networks.
Joint work with
Austin Benson and Jure
Leskovec, Stanford
Supported by NSF CAREER
CCF-1149756, IIS-1422918
IIS- DARPA SIMPLEX
Code & Data
snap.stanford.edu/higher-order
github.com/arbenson/higher-order-organization-julia
github.com/dgleich/motif-ssbm
Open questions
• What is the distribution law for the Laplacian of $A^2 \odot A$?
• How to work with element-wise products like matvecs for $N = ee^T - B - U - U^T$?
Thank you!

Fundamentals of matlab
 
graph theory
graph theory graph theory
graph theory
 

Spectral Clustering with Motifs and Higher-Order Structures

  • 1. Spectral graph clustering with motifs and higher-order structures. David F. Gleich, Purdue University. With Austin Benson (Stanford -> Cornell) and Jure Leskovec (Stanford). Code & Data: github.com/arbenson/higher-order-organization-julia and github.com/dgleich/motif-ssbm. NAConf'17 · David Gleich · Purdue. [Figure: a small example graph with nodes labeled 0 to 11.]
  • 2. Graphs and matrices have a long and intertwined history. Matrices and graphs represent relationships among a group of objects. To study the relationships (centrality, reachability, clustering, … and more …) we often use matrix computations such as $Ax = b$ and $Ax = \lambda x$ (e.g. Estrada & Higham, SIREV; Network Analysis, Brandes & Erlebach). [Figure: Helen Bott, Observation of play activities in a nursery school, 1928.]
  • 3. … a suggestion based on our work …
  • 4. Given a graph G = (V, E) and its adjacency matrix A, consider using the weighted matrix $W = A^2 \circ A$, where $\circ$ is the Hadamard (element-wise) product.
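  • 5. Given a directed graph G = (V, E) and its non-symmetric adjacency matrix A, consider using a symmetric weighted matrix W from the table below ($\cdot$ is the matrix product, $\circ$ the Hadamard / element-wise product).
      Motif | Matrix computations | W =
      M1 | $C = (U \cdot U) \circ U^T$ | $C + C^T$
      M2 | $C = (B \cdot U) \circ U^T + (U \cdot B) \circ U^T + (U \cdot U) \circ B$ | $C + C^T$
      M3 | $C = (B \cdot B) \circ U + (B \cdot U) \circ B + (U \cdot B) \circ B$ | $C + C^T$
      M4 | $C = (B \cdot B) \circ B$ | $C$
      M5 | $C = (U \cdot U) \circ U + (U \cdot U^T) \circ U + (U^T \cdot U) \circ U$ | $C + C^T$
      M6 | $C = (U \cdot B) \circ U + (B \cdot U^T) \circ U^T + (U^T \cdot U) \circ B$ | $C$
      M7 | $C = (U^T \cdot B) \circ U^T + (B \cdot U) \circ U + (U \cdot U^T) \circ B$ | $C$
      M8 | $C = (U \cdot N) \circ U + (N \cdot U^T) \circ U^T + (U^T \cdot U) \circ N$ | $C$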
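The M1 row of the table can be sketched in NumPy as follows. The slide does not define B and U, so this sketch assumes B holds the bidirectional edges (the element-wise product of A with its transpose) and U = A - B holds the unidirectional edges. It also assumes the operator missing from the transcript is the Hadamard product. The function name `m1_motif_matrix` and the toy graph are hypothetical.

```python
import numpy as np

def m1_motif_matrix(A):
    """Sketch of the M1 row: C = (U @ U) * U.T, W = C + C.T.

    Assumes A is a square adjacency matrix of a directed graph
    with no self-loops; nonzero entries are treated as edges.
    """
    A = (np.asarray(A) != 0).astype(int)
    B = A * A.T            # assumed definition: bidirectional edges
    U = A - B              # assumed definition: unidirectional edges
    C = (U @ U) * U.T      # matrix product, then element-wise product
    return C + C.T

# Hypothetical toy graph: the directed 3-cycle 0 -> 1 -> 2 -> 0.
A_cycle = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])
W_m1 = m1_motif_matrix(A_cycle)
print(W_m1)
```

On this toy graph every pair of nodes shares the one directed cycle, so each off-diagonal entry of the result is 1.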
  • 6. What, why, how. The matrix $W = A^2 \circ A$ arises from our motif and higher-order clustering framework when using triangles as the motif. When clustering based on triangles, we often have better numerical properties (e.g. eigenvalue gaps) in model partitioning problems (stochastic block models) and better real-world results. How to compute it: (A*A).*A in Matlab; (A*A).*A in Julia; np.dot(A,A)*A in Python (A is an array). [Figure: triangles in social relationships. Simmel, 1908; Rapoport, 1953; Granovetter, 1973.]
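A minimal NumPy sketch of the triangle weighting, assuming A is a dense, symmetric 0/1 array with a zero diagonal. The function name `triangle_motif_matrix` and the toy graph are hypothetical. Entry (i, j) of the result counts the triangles that contain edge (i, j), so edges that lie in no triangle get weight zero.

```python
import numpy as np

def triangle_motif_matrix(A):
    """Return W = (A @ A) * A for a symmetric 0/1 adjacency array A."""
    A = np.asarray(A)
    return (A @ A) * A   # matrix product, then element-wise product

# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
W = triangle_motif_matrix(A)
print(W)
```

The pendant edge 2-3 disappears from W because it is in no triangle. For large sparse graphs a sparse matrix type would be the natural choice, with the element-wise product taken through that type's own method rather than `*`.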
  • 8. … a little story …
  • 9. Networks are sets of nodes and edges (graphs) that model real world systems. Key insight [Flake et al., Newman et al., and hundreds more!]: networks for real-world systems have modules, communities, clusters. This structure has traditionally been exposed with node and edge based clustering metrics: density, modularity, conductance, cut, ratio cuts, etc. Background: network clustering is a fundamental network analysis for finding coherent groups of nodes based on edges. § Real-world networks have modular organization [Newman 2004, Newman 2006]. § We want to automatically find the modules in the system. § Old idea: find groups of nodes with high internal edge density and low external edge density [Newman 2004, Danon 2005, Leskovec+ 2009]. [Figures: co-author network; brain network, de Reus et al., RSTB, 2014.]
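As an illustration of one of the edge-based metrics named above, here is a small sketch of conductance. The slide does not give a formula, so this assumes one common definition: the weight of edges leaving a set S divided by the smaller of the total degree inside S and the total degree outside it. The function name `conductance` and the toy graph are hypothetical.

```python
import numpy as np

def conductance(A, S):
    """Conductance of node set S in an undirected graph.

    Assumes A is a symmetric adjacency matrix and S is a nonempty,
    proper subset of the nodes with at least one edge on each side.
    """
    A = np.asarray(A, dtype=float)
    in_S = np.zeros(A.shape[0], dtype=bool)
    in_S[list(S)] = True
    cut = A[in_S][:, ~in_S].sum()      # edge weight crossing the boundary
    degrees = A.sum(axis=1)
    vol_S = degrees[in_S].sum()        # total degree inside S
    vol_rest = degrees[~in_S].sum()    # total degree outside S
    return cut / min(vol_S, vol_rest)

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1

phi = conductance(A, {0, 1, 2})
print(phi)
```

A low value means the set has many internal edges relative to the edges leaving it, which matches the "high internal edge density and low external edge density" idea on the slide.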
  • 10. Similar tools are used to partition computations for parallelism. [Figure: Comanche mesh from Alex Pothen, from the Sparse Matrix Collection.]
  • 11. Thinking beyond nodes and edges. There is abundant evidence that higher-order connectivity patterns, or network motifs, drive complex systems [Milo+02, Yaveroğlu+14].
      • Triangles in social networks: Simmel, 1908; Rapoport, 1953; Granovetter, 1973.
      • Bi-directed length-2 paths in brain networks: Sporns-Kötter, 2004; Sporns et al., 2007; Honey et al., 2007.
      • Signed feed-forward loops in genetic transcription: Mangan et al., 2003; Alon, 2007.
      Key insight [Milo et al. (Science 2002)]: certain subgraphs were far more common than expected. We call any small subgraph a motif.
  • 12. Nodes and edges may not be the basis elements of these networks. Why should we look for module structure in terms of nodes and edges?
  • 13. Idea: find clusters of motifs.
  • 14. Higher-order organization of complex networks. We generalize spectral clustering, a classic technique to find clusters or communities in a graph, to use motifs to cluster the graph.
      • Uses motif conductance instead of node & edge conductance
      • We also bound the conductance in terms of the optimal solution
      Outline
      1. Briefly review how spectral clustering works
      2. See how to adapt it to work with network motifs
      3. See this procedure on real-world & model data
  • 15. We can do motif-based clustering by generalizing spectral clustering. Spectral clustering is a classic technique to partition graphs by looking at eigenvectors (M. Fiedler, 1973, Algebraic connectivity of graphs). Earlier work by Simon, Ando, and Courtois dealt with a related decomposability idea.
      Graph Laplacian, for adjacency matrix $A$ and diagonal degree matrix $D$: $L = D^{-1/2}(D - A)D^{-1/2}$
      Eigenvector: $(D - A)x = \lambda D x$
  • 16. Spectral clustering works based on conductance with node and edge cuts. Conductance is one of the most important quality scores used to identify network modules, clusters, or communities [Schaeffer 2007]; it is used in Markov chain theory, bioinformatics, vision, etc.
      $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$ (conductance): edges leaving the set over total edges in the set
      $\mathrm{cut}(S)$ = # edges between $S$ and $\bar{S}$
      $\mathrm{vol}(S) = \sum_{i \in S}$ degree of $i$
  • 17. Spectral clustering works based on conductance with node and edge cuts. Example with $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$: $\mathrm{cut}(S) = \mathrm{cut}(\bar{S}) = 7$, $|S| = 15$, $|\bar{S}| = 20$, $\mathrm{vol}(S) = 85$, $\mathrm{vol}(\bar{S}) = 151$, so $\phi(S) = 7/85 = 0.082$. Small conductance ⇔ good set.
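The conductance definition can be sketched in a few lines of plain Python. The graph below (two triangles joined by one edge) and the helper name `conductance` are hypothetical; the last line only re-does the slide's arithmetic.

```python
def conductance(adj, S):
    """Edge conductance phi(S) = cut(S) / min(vol(S), vol(S_bar)).

    adj maps each node to a list of neighbors (undirected, no self-loops).
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_Sbar = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_Sbar)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi = conductance(adj, {0, 1, 2})   # one cut edge, volume 7 on each side

# The numbers on the slide: cut = 7, vol(S) = 85, vol(S_bar) = 151.
slide_phi = 7 / min(85, 151)
print(phi, round(slide_phi, 3))
```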
  • 18. Spectral clustering has theoretical guarantees: the Cheeger inequality. Finding the best conductance set is NP-hard.
      • Cheeger realized the eigenvalues of the Laplacian provided a bound in manifolds (J. Cheeger, 1970, A lower bound on the smallest eigenvalue of the Laplacian).
      • Alon and Milman independently realized the same thing for a graph! (N. Alon, V. Milman, 1985, λ1, isoperimetric inequalities for graphs, and superconcentrators)
      Eigenvalues of the Laplacian: $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n \le 2$
      Cheeger inequality: $\phi_*^2/2 \le \lambda_2 \le 2\phi_*$, where $\phi_*$ is the conductance of the set of smallest conductance
  • 19. The sweep cut algorithm realizes the guarantee. We can find a set S that achieves the Cheeger bound.
      1. Compute the eigenvector associated with λ2 (e.g. ARPACK)
      2. Sort the vertices by their values in the eigenvector: σ1, σ2, …, σn
      3. Let Sk = {σ1, …, σk} and compute the conductance of each Sk: φk = φ(Sk)
      4. Pick the minimum φm of the φk. Then $\phi_m \le 2\sqrt{\phi_*}$.
      References: M. Mihail, 1989, Conductance and convergence of Markov chains; F. C. Graham, 1992, Spectral Graph Theory.
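Steps 2 to 4 of the sweep can be sketched as follows, in plain Python. The function name `sweep_cut`, the example graph, and the embedding values `x` are all hypothetical; in practice `x` would be the eigenvector from step 1. The sketch assumes a simple undirected graph and updates the cut incrementally instead of recomputing it for every prefix.

```python
def sweep_cut(adj, x):
    """Return (best_set, best_phi) over prefixes of nodes sorted by x."""
    order = sorted(adj, key=lambda u: x[u])
    total_vol = sum(len(adj[u]) for u in adj)
    in_S = set()
    cut = 0
    vol = 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order[:-1], start=1):   # skip the full node set
        # Adding u: its edges into S stop being cut, its other edges become cut.
        inside = sum(1 for v in adj[u] if v in in_S)
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_S.add(u)
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3,
# with made-up embedding values standing in for the eigenvector.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
x = {0: -1.0, 1: -0.9, 2: -0.5, 3: 0.5, 4: 0.9, 5: 1.0}
S, phi = sweep_cut(adj, x)
print(S, phi)
```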
  • 20. The sweep cut visualized. [Plot: conductance $\phi_i$ of the sweep set $S_i$ against position $i$, with $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$.]
  • 21. But current problems are much richer than the setting where spectral clustering is justified. Spectral clustering is theoretically justified for undirected graphs.
      • Various extensions to multiple clusters [Dhillon et al.; Gharan et al.; Jordan et al.]
      • Weighted graphs are okay
      • Approximate eigenvectors are okay [Mihail]; nice recent work by [Fairbanks et al. arXiv] on better numerical stopping criteria!
      Current network models are more richly annotated: directed, signed, colored, layered, multiplex, etc. Example (R. Milo, 2002, Science): X causes Y to be expressed (+); Z represses Y (–).
  • 22. There is a literature on directed spectral graph partitioning, but it is hard to interpret.
      Markov chains
      • Stewart (numerical solution to Markov chains)
      • Chung (Random walks and cuts in directed graphs)
      Nonlinear Laplacian
      • Yoshida WSDM2016
      Asymmetric Laplacian
      • Boley et al. LAA2011 (commute times)
      Gleich, Klymko, Kolda ASE BigData 2014
      Formulas shown: $D^{-1}Ax = \lambda x$; $(D - A)x = \lambda x$; $\frac{1}{2}\Pi(D^{-1}A) + \frac{1}{2}(A^T D^{-1})\Pi$; $\sum_{(u,v) \in E} \begin{cases} (x_u - x_v)^2 & x_u - x_v \ge 0 \\ 0 & \text{otherwise} \end{cases}$
  • 23. Our contributions
      1. A generalized conductance metric for motifs
      2. A “new” spectral clustering algorithm to minimize the generalized conductance
      3. AND an associated Cheeger inequality (which handles directed graphs)
      4. Aquatic layers in food webs
      5. Hub structure in transportation
      New! Some studies in stochastic block models (this talk, still preliminary!)
  • 24. Motif-based conductance generalizes edge-based conductance. We need notions of cut and volume.
      Edges: $\mathrm{cut}(S)$ = #(edges cut by $S$); $\mathrm{vol}(S)$ = #(edge end points in $S$) $= \sum_{i \in S}$ degree of $i$; $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$
      Motifs: $\mathrm{cut}_M(S)$ = #(motifs cut by $S$); $\mathrm{vol}_M(S)$ = #(motif end points in $S$); $\phi_M(S) = \frac{\mathrm{cut}_M(S)}{\min(\mathrm{vol}_M(S), \mathrm{vol}_M(\bar{S}))}$
  • 25. An example of motif conductance. [Figure: a 12-node graph (nodes 0–11) split into $S$ and $\bar{S}$, together with the motif $M$.] $\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$
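Motif conductance can be computed directly from a list of motif instances. The sketch below is plain Python and uses the undirected triangle as the motif; the 6-node graph and the helper names are hypothetical and are not the 12-node example from the slide.

```python
from itertools import combinations

def triangles(adj):
    """All node triples that form a triangle; adj maps node -> set of neighbors."""
    return [t for t in combinations(sorted(adj), 3)
            if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]

def motif_conductance(instances, S):
    """phi_M(S) = cut_M(S) / min(vol_M(S), vol_M(S_bar))."""
    S = set(S)
    # A motif instance is cut if it has nodes on both sides.
    cut = sum(1 for inst in instances
              if 0 < sum(u in S for u in inst) < len(inst))
    # Motif volume counts motif end points that land in each side.
    vol_S = sum(sum(u in S for u in inst) for inst in instances)
    vol_Sbar = sum(len(inst) for inst in instances) - vol_S
    return cut / min(vol_S, vol_Sbar)

# Hypothetical graph with triangles (0,1,2), (2,3,4), and (3,4,5).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4},
       3: {2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
tris = triangles(adj)
phi_M = motif_conductance(tris, {0, 1, 2})
print(tris, phi_M)
```

With S = {0, 1, 2}, only the triangle (2, 3, 4) is cut and S holds 4 of the 9 motif end points, so the motif conductance is 1/4.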
  • 26. How can we optimize motif conductance? We thought that motif conductance would spark new tensor and hypermatrix methods based on the motif adjacency tensor (Benson, Gleich, Leskovec, SDM 2016): $A(i, j, k) = 1$ if the motif involves nodes $i, j, k$, and $0$ otherwise. We were wrong!
  • 27. There is a symmetric matrix that serves as the appropriate tool to study motif conductance: $W^{(M)}_{ij}$ counts co-occurrences of the motif pattern between $i$ and $j$. [Figure: the 12-node example graph and its motif-weighted graph $W^{(M)}$.]
  • 28. Going from motifs back to a matrix for spectral clustering. $W^{(M)}_{ij}$ counts co-occurrences of the motif pattern between $i$ and $j$. KEY INSIGHT: spectral clustering on $W^{(M)}$ yields results on the new motif notion of conductance, $\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$.
  • 29. Here is a quick illustration of how this works. For the motif, $\phi_M(S) = \frac{\text{motifs cut}}{\text{motif volume}} = \frac{1}{10}$. In the weighted graph $W^{(M)}$, $\frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} = \frac{2}{6 + 8 + 2 + 2 + 2} = \frac{1}{10}$.
  • 30. A motif-based clustering algorithm
      1. Form the weighted graph $W^{(M)}$
      2. Compute the Fiedler vector $f^{(M)}$ associated with λ2 of the motif-normalized Laplacian: $D = \mathrm{diag}(W^{(M)} e)$, $L^{(M)} = D^{-1/2}(D - W^{(M)})D^{-1/2}$, $L^{(M)} z = \lambda_2 z$, $f^{(M)} = D^{-1/2} z$
      3. Run a sweep cut on $f^{(M)}$
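The three steps can be sketched end to end in Python, assuming NumPy and a small dense matrix. This is an illustration, not the authors' released code: the function name `motif_spectral_cut` and the 6-node graph are hypothetical, a dense `eigh` stands in for a sparse eigensolver such as ARPACK, and every node is assumed to take part in at least one motif so that $D$ is invertible.

```python
import numpy as np

def motif_spectral_cut(W):
    """Fiedler-vector sweep cut on a symmetric, nonnegative motif matrix W."""
    n = W.shape[0]
    d = W.sum(axis=1)                       # D = diag(W e)
    d_inv_sqrt = 1.0 / np.sqrt(d)           # assumes all degrees are positive
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    f = d_inv_sqrt * vecs[:, 1]             # f = D^{-1/2} z for lambda_2
    order = np.argsort(f)
    total = d.sum()
    best_phi, best_k = np.inf, 0
    for k in range(1, n):                   # sweep over prefixes of the order
        S, Sbar = order[:k], order[k:]
        cut = W[np.ix_(S, Sbar)].sum()
        vol = d[S].sum()
        phi = cut / min(vol, total - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return sorted(order[:best_k].tolist()), best_phi

# Hypothetical graph with triangles (0,1,2), (2,3,4), and (3,4,5).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
W = (A @ A) * A                             # triangle motif weighting
cluster, phi = motif_spectral_cut(W)
print(cluster, phi)
```

The sign of an eigenvector is arbitrary, so the sweep may return either side of the split; both sides have the same conductance.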
  • 31. The sweep cut results. [Plot: motif conductance along the sweep, with nodes in the order from the Fiedler vector; the best and 2nd best higher-order clusters are marked.]
  • 32. There are nice matrix computations for three-node motifs. For triangles in social relationships (Simmel, 1908; Rapoport, 1953; Granovetter, 1973): $W = A^2 \odot A$.
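  • 33. There are nice matrix computations for three-node directed motifs. Given a (directed) adjacency matrix $A$, let $B = A \odot A^T$ (bidirectional), $U = A - B$ (unidirectional), and $N = ee^T - B - U - U^T$. Here $\cdot$ is the matrix product and $\odot$ the Hadamard product.
      Motif | Matrix computations | W =
      M1 | $C = (U \cdot U) \odot U^T$ | $C + C^T$
      M2 | $C = (B \cdot U) \odot U^T + (U \cdot B) \odot U^T + (U \cdot U) \odot B$ | $C + C^T$
      M3 | $C = (B \cdot B) \odot U + (B \cdot U) \odot B + (U \cdot B) \odot B$ | $C + C^T$
      M4 | $C = (B \cdot B) \odot B$ | $C$
      M5 | $C = (U \cdot U) \odot U + (U \cdot U^T) \odot U + (U^T \cdot U) \odot U$ | $C + C^T$
      M6 | $C = (U \cdot B) \odot U + (B \cdot U^T) \odot U^T + (U^T \cdot U) \odot B$ | $C$
      M7 | $C = (U^T \cdot B) \odot U^T + (B \cdot U) \odot U + (U \cdot U^T) \odot B$ | $C$
      M8 | $C = (U \cdot N) \odot U + (N \cdot U^T) \odot U^T + (U^T \cdot U) \odot N$ | $C$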
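Two rows of the table (M1 and M4) can be checked on a toy directed graph. This is a sketch assuming NumPy; the 6-node graph is hypothetical, with a directed 3-cycle on nodes 0, 1, 2 and a fully bidirectional triangle on nodes 3, 4, 5.

```python
import numpy as np

n = 6
A = np.zeros((n, n), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 0)]:       # directed cycle 0 -> 1 -> 2 -> 0
    A[i, j] = 1
for i, j in [(3, 4), (4, 5), (3, 5)]:       # bidirectional triangle 3, 4, 5
    A[i, j] = 1
    A[j, i] = 1

B = A * A.T          # bidirectional edges (element-wise product)
U = A - B            # unidirectional edges

# M1: C = (U U) ⊙ U^T, W = C + C^T
C1 = (U @ U) * U.T
W_M1 = C1 + C1.T

# M4: W = (B B) ⊙ B
W_M4 = (B @ B) * B

print(W_M1)
print(W_M4)
```

W_M1 puts weight 1 on each pair inside the directed cycle and nothing elsewhere; W_M4 does the same for the bidirectional triangle.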
  • 34. The three-node motif-based Cheeger inequality
      THEOREM If the motif has three nodes, then the sweep procedure on the weighted graph finds a set $S$ of nodes for which $\phi_M(S) \le 2\sqrt{\phi_M^*}$.
      Key proof step. Let $M(G)$ = {instances of $M$ in $G$} and let $x_i \in \{-1, 1\}$ mark which side of the cut node $i$ is on. Then $\mathrm{cut}_M(S, G) = \sum_{\{i,j,k\} \in M(G)} \mathrm{Indicator}[x_i, x_j, x_k \text{ not the same}]$, and each indicator equals $\frac{1}{4}(x_i^2 + x_j^2 + x_k^2 - x_i x_j - x_j x_k - x_i x_k)$, which is quadratic in $x$.
      IMPLICATION Just run spectral clustering on those weighted matrices.
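The key proof step rests on an identity that is easy to verify by brute force. The check below is plain Python and assumes the usual ±1 encoding of the two sides of the cut (an assumption; the slide text does not state the encoding). The helper names are hypothetical.

```python
from itertools import product

def not_all_same(xi, xj, xk):
    """Indicator that the three nodes are not all on the same side."""
    return int(not (xi == xj == xk))

def quadratic(xi, xj, xk):
    """The quadratic form from the proof step."""
    return (xi * xi + xj * xj + xk * xk - xi * xj - xj * xk - xi * xk) / 4

# Compare the two expressions on all 8 sign patterns.
checks = [(x, not_all_same(*x), quadratic(*x))
          for x in product([-1, 1], repeat=3)]
for x, indicator, quad in checks:
    print(x, indicator, quad)
```

When all three signs agree, the squares (3) and the cross terms (3) cancel; when one sign differs, the cross terms sum to −1 and the form equals 4/4 = 1.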
  • 35. Awesome advantages. Works for arbitrary non-negative combinations of motifs too. We inherit 40+ years of research!
      • Fast algorithms (ARPACK, etc.)!
      • Local methods! Yin, Benson, Leskovec, Gleich, KDD2017
      • Overlapping!
      • Easy to implement (20 lines of Matlab/Julia)
      • Scalable (1.4B edges graphs are not a problem)
      Figure 2.3 – Julia implementation of the motif-based spectral clusteri… (excerpt; the listing starts and ends mid-way on the slide):
              elseif motif == "M5"
                  C = (U * U) .* U + (U * U') .* U + (U' * U) .* U
                  W = C + C'
              elseif motif == "M6"
                  W = (U * B) .* U + (B * U') .* U' + (U' * U) .* B
              elseif motif == "M7"
                  W = (U' * B) .* U' + (B * U) .* U + (U * U') .* B
              else
                  error("Motif must be one of M1, M2, M3, M4, M5, M6, or M7.")
              end

              # Get Fiedler eigenvector
              dinvsqrt = spdiagm(1.0 ./ sqrt.(vec(sum(W, 1))))
              LM = I - dinvsqrt * W * dinvsqrt
              lambdas, evecs = eigs(LM, nev=2, which=:SM)
              z = dinvsqrt * real(evecs[:, 2])

              # Sweep cut
              sigma = sortperm(z)
              C = W[sigma, sigma]
              Csums = sum(C, 1)'
              motifvolS = cumsum(Csums)
              motifvolSbar = sum(W) * ones(length(sigma)) - motifvolS
              conductances = cumsum(Csums - 2 * sum(triu(C), 1)') ./ min.(motif…
              split = indmin(conductances)
              # Return the smaller side of the split
              if split <= size(A, 1) / 2
                  return sigma[1:split]
              else
                  return sigma[(split + 1):end]
              end
          end
  • 36. Case study 1: Motifs partition the food webs. Food webs model energy exchange in species of an ecosystem. An edge i → j means i’s energy goes to j (or j eats i).
  • 37. Case study 1: Motifs partition the food webs. Food webs model energy exchange in species of an ecosystem. An edge i → j means i’s energy goes to j (or j eats i). Via Cheeger, motif conductance is better than edge conductance.
  • 38. Demo and reproducibility: https://github.com/arbenson/higher-order-organization-julia
          # form W0 … W4
          sc0 = spectral_cut(W0)
          sc1 = spectral_cut(W1)
          sc2 = spectral_cut(W2)
          sc3 = spectral_cut(W3)
          sc4 = spectral_cut(W4)
          plt = x -> semilogx(x.sweepcut_profile.conductance)
          plt(sc0)
          plt(sc1)
          plt(sc2)
          plt(sc3)
          plt(sc4)
  • 39. Case study 1: Motifs partition the food webs. Motif M6 reveals aquatic layers: micronutrient sources; pelagic fishes and benthic prey; benthic macroinvertebrates; benthic fishes. 61% accuracy vs. 48% with edge-based methods.
  • 40. Case study 2: Hub structure in the air transportation network. North American air transport network: nodes are airports; edges reflect reachability and are unweighted. (Based on Frey et al., 2007.)
  • 41. The weighted adjacency matrix already reveals hub-like structure. Counts length-two walks.
  • 42. The motif embedding shows this structure and splits into east-west. [Figure: motif spectral embedding and edge spectral embedding, plotted against the primary spectral coordinate. Legend: top 10 U.S. hubs; East coast non-hubs; West coast non-hubs.] Atlanta, the top hub, is next to Salina, a non-hub.
  • 43. Case study 3: the stochastic block model shows numerical advantages to motif matrices. Model problems are useful because they are simple and we often “know” everything about them. They may not reflect real-world issues. Fields shown: Biology (mouse and fly pictures), Matrix Computations ($\nabla^2 u = f$), Clustering. Image credits: mouse picture from Wikipedia (Мышь_2.jpg); fly from oregonstateuniversity/11179958483.
  • 44. The stochastic block model is extremely well understood in theory. The symmetric stochastic block model (SSBM):
      • k blocks, each of size m-by-m
      • within-block edges exist with prob. p
      • between-block edges exist with prob. q
      Edge probabilities by block (k block rows and columns, each of size m): $\begin{bmatrix} p & q & \cdots & q \\ q & p & \ddots & \vdots \\ \vdots & \ddots & \ddots & q \\ q & \cdots & q & p \end{bmatrix}$
      Example: m = 200, k = 5, p = 0.3, q = 0.13. Reminiscent of Simon & Ando.
  • 45. The stochastic block model is extremely well understood in theory. The symmetric stochastic block model (SSBM) has k blocks, each of size m-by-m; within-block edges exist with prob. p and between-block edges with prob. q (example: m = 200, k = 5, p = 0.3, q = 0.13).
      The task: given a graph that is an SSBM, and given m, k, p, q, find the k blocks.
      Theory: E. Abbe, Community detection and the stochastic block model (in prep, on webpage)
      • Necessary: p > q
      • Exact recovery (get all correct)
      • Detectability (find a non-trivial portion)
      • Uses non-backtracking random walk
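A small sampler makes the model concrete. This is a plain-Python sketch with a hypothetical function name `sample_ssbm`; it uses the slide's p and q but a smaller m so that it runs quickly, and it is not the code from the motif-ssbm repository.

```python
import random

def sample_ssbm(m, k, p, q, seed=0):
    """Sample an undirected SSBM with k blocks of m nodes each.

    Returns (block, edges): block[i] is the block of node i, and edges is
    a list of pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    n = m * k
    block = [i // m for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if block[i] == block[j] else q
            if rng.random() < prob:
                edges.append((i, j))
    return block, edges

m, k, p, q = 40, 5, 0.3, 0.13
block, edges = sample_ssbm(m, k, p, q)

# Empirical edge densities within and between blocks.
n = m * k
within_pairs = k * m * (m - 1) // 2
between_pairs = n * (n - 1) // 2 - within_pairs
within = sum(1 for i, j in edges if block[i] == block[j]) / within_pairs
between = sum(1 for i, j in edges if block[i] != block[j]) / between_pairs
print(round(within, 3), round(between, 3))
```

The two printed densities should land near p and q respectively; the clustering task is to recover `block` from `edges` alone.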
  • 46. Our study is to look at the SSBM using our motif weighting based on triangles, $W = A^2 \odot A$. Based also on Tsourakakis, Pachocki, & Mitzenmacher, WWW, 2017, who showed triangle-conductance < edge-conductance in a model.
      Thinking beyond nodes and edges: there is abundant evidence that higher-order connectivity patterns, or network motifs, drive complex systems [Milo+02, Yaveroğlu+14]. Signed feed-forward loops in genetic transcription (Mangan et al., 2003; Alon, 2007); triangles in social relationships (Simmel, 1908; Rapoport, 1953; Granovetter, 1973); bi-directed length-2 paths in brain networks (Sporns-Kötter, 2004; Sporns et al., 2007; Honey et al., 2007).
      Figure S8: Higher-order organization of the S. cerevisiae transcriptional regulation network. A: The four higher-order structures used by our higher-order clustering method, which can model signed motifs. These are coherent feedforward loop motifs, which act as sign-sensitive delay elements in transcriptional regulation networks (46). The edge signs refer to activation (positive) or repression (negative). B: Six higher-order clusters revealed by the motifs in (A). Clusters show functional modules consisting of several motifs (coherent feedforward loops), which were previously studied individually (45). The higher-order clustering framework identifies the functional modules with higher accuracy (97%) than existing methods (68–82%). C–D: Two higher-order clusters from (B). In these clusters, all edges have positive sign. The functionality of the motifs in the modules corresponds to drug resistance (C) or cell cycle and mating type match (D). The clustering suggests that coherent feedforward loops function together as a single processing unit rather than as independent elements.
      Figure 1: Higher-order network structures and the higher-order network clustering framework. A: Higher-order structures are captured by network motifs. For example, all 13 connected three-node directed motifs are shown here. B: Clustering of a network based on motif M7. For a given motif M, our framework aims to find a set of nodes $S$ that minimizes motif conductance, $\phi_M(S)$, which we define as the ratio of the number of motifs cut (filled triangles cut) to the minimum number of nodes in instances of the motif in either $S$ or $\bar{S}$ (13). In this case, there is one motif cut. C: The higher-order network clustering framework. Given a graph and a motif of interest (in this case, M7), the framework forms a motif adjacency matrix ($W_M$) by counting the number of times two nodes co-occur in an instance of the motif. An eigenvector of a Laplacian transformation of the motif adjacency matrix is then computed. The ordering $\sigma$ of the nodes provided by the components of the eigenvector (15) produces nested sets $S_r = \{\sigma_1, \dots, \sigma_r\}$ of increasing size $r$. We prove that the set $S_r$ with the smallest motif-based conductance, $\phi_M(S_r)$, is a near-optimal higher-order cluster (13).
  • 47. Just using the motif weighting highlights the blocks for a range of parameters. We introduce a mixing parameter μ to scale q: $\mu = 0 \Leftrightarrow q = 0$ and $\mu = \frac{k-1}{k} \Leftrightarrow q = p$. [Panels: $W = A^2 \odot A$ and $A$.]
  • 48. The power method identifies a cluster using the motif weighting better than the adjacency. [Plot: accuracy for $W = A^2 \odot A$ and for $A$, with Detectability and Exact recovery marked.] Exp. details: we take the normalized Laplacian and shift to reverse the spectrum. Then we deflate given knowledge of the leading eigenvector. Accuracy is that of the most accurate block in the extremal m entries.
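The experimental setup (shift, then deflate, then power iterations) can be sketched as follows, assuming NumPy. The specific shift $2I - L$, the iteration count, the function name, and the 6-node test graph are assumptions for illustration; the slide does not give these details.

```python
import numpy as np

def second_eigvec_power(W, iters=2000, seed=0):
    """Power method for the eigenvector of lambda_2 of the normalized Laplacian.

    Assumes W is symmetric and nonnegative with positive row sums.
    """
    n = W.shape[0]
    d = W.sum(axis=1).astype(float)
    s = 1.0 / np.sqrt(d)
    L = np.eye(n) - s[:, None] * W * s[None, :]
    # Shift to reverse the spectrum: eigenvalue lam of L becomes 2 - lam,
    # so the smallest eigenvalues of L become the dominant ones.
    M = 2.0 * np.eye(n) - L
    # Known leading eigenvector: D^{1/2} e has eigenvalue 0 for L.
    v1 = np.sqrt(d) / np.linalg.norm(np.sqrt(d))
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for _ in range(iters):
        x -= v1 * (v1 @ x)          # deflate the known eigenvector
        x = M @ x
        x /= np.linalg.norm(x)
    x -= v1 * (v1 @ x)
    return x / np.linalg.norm(x)

# Hypothetical graph with triangles (0,1,2), (2,3,4), and (3,4,5).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
W = (A @ A) * A                      # triangle motif weighting
x = second_eigvec_power(W)
print(np.sign(x))
```

The signs of the returned vector separate the two triangle-rich groups {0, 1, 2} and {3, 4, 5}, up to an overall sign flip.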
  • 49. The power method identifies a cluster using the motif weighting better than the adjacency. [Panels: $W = A^2 \odot A$ and $A$.]
  • 50. We don’t converge faster for the usual reasons that the power method converges.
  • 51. There is a bigger gap deeper in the spectrum that could explain what is going on.
  • 52. The motif weighting shifts all the eigenvalues down, but the lowest drop the most. [Panels: $W = A^2 \odot A$ and $A$. Annotations: Semi-circle law. Not a Marchenko-Pastur law!]
  • 53. We’d like a numerical understanding of why we get better results faster with motifs. Eigenvalues show that we’ll converge to the “cluster subspace” faster. Conjecture: higher accuracy for motifs because the eigenvectors are more localized, or sharper, around the clusters.
      • What remains is to understand why they are sharper!
      [Panels: $W = A^2 \odot A$ and $A$.]
  • 54. Related work.
      • The Laplacian we propose was originally proposed by Rodríguez [2004] and again by Zhou et al. [2006]. Our new theory (motif Cheeger inequality) explains why these were good ideas.
      • Falls under the general strategy of encoding a hypergraph partitioning problem as a graph clustering problem [Agarwal+ 06]
      • Serrour, Arenas, & Gómez, Detecting communities of triangles in complex networks using spectral optimization, 2011.
      • Arenas et al., Motif-based communities in complex networks, 2008.
      • Rohe & Qin, Blessing of transitivity …, arXiv, 2013.
      • Klymko, Gleich, Kolda (Using triangles & cycles …, ASE BigData 2014)
      • Benson, Gleich, Leskovec (Motifs & Tensors, SIAM Data Mining 2015)
  • 55. Paper: Benson, Gleich, Leskovec, Science, 2016
      1. A generalized conductance metric for motifs
      2. A new spectral clustering algorithm to minimize the generalized conductance
      3. AND an associated Cheeger inequality
      4. Aquatic layers in food webs
      5. Hub structure in transportation networks
      6. Eigenvalues & vectors of motifs in SSBMs
      7. Lots of cool stuff on signed networks
      Open questions
      • What is the distribution law for the Laplacian of $A^2 \odot A$?
      • How to work with element-wise products like matvecs for $N = ee^T - B - U - U^T$?
      Code & Data: snap.stanford.edu/higher-order, github.com/arbenson/higher-order-organization-julia, github.com/dgleich/motif-ssbm
      Joint work with Austin Benson and Jure Leskovec, Stanford. Supported by NSF CAREER CCF-1149756, IIS-1422918 IIS- DARPA SIMPLEX.
      Thank you!