Engineering Data Science Objective Functions for Social Network Analysis
David F. Gleich, Purdue University
With Nate Veldt (Purdue -> Cornell) and Tony Wirth (Melbourne)
Paper: arXiv:1903.05246 · Code: github.com/nveldt/LearnResParams
Somewhere too close and very recently…
Application expert: “Hi, I see you work on clustering. I want to cluster my data… what algorithm should I use?”
The dreaded question for people who study clustering, community detection, etc.:
“What algorithm should I use?”

Why is this such a hard question?
Journal of Biomedicine and Biotechnology • 2005:2 (2005) 215–225 • DOI: 10.1155/JBB.2005.215
REVIEW ARTICLE
Finding Groups in Gene Expression Data
David J. Hand and Nicholas A. Heard
Department of Mathematics, Faculty of Physical Sciences, Imperial College, London SW7 2AZ, UK
Received 11 June 2004; revised 24 August 2004; accepted 24 August 2004
The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.
INTRODUCTION
Microarray gene expression studies are now routinely used to measure the transcription levels of an organism’s genes at a particular instant of time. These mRNA levels serve as a proxy for either the level of synthesis of proteins encoded by a gene or perhaps its involvement in a metabolic pathway. Differential expression between a control organism and an experimental or diseased organism can thus highlight genes whose function is related to the experimental challenge.
An often cited example is the classification of cancer types (Golub et al [1], Alizadeh et al [2], Bittner et al [3], […]). A microarray slide can typically hold tens of thousands of gene fragments whose responses here act as the predictor variables (p), whilst the number of patient tissue samples (n) available in such studies is much less (for the above examples, 38 in Golub et al, 96 in Alizadeh et al, 38 in Bittner et al, 41 in Nielsen et al, 63 in Tibshirani et al, and 80 in Parmigiani et al).
More generally, beyond such “supervised” classification problems, there is interest in identifying groups of genes with related expression level patterns over time or across repeated samples, say, even within the same classification label type. Typically one will be looking for coregulated […] between neighbouring frequencies; analogously for microarray data, there is evidence of correlation of expression of genes residing closely to one another on the chromosome (Turkheimer et al [17]). Thus when we come to look at cluster analysis for microarray data, we will see a large emphasis on methods which are computationally suited to cope with the high-dimensional data.
CLUSTER ANALYSIS
The need to group or partition objects seems fundamental to human understanding: once one can identify a class of objects, one can discuss the properties of the class members as a whole, without having to worry about individual differences. As a consequence, there is a vast literature on cluster analysis methods, going back at least as far as the earliest computers. In fact, at one point in the early 1980s new ad hoc clustering algorithms were being developed so rapidly that it was suggested there should be a moratorium on the development of new algorithms while some understanding of the properties of the existing ones […]
[The remainder of the excerpted page is truncated.]
Why is this such a hard question?
There are many reasons people want to cluster data:
• Help understand it
• Bin items for some downstream process
• …
There are many methods and strategies to cluster data:
• Linkage methods from stats
• Partitioning methods
• Objective functions (K-means) and updating algorithms
• …
I can’t psychically intuit what you need from your data!
I don’t like studying clustering…
… so let’s do exactly that.
Let’s do some warm up. What are the clusters in this graph?
[Repeated over a sequence of example graphs.]
Let’s consult an expert!
Graph clustering seeks “communities” of nodes.
Objective functions (modularity, densest subgraph, maximum clique, conductance, sparsest cut, etc.) all seek to balance high internal density against low external connectivity.
Two objectives at opposite ends of the spectrum

Sparsest cut:
$$\min_{S}\; \frac{\mathrm{cut}(S)}{|S|} + \frac{\mathrm{cut}(S)}{|\bar{S}|}$$
Cluster deletion:
Minimize the number of edges removed to partition the graph into cliques.
We show sparsest cut and cluster deletion are two special cases of the same new clustering framework:
LAMBDACC = λ-correlation clustering
This framework also leads to
- new connections to other objectives (including modularity!)
- new approximation algorithms (a 2-approximation for cluster deletion)
- several experiments/applications (social network analysis)
- (aside) a fast method for LPs with metric constraints (used in the approximation algorithms)
And now you are thinking… is this talk really going to propose another new method?!??!?
I’m going to advocate for flexible clustering frameworks, which we can then engineer to “fit” example data.
Our framework is based on correlation clustering.
Edges in a signed graph on nodes i, j, k, … indicate similarity (+) or dissimilarity (−).
Edges can be weighted (w+_ij, w−_jk), but the weighted problems become harder.
A “mistake” is a positive edge cut between two clusters, or a negative edge kept within one cluster.
Objective: minimize the total weight of mistakes.
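As a concrete sketch of this objective (Python; not from the talk, and the pair-dictionary representation of the signed graph is an illustrative assumption):

```python
def cc_mistakes(pos, neg, assign):
    """Total weight of mistakes for a clustering of a signed graph.

    pos and neg map node pairs (i, j) to weights w+_ij and w-_ij;
    assign maps each node to a cluster id. A positive edge split
    across clusters is a mistake; so is a negative edge kept inside
    a single cluster.
    """
    total = 0.0
    for (i, j), w in pos.items():
        if assign[i] != assign[j]:  # similar pair separated: mistake
            total += w
    for (i, j), w in neg.items():
        if assign[i] == assign[j]:  # dissimilar pair grouped: mistake
            total += w
    return total
```

For the i, j, k example above: putting all three nodes in one cluster makes the negative edge (j, k) a mistake, while splitting i from j makes the positive edge (i, j) one.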
You can use correlation clustering to cluster unsigned graphs:
given G = (V, E), construct a signed graph G′ = (V, E+, E−), an instance of correlation clustering.
To model sparsest cut or cluster deletion, set a resolution parameter λ ∈ (0, 1).
LAMBDACC: each edge of G becomes a positive pair with weight 1 − λ, and each non-edge becomes a negative pair with weight λ.
Without weights, unweighted correlation clustering is the same as cluster editing.
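A minimal sketch of that construction, reusing the pair-dictionary representation assumed above (again not the released code):

```python
import itertools

def lambda_cc_instance(n, edges, lam):
    """Standard (unit-weight) LambdaCC instance on nodes 0..n-1.

    Every edge of G becomes a positive pair of weight 1 - lam and
    every non-edge a negative pair of weight lam, for 0 < lam < 1.
    """
    E = {tuple(sorted(e)) for e in edges}
    pos, neg = {}, {}
    for i, j in itertools.combinations(range(n), 2):
        if (i, j) in E:
            pos[(i, j)] = 1.0 - lam
        else:
            neg[(i, j)] = lam
    return pos, neg
```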
Consider a restriction to two clusters, S and S̄.
Positive mistakes: $(1 - \lambda)\,\mathrm{cut}(S)$
Negative mistakes: $\lambda|E^-| - \lambda\left[\,|S||\bar{S}| - \mathrm{cut}(S)\,\right]$
Total weight of mistakes $= (1-\lambda)\,\mathrm{cut}(S) + \lambda|E^-| - \lambda|S||\bar{S}| + \lambda\,\mathrm{cut}(S) = \mathrm{cut}(S) - \lambda|S||\bar{S}| + \lambda|E^-|$
This is a scaled version of sparsest cut!
Two-cluster LAMBDACC can be written
$$\text{minimize}\quad \mathrm{cut}(S) - \lambda|S||\bar{S}| + \lambda|E^-|,$$
where the $\lambda|E^-|$ term is constant. Note that
$$\mathrm{cut}(S) - \lambda|S||\bar{S}| < 0 \iff \frac{\mathrm{cut}(S)}{|S||\bar{S}|} < \lambda$$
and
$$\frac{\mathrm{cut}(S)}{|S|} + \frac{\mathrm{cut}(S)}{|\bar{S}|} = |V|\,\frac{\mathrm{cut}(S)}{|S||\bar{S}|}.$$
We can write the objective in terms of cuts to get a relationship with sparsest cut.
The general LAMBDACC objective can be written
$$\text{minimize}\quad \frac{1}{2}\sum_{i=1}^{k} \mathrm{cut}(S_i) \;-\; \frac{\lambda}{2}\sum_{i=1}^{k} |S_i||\bar{S}_i| \;+\; \lambda|E^-|.$$
THEOREM. Minimizing this objective produces clusters with scaled sparsest cut at most λ (if they exist). There exists some λ′ such that minimizing LAMBDACC will return the minimum sparsest cut partition.
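Under the same assumed representation, the cut form can be evaluated directly and cross-checked against `cc_mistakes` on the instance built by `lambda_cc_instance`; a sketch:

```python
from collections import Counter

def lambda_cc_cut_form(n, edges, lam, assign):
    """Evaluate (1/2) sum_i cut(S_i) - (lam/2) sum_i |S_i||S_i-bar|
    + lam |E^-| for the clustering given by assign (node -> cluster id)."""
    E = {tuple(sorted(e)) for e in edges}
    cut = sum(1 for i, j in E if assign[i] != assign[j])
    sizes = Counter(assign.values())
    # (1/2) sum_i |S_i|(n - |S_i|) = number of node pairs split across clusters
    pairs_across = (n * n - sum(s * s for s in sizes.values())) // 2
    num_neg = n * (n - 1) // 2 - len(E)  # |E^-| = number of non-edges
    return cut - lam * pairs_across + lam * num_neg
```

The two evaluations agree, which is exactly the algebra of the two-cluster derivation carried out for k clusters.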
For large λ, LAMBDACC generalizes cluster deletion.
Cluster deletion is correlation clustering with infinite penalties on negative edges; we show this is equivalent to LAMBDACC for the right choice of λ, one with λ ≫ (1 − λ).
Degree-weighted LAMBDACC is related to modularity.
Positive weight: $1 - \lambda d_i d_j$. Negative weight: $\lambda d_i d_j$.
[Figure: a small example graph with its degree-weighted signed pairs.]
With this weighting, the LAMBDACC objective is a linear function of modularity, though this does not preserve approximations…
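A sketch of the degree-weighted construction in the same assumed style (note that at λ = 1/(2m) the positive weight 1 − λ d_i d_j can go negative for very high-degree pairs; this sketch simply records the signed weights as given):

```python
import itertools

def degree_weighted_instance(n, edges, lam):
    """Degree-weighted LambdaCC: an edge (i, j) gets positive weight
    1 - lam * d_i * d_j and a non-edge gets negative weight
    lam * d_i * d_j. With lam = 1/(2m), the resulting objective is a
    linear function of modularity."""
    E = {tuple(sorted(e)) for e in edges}
    deg = [0] * n
    for i, j in E:
        deg[i] += 1
        deg[j] += 1
    pos, neg = {}, {}
    for i, j in itertools.combinations(range(n), 2):
        w = lam * deg[i] * deg[j]
        if (i, j) in E:
            pos[(i, j)] = 1.0 - w
        else:
            neg[(i, j)] = w
    return pos, neg
```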
Many other objectives are special cases of LAMBDACC (here m = |E|).
[Figure: the λ spectrum from 0 to 1 for the standard and degree-weighted variants.]
Standard weighting: sparsest cut at λ = ρ*, correlation clustering (cluster editing) at λ* = 1/2, cluster deletion at λ = m/(m + 1).
Degree-weighted: modularity at λ = 1/(2m), and normalized cut.
And now, an answer to one of the most frequently asked questions in clustering:
“What method should I use?”
Changing your method (implicitly) changes the value of λ that you are using.
[Figure: ratio to the LP bound as a function of λ (from 1e-05 to 0.85) for Graclus, Louvain, InfoMap, RMQC, and RMC, running from the sparse cut regime at small λ to the dense subgraph regime at large λ.]
This figure shows that if you use one of these algorithms (Graclus, Louvain, InfoMap, recursive max-quasi-clique, or recursive max-clique) then you implicitly minimize λ-CC for some choice of λ.
This turns the question “what method should I use?” into “what λ should I use?”
We wrote an entire SIMODS paper explaining how we made this figure!
The LP bound involves an LP with 12 billion constraints.
35
How should I set ! for
my new clustering
application?
Can you give me an example
of what you want your
clusters to look like?
I want communities
that look like this!
LambdaCC inspires an approach for learning the“right”
objective function to use for new applications.
David Gleich · Purdue LLNL
The goal is not to reproduce the example clusters.
The goal to find sets with similar properties size and density tradeoffs.
LLNLDavid Gleich · Purdue 36
Let’s go back to the figure we just saw.
Each clustering traces out a bowl-shaped curve.
The minimum point on each curve tells us the λ regime where the clustering optimizes LambdaCC.
[Figure: the ratio-to-LP-bound curves for Graclus, Louvain, InfoMap, RMQC, and RMC.]
So the “example” clustering will also correspond to some such curve.
[Figure: the ratio-to-LP-bound curve of the example clustering, over λ from 0.13 to 0.5.]
As will any other clustering.
[Figure: repeated over several slides, each adding the curve of another clustering.]
Strategy.
Start with a fixed “good” example clustering.
Find the minimizer of its curve, to get a λ that is designed to produce similar clusterings!
Challenge.
We want to do this without computing the entire curve.
This is a new optimization problem where we are optimizing over λ!
What function is tracing out these curves?
The “parameter fitness function”
$$P_C(\lambda) = \frac{F_C(\lambda)}{G(\lambda)},$$
where $F_C(\lambda)$ is the LambdaCC score of the clustering C, a linear function of λ, and $G(\lambda)$ is the LambdaCC LP bound for fixed λ, a parametric LP that is concave and piecewise linear in λ (Adler & Monteiro 1992).
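As a sketch, F_C(λ) can be assembled from two counts that depend only on the clustering C, while G(λ) must come from an LP solver; `lp_bound` below is a stand-in for that solver, not a real API:

```python
def parameter_fitness(cut_edges, bad_pairs, lp_bound):
    """Build P_C as a function of lam.

    cut_edges: number of edges of G that C cuts;
    bad_pairs: number of non-adjacent pairs C keeps inside clusters;
    so F_C(lam) = (1 - lam) * cut_edges + lam * bad_pairs is linear.
    lp_bound(lam) must return the LambdaCC LP bound G(lam).
    """
    def P(lam):
        return ((1.0 - lam) * cut_edges + lam * bad_pairs) / lp_bound(lam)
    return P
```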
We prove two useful properties about P.
Since $F_C$ is linear and $G$ is concave and piecewise linear, P satisfies the following:
1. If $\lambda_- < \lambda < \lambda_+$, then $P(\lambda) \le \max\{P(\lambda_-), P(\lambda_+)\}$.
2. If $P(\lambda_-) = P(\lambda_+)$, then P achieves its minimum in $[\lambda_-, \lambda_+]$.
Translation…
1. Once P goes up, it can’t go back down.
2. There are no “flat” regions where we might get stuck.
This allows us to minimize P without seeing all of it.
[Figure sequence: evaluating P at a few points of the curve.]
After evaluating P at two points, we know the minimizer can’t be to the left of the lower one, but several shapes for the unseen part of the curve remain possible. Evaluating P at a new point rules some of them out: now we know the minimizer can’t be to the right of that point either. And if two input λ have the same fitness score, the minimizer is between them, so it is not out beyond either of them.
We developed a bisection-like approach for minimizing P by evaluating it at carefully selected points.
One-branch scenario: the minimizer isn’t in [m, r].
Two-branch scenario: evaluate a couple more points to rule out [m, r].
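The two properties of P make a simple ternary-search-style interval shrink valid; a simplified sketch of the idea (the paper’s actual procedure chooses evaluation points more carefully and reuses LP information, since every evaluation of P solves an LP):

```python
def minimize_fitness(P, lo, hi, tol=1e-4):
    """Shrink [lo, hi] around the minimizer of the fitness function P.

    Property 1 (no interior point exceeds both endpoints) and
    property 2 (ties bracket the minimum) guarantee that comparing P
    at two interior points always lets us discard one end.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if P(m1) < P(m2):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)
```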
A simple synthetic test case to demonstrate that having an example helps.
Modularity (a special case of LambdaCC with λ = 1/(2m)) wasn’t able to get the community structure right for the graph G. Let’s fix that!
1. Generate a new random graph G′ from the same distribution.
2. Using the ground truth of G′, learn a resolution parameter λ′.
3. Cluster G using LambdaCC with λ = λ′.
We’ve captured the community structure for a specific class of graphs and can detect the right answer!
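The three steps compose directly; in the sketch below, `fit_lambda` and `lambda_cc_cluster` are hypothetical stand-ins for the learning routine and a LambdaCC solver (the released code at github.com/nveldt/LearnResParams is Julia, not Python):

```python
def learn_and_transfer(sample_graph, sample_truth, target_graph,
                       fit_lambda, lambda_cc_cluster):
    """Learn a resolution parameter on G' and apply it to G.

    sample_graph (G') is drawn from the same distribution as
    target_graph (G); sample_truth is the ground-truth clustering of
    G'. Both callables are assumed, hypothetical interfaces.
    """
    lam = fit_lambda(sample_graph, sample_truth)  # step 2
    return lambda_cc_cluster(target_graph, lam)   # step 3
```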
We tested this on a regime of synthetic graphs that is hard for modularity.
µ is a “mixing parameter”: smaller µ → the ground truth is easier to detect.
For each µ, we trained on one graph and tested on 5 others.
One example when µ = 0.3: modularity often fails to separate ground-truth clusters.
We can use this to test if a metadata attribute seems to be reflected in some characteristic graph structure.

            S/F    Gen    Maj.   Maj. 2  Res.   Yr     HS
min P_real  1.30   1.73   2.03   2.12    1.35   1.57   2.11
min P_fake  1.65   1.80   2.12   2.12    2.11   2.09   2.12

(Listen, don’t read!) For the Caltech network, find the minimum value of λ for a clustering X induced by a metadata attribute. Then look at the objective function P(λ, X) = F(λ, X)/G(λ) at the minimizer. Do this for the real attribute and for a randomized attribute (just shuffle the labels); that gives a null score where there is no relationship with graph structure.
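A sketch of that null test; `min_fitness_for` is a hypothetical stand-in for a routine returning the minimum over λ of P(λ, X) for the clustering X induced by a label vector:

```python
import random

def metadata_null_test(graph, labels, min_fitness_for):
    """Compare min-P for a real attribute against shuffled labels.

    An attribute that tracks graph structure should score noticeably
    below its shuffled copy, which by construction has no
    relationship with the structure.
    """
    real = min_fitness_for(graph, labels)
    shuffled = list(labels)
    random.shuffle(shuffled)
    fake = min_fitness_for(graph, shuffled)
    return real, fake
```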
We can also investigate metadata sets in social networks. This led to a fun story!
[Figure: scatter plot; x-axis: the objective ratio at the minimum, i.e. how close you get to the lower bound; y-axis: how well you do at finding those same sets again.]
[The same scatter plot, with points split by year: 2006–2008 vs. 2009.]
A quick summary of other work from our research team on data-driven scientific computing.
Our team’s overall goal is to design algorithms and methods tuned to the evolving needs and nature of scientific data analysis.
Low-rank methods for network alignment – Huda Nassar -> Stanford
• Principled methods that scale to aligning thousands of networks.
Spectral properties and generation of realistic networks – Nicole Eikmeier -> Grinnell College
• “Power laws” in the top singular values of the adjacency matrix are more robust than degree “power laws.”
• Fast sampling for hypergraph models with higher-order structure.
Local analysis of network data – Meng Liu
• Applications in bioinformatics; software: https://github.com/kfoynt/LocalGraphClustering
[Figure: a Kronecker graph with a 2 × 2 initiator, “⊗-powered” three times into an 8 × 8 probability matrix.]
Paper: arXiv:1903.05246 (at WWW2019) · Code: github.com/nveldt/LearnResParams
Earlier: (at WWW2018), arXiv:1806.01678
Software on github: nveldt/LamCC, nveldt/MetricOptimization

Don’t ask what algorithm, ask what kind of clusters!
Issues.
• Yeah, this is still slow ☹
• Needs to be generalized beyond lambda-CC (ongoing work with Meng Liu at Purdue)
See the paper and code!

With Nate Veldt (Purdue), Tony Wirth (Melbourne), Cameron Ruggles (Purdue), and James Saunderson (Monash).
More Related Content

What's hot

Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningSungjoon Choi
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Sungjoon Choi
 
High-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHigh-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHolistic Benchmarking of Big Linked Data
 
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...dhruvgairola
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AIFlorian Wilhelm
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distributionAlexander Decker
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 

What's hot (20)

Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Uncertainty in Deep Learning
Uncertainty in Deep LearningUncertainty in Deep Learning
Uncertainty in Deep Learning
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep Learning
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Modeling uncertainty in deep learning
Modeling uncertainty in deep learning
 
High-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHigh-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K Characters
 
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...
A Generic Algebraic Model for the Analysis of Cryptographic Key Assignment Sc...
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 
Cs36565569
Cs36565569Cs36565569
Cs36565569
 
Deep learning networks
Deep learning networksDeep learning networks
Deep learning networks
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distribution
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 

Similar to Engineering Data Science Objectives for Social Network Analysis

15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problemcsandit
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Ganesan Narayanasamy
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptxGandhiMathy6
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasicengrasi
 
Simplicial closure and higher-order link prediction
Simplicial closure and higher-order link predictionSimplicial closure and higher-order link prediction
Simplicial closure and higher-order link predictionAustin Benson
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Austin Benson
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataIOSR Journals
 
Clustering Algorithms.pptx
Clustering Algorithms.pptxClustering Algorithms.pptx
Clustering Algorithms.pptxIssra'a Almgoter
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 

Similar to Engineering Data Science Objectives for Social Network Analysis (20)

15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problem
 
Az36311316
Az36311316Az36311316
Az36311316
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptx
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Simplicial closure and higher-order link prediction
Simplicial closure and higher-order link predictionSimplicial closure and higher-order link prediction
Simplicial closure and higher-order link prediction
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)
 
Pca part
Pca partPca part
Pca part
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
Clustering Algorithms.pptx
Clustering Algorithms.pptxClustering Algorithms.pptx
Clustering Algorithms.pptx
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 

More from David Gleich

Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...David Gleich
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationDavid Gleich
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...David Gleich
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduceDavid Gleich
 
The power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsThe power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsDavid Gleich
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...David Gleich
 
Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for HadoopDavid Gleich
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignmentDavid Gleich
 

More from David Gleich (12)

Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportation
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
 
The power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsThe power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulants
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
 
Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for Hadoop
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignment
 

Recently uploaded

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Recently uploaded (20)

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

Engineering Data Science Objectives for Social Network Analysis

  • 1. Engineering Data Science Objective Functions for Social Network Analysis David F. Gleich Purdue University With Nate Veldt (Purdue -> Cornell), Tony Wirth (Melbourne) Paper arXiv:1903.05246 Code github.com/nveldt/LearnResParams LLNL 1David Gleich · Purdue
  • 2. Somewhere too close and very recently… Application expert. “Hi, I see you work on clustering. I want to cluster my data … … what algorithm should I use?” LLNLDavid Gleich · Purdue 2
  • 3. The dreaded question for people who study clustering, community detection, etc. “What algorithm should I use?”
  • 4. Why is this such a hard question? LLNLDavid Gleich · Purdue 4
  • 5. [Slide shows the opening page of: David J. Hand and Nicholas A. Heard, “Finding Groups in Gene Expression Data,” Journal of Biomedicine and Biotechnology 2005:2 (2005) 215–225, DOI: 10.1155/JBB.2005.215, a review of cluster analysis for microarray data. Its history of the field notes that in the early 1980s new ad hoc clustering algorithms were being developed so rapidly that it was suggested there should be a moratorium on new algorithms until the properties of the existing ones were understood.]
  • 6. Why is this such a hard question? There are many reasons people want to cluster data • Help understand it • Bin items for some downstream process • … There are many methods and strategies to cluster data • Linkage methods from stats • Partitioning methods • Objective functions (K-means) and updating algorithms • … I can’t psychically intuit what you need from your data!
  • 7. I don’t like studying clustering…
  • 8. I don’t like studying clustering… … so let’s do exactly that.
  • 9. Let’s do some warm up. What are the clusters in this graph?
  • 10. Let’s do some warm up. What are the clusters in this graph?
  • 11. Let’s do some warm up. What are the clusters in this graph?
  • 12. Let’s do some warm up. What are the clusters in this graph?
  • 13. Let’s do some warm up. What are the clusters in this graph? Let’s consult an expert!
  • 14. Let’s do some warm up. What are the clusters in this graph?
  • 15. Let’s do some warm up. What are the clusters in this graph?
  • 16. Graph clustering seeks “communities” of nodes. Objective functions (modularity, densest subgraph, maximum clique, conductance, sparsest cut, etc.) all seek to balance high internal density with low external connectivity.
  • 17. Two objectives at opposite ends of the spectrum. Sparsest cut: min_S cut(S)/|S| + cut(S)/|S̄|.
  • 18. Two objectives at opposite ends of the spectrum. Sparsest cut: min_S cut(S)/|S| + cut(S)/|S̄|. Cluster deletion: minimize the number of edges removed to partition the graph into cliques.
  • 19. We show sparsest cut and cluster deletion are two special cases of the same new clustering framework: LambdaCC (λ-correlation clustering). This framework also leads to - new connections to other objectives (including modularity!) - new approximation algorithms (a 2-approximation for cluster deletion) - several experiments/applications (social network analysis) - (aside) a fast method for LPs with metric constraints (used in the approximation algorithms)
  • 20. And now you are thinking… is this talk really going to propose another new method?!?
  • 21. I’m going to advocate for flexible clustering frameworks, which we can then engineer to “fit” example data
  • 22. Our framework is based on correlation clustering. Edges in a signed graph indicate similarity (+) or dissimilarity (-).
  • 23. Our framework is based on correlation clustering. Edges in a signed graph indicate similarity (+) or dissimilarity (-). Edges can be weighted (w⁺_ij, w⁻_jk), but weighted problems become harder. [Figure: a signed triangle on nodes i, j, k with edge-weight labels.]
  • 24. Our framework is based on correlation clustering. Edges in a signed graph indicate similarity (+) or dissimilarity (-). Objective: minimize the total weight of “mistakes” (a positive edge whose endpoints are separated, or a negative edge whose endpoints share a cluster). [Figure: the signed triangle on i, j, k with two mistakes marked.]
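To make the mistake objective concrete, here is a minimal sketch in Python (mine, not the paper’s code) that scores a clustering of a small signed graph; the node names and weights are illustrative.

    # Minimal sketch (not the authors' code): score a clustering of a signed
    # graph by the correlation clustering objective, the total weight of mistakes.

    def cc_mistakes(pos_edges, neg_edges, cluster):
        """pos_edges / neg_edges map pairs (i, j) to nonnegative weights;
        cluster maps each node to a cluster id."""
        total = 0.0
        for (i, j), w in pos_edges.items():
            if cluster[i] != cluster[j]:   # positive edge split apart: mistake
                total += w
        for (i, j), w in neg_edges.items():
            if cluster[i] == cluster[j]:   # negative edge kept together: mistake
                total += w
        return total

    # The i-j-k triangle from the slide: any clustering makes at least one mistake.
    pos = {("i", "j"): 1.0, ("j", "k"): 1.0}
    neg = {("i", "k"): 1.0}
    print(cc_mistakes(pos, neg, {"i": 0, "j": 0, "k": 0}))  # 1.0 (negative edge inside)
    print(cc_mistakes(pos, neg, {"i": 0, "j": 0, "k": 1}))  # 1.0 (positive edge cut)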
  • 25. You can use correlation clustering to cluster unsigned graphs. Given G = (V,E), construct a signed graph G’ = (V, E+, E-), an instance of correlation clustering. To model sparsest cut or cluster deletion, set a resolution parameter λ ∈ (0,1): this is LambdaCC. (Unweighted correlation clustering is the same problem as cluster editing.)
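A minimal sketch of the LambdaCC construction as described on the slide (my rendering, not the released code): edges of the unsigned graph become positive pairs of weight 1 − λ, and non-adjacent pairs become negative pairs of weight λ.

    from itertools import combinations

    def lambda_cc_instance(nodes, edges, lam):
        edge_set = {frozenset(e) for e in edges}
        pos, neg = {}, {}
        for i, j in combinations(nodes, 2):
            if frozenset((i, j)) in edge_set:
                pos[(i, j)] = 1.0 - lam    # positive edge: cost 1 - lam if cut
            else:
                neg[(i, j)] = lam          # non-edge: cost lam if kept together
        return pos, neg

    # Small lam tolerates sparse clusters; lam near 1 pushes toward cliques
    # (the cluster deletion regime).
    pos, neg = lambda_cc_instance(range(5), [(0, 1), (1, 2), (3, 4)], lam=0.3)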
  • 26. Consider a restriction to two clusters, S and S̄. Positive mistakes: (1 − λ)·cut(S). Negative mistakes: λ|E⁻| − λ(|S||S̄| − cut(S)). Total weight of mistakes = cut(S) − λ|S||S̄| + λ|E⁻|. (A one-line check of this arithmetic follows.)
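Spelling out the arithmetic behind the slide’s total, using the counts above; note that |S||S̄| − cut(S) counts the negative pairs that cross the cut (and so are not mistakes):

    \underbrace{(1-\lambda)\,\mathrm{cut}(S)}_{\text{positive mistakes}}
    + \underbrace{\lambda\bigl(|E^-| - (|S|\,|\bar S| - \mathrm{cut}(S))\bigr)}_{\text{negative mistakes}}
    = \mathrm{cut}(S) - \lambda\,|S|\,|\bar S| + \lambda\,|E^-|.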
  • 27. This is a scaled version of sparsest cut! Two-cluster LambdaCC can be written: minimize cut(S) − λ|S||S̄| + λ|E⁻|, where the λ|E⁻| term is constant. Note that cut(S) − λ|S||S̄| < 0 ⟺ cut(S)/(|S||S̄|) < λ, and that cut(S)/|S| + cut(S)/|S̄| = |V|·cut(S)/(|S||S̄|).
  • 28. We can write the objective in terms of cuts to get a relationship with sparsest cut. The general LambdaCC objective can be written: minimize (1/2)·Σᵢ cut(Sᵢ) − (λ/2)·Σᵢ |Sᵢ||S̄ᵢ| + λ|E⁻|. THEOREM. Minimizing this objective produces clusters with scaled sparsest cut at most λ (if they exist), and there exists some λ’ such that minimizing LambdaCC will return the minimum sparsest cut partition.
  • 29. For large λ, LambdaCC generalizes cluster deletion. Cluster deletion is correlation clustering with infinite penalties on negative edges; we show this is equivalent to LambdaCC for the right choice of λ (large enough that λ ≫ 1 − λ).
  • 30. Degree-weighted LambdaCC is related to modularity. Positive weight: 1 − λd_i d_j; negative weight: λd_i d_j. LambdaCC is then a linear function of modularity, though this does not preserve approximations… [Figure: a small example graph with its degree-weighted pair weights.]
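A minimal sketch of the degree-weighted variant, following my reading of the slide (not the released code): pair (i, j) gets positive weight 1 − λ·d_i·d_j when it is an edge and negative weight λ·d_i·d_j when it is not; with λ = 1/(2m) the mistake objective is a linear function of modularity.

    from itertools import combinations

    def degree_weighted_instance(nodes, edges, lam):
        deg = {v: 0 for v in nodes}
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        edge_set = {frozenset(e) for e in edges}
        pos, neg = {}, {}
        for i, j in combinations(nodes, 2):
            w = lam * deg[i] * deg[j]
            if frozenset((i, j)) in edge_set:
                pos[(i, j)] = 1.0 - w   # can be negative for high-degree pairs
            else:
                neg[(i, j)] = w
        return pos, neg

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    pos, neg = degree_weighted_instance(range(4), edges, lam=1.0 / (2 * len(edges)))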
  • 31. Many other objectives are special cases of LambdaCC (m = |E|). Standard weighting: sparsest cut at λ = ρ* (the optimal scaled sparsest cut value), correlation clustering (cluster editing) at λ = 1/2, cluster deletion at λ = m/(m + 1). Degree weighting: normalized cut at λ = ρ*, modularity at λ = 1/(2m).
  • 32. And now, an answer to one of the most frequently asked questions in clustering: “What method should I use?”
  • 33. Changing your method (implicitly) changes the value of λ that you are using. [Figure: ratio to the LP bound as a function of λ for Graclus, Louvain, InfoMap, RMQC, and RMC; a sparse-cut regime at small λ and a dense-subgraph regime at large λ.] This figure shows that if you use one of these algorithms (Graclus, Louvain, InfoMap, recursive max-quasi-clique, or recursive max-clique), then you implicitly minimize λ-CC for some choice of λ. It turns the question “what method should I use?” into “what λ should I use?”
  • 34. Changing your method (implicitly) changes the value of λ that you are using. [Same figure as the previous slide.] We wrote an entire SIMODS paper explaining how we made this figure! The LP bound involves an LP with 12 billion constraints.
  • 35. “How should I set λ for my new clustering application?” “Can you give me an example of what you want your clusters to look like?” “I want communities that look like this!” LambdaCC inspires an approach for learning the “right” objective function to use for new applications.
  • 36. The goal is not to reproduce the example clusters. The goal is to find sets with similar properties: similar tradeoffs of size and density.
  • 37. Let’s go back to the figure we just saw. Each clustering traces out a bowl-shaped curve. The minimum point on each curve tells us the λ regime where the clustering optimizes LambdaCC. [Figure: the ratio-to-LP-bound curves for Graclus, Louvain, InfoMap, RMQC, and RMC.]
  • 38. So the “example” clustering will also correspond to some type of curve. [Figure: a single ratio-to-LP-bound curve for the example clustering.]
  • 39. As will any other clustering. [Figure: more curves added to the plot.]
  • 40. As will any other clustering.
  • 41. As will any other clustering.
  • 42. As will any other clustering.
  • 43. Strategy: start with a fixed “good” clustering example and find the minimizer of its curve, to get a λ that is designed to produce similar clusterings! Challenge: we want to do this without computing the entire curve. This is a new optimization problem where we are optimizing over λ! [Figure: one bowl-shaped curve with its minimizer marked.]
  • 44. What function is tracing out these curves? The “parameter fitness function” P_C(λ) = F_C(λ) / G(λ), where F_C(λ) is the LambdaCC score of a clustering C (a linear function of λ) and G(λ) is the LambdaCC LP bound for fixed λ: a parametric LP, concave and piecewise linear in λ (Adler & Monteiro 1992).
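For the numerator, a minimal sketch assumed from the definitions on the slide (not the released code); the denominator G(λ) requires solving the LambdaCC LP relaxation and is not reproduced here.

    from itertools import combinations

    def f_c(nodes, edges, cluster, lam):
        """F_C(lam): LambdaCC score of a fixed clustering of an unsigned graph.
        Linear in lam: (1 - lam)*(cut edges) + lam*(non-edges inside clusters)."""
        edge_set = {frozenset(e) for e in edges}
        cut_edges, internal_nonedges = 0, 0
        for i, j in combinations(nodes, 2):
            same = cluster[i] == cluster[j]
            if frozenset((i, j)) in edge_set:
                cut_edges += not same           # positive mistake
            else:
                internal_nonedges += same       # negative mistake
        return (1 - lam) * cut_edges + lam * internal_nonedges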
  • 45. We prove two useful properties about P. Since F_C is linear and G is concave and piecewise linear, P satisfies: (1) if λ⁻ < λ < λ⁺, then P(λ) ≤ max{P(λ⁻), P(λ⁺)}; (2) if P(λ⁻) = P(λ⁺), then P achieves its minimum in [λ⁻, λ⁺]. Translation… (1) once P goes up, it can’t go back down.
  • 46. We prove two useful properties about P. Since F_C is linear and G is concave and piecewise linear, P satisfies: (1) if λ⁻ < λ < λ⁺, then P(λ) ≤ max{P(λ⁻), P(λ⁺)}; (2) if P(λ⁻) = P(λ⁺), then P achieves its minimum in [λ⁻, λ⁺]. Translation… (1) once P goes up, it can’t go back down; (2) there are no “flat” regions where we might get stuck.
  • 47. This allows us to minimize P without seeing all of it. [Figure: after evaluating P at a point, property 1 tells us the minimizer can’t be to the left of that point.]
  • 48. This allows us to minimize P without seeing all of it. [Figure: so this shape of curve is possible…]
  • 49. This allows us to minimize P without seeing all of it. [Figure: …but so is this one.]
  • 50. This allows us to minimize P without seeing all of it. [Figure: evaluating P at a new point rules out one possibility; now we know the minimizer can’t be to the right of that point.]
  • 51. This allows us to minimize P without seeing all of it. If two input λ have the same fitness score, the minimizer is between them (property 2). [Figure: …so it’s not over here.]
  • 52. We developed a bisection-like approach for minimizing P by evaluating it at carefully selected points. One-branch scenario: the minimizer isn’t in [m, r]. Two-branch scenario: evaluate a couple more points to rule out [m, r]. (A simplified sketch follows.)
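A minimal sketch of how the two properties enable this search: a simplified ternary-search variant of my own, standing in for the paper’s branch-based procedure. `P` is assumed to be a callable that evaluates the fitness function at one λ (each evaluation solves an LP in the real method).

    def minimize_p(P, lo, hi, tol=1e-4):
        """Shrink a bracket around the minimizer of a function satisfying
        property 1 (quasiconvexity) and property 2 (no flat regions)."""
        while hi - lo > tol:
            m1 = lo + (hi - lo) / 3.0
            m2 = hi - (hi - lo) / 3.0
            p1, p2 = P(m1), P(m2)
            if p1 < p2:
                hi = m2          # property 1: minimizer cannot lie right of m2
            elif p1 > p2:
                lo = m1          # symmetric: minimizer cannot lie left of m1
            else:
                lo, hi = m1, m2  # property 2: the minimum lies in [m1, m2]
        return 0.5 * (lo + hi)

    print(minimize_p(lambda x: abs(x - 0.17), 1e-5, 1.0))  # ~0.17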
  • 53. A simple synthetic test case to demonstrate that having an example helps. Modularity (a special case of LambdaCC with λ = 1/(2m)) wasn’t able to get the community structure right for the graph G. Let’s fix that! 1. Generate a new random graph G’ from the same distribution. 2. Using the ground truth of G’, learn a resolution parameter λ’. 3. Cluster G using LambdaCC with λ = λ’. We’ve captured the community structure for a specific class of graphs and can detect the right answer! [Figure: the training graph G’ and the test graph G.]
  • 54. We tested this on a regime of synthetic graphs that is hard for modularity. The “mixing parameter” µ controls difficulty: smaller µ → ground truth easier to detect. For each µ, we train on one graph and test on 5 others. [Figure: one example at µ = 0.3; modularity often fails to separate ground-truth clusters.]
  • 55. We can use this to test whether a metadata attribute is reflected in some characteristic graph structure. (Listen, don’t read!) For the Caltech network, find the minimizing λ for the clustering X induced by a metadata attribute, then look at the objective function P(λ, X) = F(λ, X)/G(λ) at the minimizer. Do this for the real attribute and for a randomized attribute (just shuffle the labels); the shuffled version gives a null score where there is no relationship with graph structure. Results (min P_real / min P_fake): S/F 1.30/1.65, Gen 1.73/1.80, Maj. 2.03/2.12, Maj. 2 2.12/2.12, Res. 1.35/2.11, Yr 1.57/2.09, HS 2.11/2.12.
  • 56. [Build slide: repeats the previous slide’s table and explanation.]
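A minimal sketch of the shuffle test just described (my rendering, not the released code); `min_p` is a hypothetical stand-in for the paper’s LP-based computation of the minimum over λ of P(λ, X).

    import random

    def shuffle_null_test(nodes, labels, min_p, trials=10, seed=0):
        """labels: node -> attribute value; min_p: clustering dict -> min_lambda P."""
        real_score = min_p(dict(labels))
        rng = random.Random(seed)
        vals = [labels[v] for v in nodes]
        fake_scores = []
        for _ in range(trials):
            rng.shuffle(vals)                      # destroy any graph relationship
            fake_scores.append(min_p(dict(zip(nodes, vals))))
        # A real score well below the null average suggests the attribute
        # is reflected in graph structure.
        return real_score, sum(fake_scores) / trials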
  • 57. We can also investigate metadata sets in social networks. This led to a fun story! [Figure: for each metadata set, the objective ratio at its minimizer (i.e., how close you get to the LP lower bound) versus how well you do at finding those same sets again.]
  • 58. We can also investigate metadata sets in social networks. This led to a fun story! [Same figure, with the sets split into 2006–2008 and 2009.]
  • 59. A quick summary of other work from our research team on data-driven scientific computing. Our team’s overall goal is to design algorithms and methods tuned to the evolving needs and nature of scientific data analysis. Low-rank methods for network alignment – Huda Nassar -> Stanford • Principled methods that scale to aligning thousands of networks. Spectral properties and generation of realistic networks – Nicole Eikmeier -> Grinnell College • “Power laws” in the top singular values of the adjacency matrix are more robust than degree “power laws” • Fast sampling for hypergraph models with higher-order structure. Local analysis of network data – Meng Liu • Applications in bioinformatics; software https://github.com/kfoynt/LocalGraphClustering [Figure: a Kronecker graph with a 2×2 initiator, ⊗-powered three times to an 8×8 probability matrix.]
  • 60. Don’t ask what algorithm, ask what kind of clusters! Paper: arXiv:1903.05246 (at WWW2019). Code: github.com/nveldt/LearnResParams. Earlier work: the LambdaCC framework (at WWW2018) and arXiv:1806.01678. Software: github.com/nveldt/LamCC, github.com/nveldt/MetricOptimization. Issues: yeah, this is still slow ☹, and it needs to be generalized beyond LambdaCC (ongoing work with Meng Liu at Purdue). See the paper and code! With Nate Veldt (Purdue), Tony Wirth (Melbourne), Cameron Ruggles (Purdue), and James Saunderson (Monash).