Max Welling (http://www.ics.uci.edu/~welling/) describes how big data, massive simulation and advanced models go together to help us start solving challenging problems. He also describes his links to other computer science disciplines within the DSRC.
2. The Four Paradigms
We have added big data to computer simulation, experiment and theory.
Not replaced them…
3. Big Simulation
Computer simulations have become increasingly complex (e.g. weather, earthquake models).
The Computational Wall: If a model has hundreds of parameters, how can we:
1) Find the parameter values that match the observations best?
2) Determine if we underfit (model too simple) or overfit (model too complex)?
3) Compare two models?
5. Challenge I
The “posterior probability” cannot be computed in closed form.
Solution: Markov Chain Monte Carlo Sampling (MCMC)
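As an illustration of the idea, here is a minimal random-walk Metropolis-Hastings sketch. The target is a hypothetical toy posterior (standard normal prior, one observation y = 1.0 with unit Gaussian noise), chosen only because its exact answer is known; the function names, step size and seed are assumptions for this example, not taken from the talk.

```python
import math
import random


def log_unnormalized_posterior(theta):
    # Hypothetical toy target: N(0, 1) prior times a Gaussian likelihood
    # for a single observation y = 1.0 with unit noise.
    # The exact posterior is Normal(mean=0.5, variance=0.5).
    log_prior = -0.5 * theta ** 2
    log_likelihood = -0.5 * (1.0 - theta) ** 2
    return log_prior + log_likelihood


def metropolis_hastings(log_target, n_samples, step=1.0, theta0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta0
    log_p = log_target(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old); only the ratio is
        # needed, so the normalizing constant never has to be computed.
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_new
        samples.append(theta)
    return samples


samples = metropolis_hastings(log_unnormalized_posterior, n_samples=20000)
kept = samples[1000:]  # discard an initial burn-in segment
posterior_mean = sum(kept) / len(kept)
print("estimated posterior mean:", posterior_mean)
```

The point of the sketch is that the sampler only ever evaluates the posterior up to a constant, which is why MCMC sidesteps the missing closed form.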
6. Challenge II
We cannot run MCMC because the likelihood is not given in closed form (but rather as a simulation).
Solution: Likelihood Free MCMC (or Approximate Bayesian Computation)
Run many simulations and compare the samples with the observations.
Source: Csilléry, Katalin, et al. "Approximate Bayesian computation (ABC) in practice." Trends in Ecology & Evolution 25.7 (2010): 410-418.
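A minimal sketch of the "simulate and compare" idea, using plain rejection ABC (the simplest ABC variant, not the likelihood-free MCMC named on the slide). The simulator, the uniform prior, the summary statistic (sample mean) and the tolerance epsilon are all hypothetical choices made for this example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(theta, n, rng):
    # Hypothetical stand-in for an expensive simulator:
    # n draws from Normal(theta, 1).
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, epsilon, seed=0):
    """Keep prior draws whose simulated summary lies within epsilon of the data."""
    rng = random.Random(seed)
    observed_summary = mean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)             # draw from the (assumed) prior
        simulated = simulate(theta, len(observed), rng)
        # Compare simulation and observation through a summary statistic;
        # the likelihood itself is never evaluated.
        if abs(mean(simulated) - observed_summary) < epsilon:
            accepted.append(theta)
    return accepted


# Synthetic "observations" generated with a known parameter value of 2.0.
observed = simulate(2.0, 50, random.Random(42))
posterior_samples = abc_rejection(observed, n_draws=20000, epsilon=0.1)
print("accepted draws:", len(posterior_samples))
print("approximate posterior mean:", mean(posterior_samples))
```

Only a small fraction of the draws is accepted, which already hints at the cost problem when each simulation is slow.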
7. Challenge III
We need thousands of simulations to infer the posterior (infeasible if every simulation takes a day or so).
Solution: Learn log(P) using Gaussian Process Surrogate functions (GPS)
Ted Meeds
[Figure: surrogate of log(P)]
If surrogate ~ log(P) with high confidence, then use the surrogate to draw a sample.
If not: simulate until there is enough confidence.
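The decision rule on this slide can be sketched roughly as follows. This is a simplified illustration, not the method from the talk: it uses a zero-mean one-dimensional Gaussian process with an RBF kernel and treats "high confidence" as the predictive standard deviation falling below a fixed threshold. The class name, kernel settings, threshold and the toy simulator are all assumptions.

```python
import math


def rbf(a, b, length=1.0, amp=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return amp * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


class LogPSurrogate:
    """GP surrogate of log(P) that falls back to the simulator when unsure."""

    def __init__(self, simulator, tol=0.1, jitter=1e-6):
        self.simulator = simulator   # expensive function returning log(P)
        self.tol = tol               # confidence threshold (assumed)
        self.jitter = jitter         # small diagonal term for stability
        self.xs, self.ys = [], []
        self.n_simulations = 0

    def predict(self, x):
        """Return the GP predictive mean and standard deviation at x."""
        if not self.xs:
            return 0.0, math.sqrt(rbf(x, x))
        K = [[rbf(a, b) + (self.jitter if i == j else 0.0)
              for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        k_star = [rbf(x, a) for a in self.xs]
        alpha = solve(K, self.ys)
        v = solve(K, k_star)
        mean = sum(k * a for k, a in zip(k_star, alpha))
        var = rbf(x, x) - sum(k * w for k, w in zip(k_star, v))
        return mean, math.sqrt(max(var, 0.0))

    def log_p(self, x):
        mean, std = self.predict(x)
        if std < self.tol:
            return mean              # confident: use the surrogate
        y = self.simulator(x)        # not confident: run the real simulation
        self.n_simulations += 1
        self.xs.append(x)
        self.ys.append(y)
        return y


def toy_simulator(theta):
    # Hypothetical stand-in for a day-long simulation.
    return -0.5 * theta ** 2 - 1.0


surrogate = LogPSurrogate(toy_simulator)
values = [surrogate.log_p(-3.0 + 0.05 * i) for i in range(121)]
print("queries:", len(values), "simulations run:", surrogate.n_simulations)
```

In this sketch the simulator is called only where the surrogate is uncertain, so the number of real simulations stays well below the number of queries.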
8. Two Kinds of Complex Model
Machine Learning
Computational Science
Model Capacity
“Let the model speak”
“Let the data speak”
10. Growth in Model Capacity
Model Capacity over Time
1943: First NN (N ≈ 10)
1988: NetTalk (N ≈ 20K)
2009: Hinton’s Deep Belief Net (N ≈ 10M)
2013: Google/Y! (N ≈ 10B)
2020-2050: Human Brain (N ≈ 100T)?
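To make the growth rate concrete, the approximate figures on this slide can be turned into a doubling time. This is only back-of-the-envelope arithmetic on the slide's rough numbers; the helper name is an assumption, and the speculative human-brain point is left out.

```python
import math

# (year, approximate parameter count N) as listed on the slide
capacity = [
    (1943, 10),
    (1988, 20_000),
    (2009, 10_000_000),
    (2013, 10_000_000_000),
]


def doubling_time_years(start, end):
    """Average number of years per doubling of N between two data points."""
    (year0, n0), (year1, n1) = start, end
    doublings = math.log2(n1 / n0)
    return (year1 - year0) / doublings


overall = doubling_time_years(capacity[0], capacity[-1])
recent = doubling_time_years(capacity[1], capacity[-1])
print(f"1943-2013: one doubling every {overall:.2f} years")
print(f"1988-2013: one doubling every {recent:.2f} years")
```

Under these numbers N doubles roughly every one to two and a half years, depending on which interval is used.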
11. Deep Learning: Neural Nets Strike Back (again)
1943: NN invented (McCulloch & Pitts)
1970: NN discredited (Minsky & Papert)
1986: Backpropagation (Rumelhart, Hinton & Williams)
1995: SVM (Vapnik)
2009: Deep Learning (Hinton)
Network depth: 2 layers, 3 layers, many layers
-Model Size: 10B parameters
-Used by: Yahoo!, Google, Microsoft, Baidu, IBM, Scyfer
12. Paradox
Why does model capacity grow exponentially?
Raw Information: O(N)
Predictive Information: log(N)
Noise
?
13. Big Challenges from Industry
Scyfer connects industry to academia:
-inspire academia w/ relevant problems
-deliver ML products to industry
-host student projects
-provide employment for our students
= VALORISATION
What industry needs.
What academics are interested in.
14. Intelligent Autonomous Systems Lab - UvA
[Diagram: map of research areas and staff]
Research areas: Visual Analytics; Business Analytics; Decision Theory; Understand and decide; Distributed Processing; Data Reasoning; Knowledge representation; Large Scale Databases; Store and process; Software Eng.; System / Network Eng.; Analyze and model; Multimedia Retrieval; Modeling and simulation; Information Retrieval; Machine Learning
Staff: Shimon Whiteson (Reinforcement Learning & Planning); Leo Dorst (Geometric Algebra); Joris Mooij (Causality); Ben Kröse (Ambient Robotics); Dariu Gavrila (Human-aware Intelligent Systems); Max Welling (Machine Learning)
15. Our Future Need
[Diagram: the same map of research areas and staff as on the Intelligent Autonomous Systems Lab slide]