The document describes research on using symbolic regression to infer mathematical models from experimental data. Symbolic regression evolves computer programs that best fit the data, such as equations composed of basic arithmetic operations and functions. The approach is able to recover known models of various physical systems from sample data alone. It can also infer novel models of biological networks and other complex systems directly from experimental measurements. The ability to distill natural laws from data has applications in scientific discovery, engineering design, and other fields.
25. Encoding Equations
Building Blocks: + - * / sin cos exp log … etc
[Figure: expression tree encoding f(x), grown step by step: sin(x2); x1*sin(x2); (x1 – 3)*sin(x2); (x1 – 3)*sin(-7 + x2)]
John Koza, 1992
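The tree encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the nested-tuple layout, the `OPS` table, and the `evaluate` helper are assumptions made here, not the representation used in the original work.

```python
import math

# Hypothetical encoding: a leaf is a variable name (str) or a number;
# an internal node is a tuple (operator, child, ...).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
}

def evaluate(node, env):
    """Recursively evaluate an expression tree against variable bindings."""
    if isinstance(node, tuple):
        op, *children = node
        return OPS[op](*(evaluate(child, env) for child in children))
    if isinstance(node, str):
        return env[node]  # variable lookup
    return node  # numeric constant

# (x1 - 3) * sin(-7 + x2), the final expression shown on the slide
tree = ("*", ("-", "x1", 3), ("sin", ("+", -7, "x2")))
value = evaluate(tree, {"x1": 5.0, "x2": 7.0})
```

Because every sub-expression is itself a tree, swapping one subtree for another always yields a well-formed equation, which is what makes this encoding convenient for evolutionary search.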
26. [Figure: example expression tree with nodes +, ×, sin, 1.2, –, x, x, 2]
Models: Expression trees, subject to mutation and selection
Experiments: Data-points, subject to mutation and selection
{const,+,-,*,/,sin,cos,exp,log,abs}
Michael D. Schmidt, Hod Lipson (2006)
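A minimal sketch of "mutation and selection" applied to expression trees follows. It evolves models only (the slide also evolves the experiments, which is not shown here), uses just a subset of the slide's operator set to avoid domain errors, and every name in it (`random_tree`, `mutate`, `evolve`, the population sizes, the mutation probabilities) is an assumption for illustration rather than the authors' algorithm.

```python
import math
import random

# Subset of the slide's building blocks: name -> (arity, function)
OPS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
}

def random_tree(rng, depth=2):
    """Grow a random expression over the variable "x" and constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(sorted(OPS))
    arity = OPS[op][0]
    return (op,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))

def evaluate(node, x):
    if isinstance(node, tuple):
        fn = OPS[node[0]][1]
        return fn(*(evaluate(child, x) for child in node[1:]))
    return x if node == "x" else node

def error(tree, data):
    """Mean absolute error; unevaluable or non-finite models score infinity."""
    try:
        total = sum(abs(evaluate(tree, x) - y) for x, y in data)
    except (ValueError, OverflowError, RecursionError):
        return math.inf
    return total / len(data) if math.isfinite(total) else math.inf

def mutate(tree, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng)
    i = rng.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i], rng),) + tree[i + 1:]

def evolve(data, generations=100, pop_size=50, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: error(t, data))
        survivors = pop[: pop_size // 2]  # selection: keep the better half
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        pop = survivors + children
    return min(pop, key=lambda t: error(t, data))
```

Since the better half of each generation is carried over unchanged, the best error in the population never gets worse from one generation to the next.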
30. Semi-empirical mass formula
Modeling the binding energy of an atomic nucleus
Inferred Formula (R^2 = 0.99944; signs lost in extraction are shown as ±):
$$E_B = 14.83 \pm 13.43\,A \pm 12.39\,A^{0.64} \pm \frac{0.39\,Z^2}{A^{0.26}} \pm \frac{17.29\,(N - Z)^2}{A}$$
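Weizsäcker’s Formula (R^2 = 0.999915):
$$E_B = a_V A - a_S A^{2/3} - a_C \frac{Z(Z-1)}{A^{1/3}} - a_A \frac{(A-2Z)^2}{A} + \delta(A,Z)$$
$$\delta(A,Z) = \begin{cases} +\delta_0 & Z, N \text{ even} \\ 0 & A \text{ odd} \\ -\delta_0 & Z, N \text{ odd} \end{cases}, \qquad \delta_0 = \frac{a_P}{A^{1/2}}$$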
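As a rough illustration, Weizsäcker's formula can be evaluated directly. The coefficient values below are commonly quoted approximate fits in MeV and are an assumption made here; the slide does not list the values it used.

```python
import math

# Assumed approximate coefficients in MeV (not taken from the slide).
A_V, A_S, A_C, A_A, A_P = 15.8, 18.3, 0.714, 23.2, 12.0

def binding_energy(A, Z):
    """Semi-empirical binding energy for mass number A and proton number Z."""
    if A % 2 == 1:
        pairing = 0.0                      # A odd
    elif Z % 2 == 0:
        pairing = A_P / math.sqrt(A)       # Z and N both even
    else:
        pairing = -A_P / math.sqrt(A)      # Z and N both odd
    return (
        A_V * A                            # volume term
        - A_S * A ** (2 / 3)               # surface term
        - A_C * Z * (Z - 1) / A ** (1 / 3) # Coulomb term
        - A_A * (A - 2 * Z) ** 2 / A       # asymmetry term
        + pairing
    )

iron_56 = binding_energy(56, 26)
```

With these assumed coefficients, iron-56 comes out at roughly 490 MeV, close to the measured value of about 492 MeV.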
40. [Figure: blue dots = data points, green line = regressed fit]
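41. Symbolic Regression Inferred Time-Delay Model (signs lost in extraction are shown as ±):
$$\frac{dK}{dt} = a_K \pm \frac{b_K \pm c_K S}{K}, \qquad \frac{dS}{dt} = a_S \pm \frac{b_S \pm c_S K}{S}$$
Biologist’s Inferred Model: Gurol Suel, et al., Science 2007
[Equations for dK/dt and dS/dt: the recoverable pieces are a term K^n / (k_0^n + K^n), a term 1 / (1 + (K / k_1)^p), and a shared denominator of the form 1 + K/(·) + S/(·); the Greek-letter coefficients did not survive extraction]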
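42. Withheld Test Set #1 Fit
(In slides 42 to 44, signs lost in extraction are shown as ±.)
$$\frac{dG_t}{dt} = 16.7423 \pm \frac{1582.0 \pm 17.3214\,S_{t-51}}{G_{t-18}}, \qquad \frac{dS_t}{dt} = 3.05 \pm \frac{114.922 \pm 0.3019\,G_{t-25}}{S_{t-15}}$$
43. Withheld Test Set #2 Fit
$$\frac{dG_t}{dt} = 10.1355 \pm \frac{3526.92 \pm 21.312\,S_{t-54}}{G_{t-17}}, \qquad \frac{dS_t}{dt} = 2.9693 \pm \frac{132.271 \pm 0.0178\,G_{t-57}}{S_{t-18}}$$
44. Withheld Test Set #3 Fit
$$\frac{dG_t}{dt} = 6.4406 \pm \frac{5057.1 \pm 39.7452\,S_{t-46}}{G_{t-21}}, \qquad \frac{dS_t}{dt} = 3.871 \pm \frac{295.426 \pm 0.2965\,G_{t-54}}{S_{t-20}}$$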
49. Homework
[Figure: three plots of the homework data sets]
Circle: x^2 + y^2 – 16 = 0
Elliptic Curve: x^3 + x – y^2 – 1.5 = 0
Sphere: x^2 + y^2 + z^2 – 1 = 0
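The three homework shapes are written as implicit equations f(...) = 0, so a candidate equation can be checked by evaluating its residual on data points. The sketch below builds one point on each shape and evaluates the residuals; the sample points and function names are made up for illustration.

```python
import math

def circle(x, y):
    return x**2 + y**2 - 16

def elliptic_curve(x, y):
    return x**3 + x - y**2 - 1.5

def sphere(x, y, z):
    return x**2 + y**2 + z**2 - 1

# One sample point constructed to lie on each shape
t = 0.7
circle_pt = (4 * math.cos(t), 4 * math.sin(t))

x = 1.5
curve_pt = (x, math.sqrt(x**3 + x - 1.5))

u, v = 0.4, 1.1
sphere_pt = (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

# Residuals are zero (up to rounding) for points that satisfy the equation
residuals = [circle(*circle_pt), elliptic_curve(*curve_pt), sphere(*sphere_pt)]
```

A point off the shape gives a clearly non-zero residual, for example `circle(0, 0)` evaluates to -16.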
51. Linear Oscillator
$$H = 114.28\left(\frac{dx}{dt}\right)^2 + 692.322\,x^2$$
$$L = 61.591\left(\frac{dx}{dt}\right)^2 - 369.495\,x^2$$
• Coefficients may have different scales and offsets each run
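One way to sanity-check an inferred Hamiltonian is to confirm that it stays constant along a trajectory of the system. The sketch below assumes the slide's Hamiltonian reads H = 114.28*(dx/dt)^2 + 692.322*x^2 and evaluates it along x(t) = cos(ωt), with ω chosen as the frequency that this H implies; the trajectory and sampling step are assumptions for illustration.

```python
import math

# Coefficients as read from the slide (assumed form: H = a*v^2 + b*x^2)
A_COEF, B_COEF = 114.28, 692.322

def hamiltonian(x, v):
    return A_COEF * v**2 + B_COEF * x**2

# Angular frequency implied by this H for a linear oscillator
omega = math.sqrt(B_COEF / A_COEF)

values = []
for i in range(100):
    t = 0.05 * i
    x = math.cos(omega * t)            # position along the trajectory
    v = -omega * math.sin(omega * t)   # velocity dx/dt
    values.append(hamiltonian(x, v))

# A conserved quantity shows (near) zero spread along the trajectory
spread = max(values) - min(values)
```

Any rescaled and shifted version c*H + d is conserved along the same trajectory, which is consistent with the slide's remark that coefficients can differ in scale and offset from run to run.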
52. Pendulum
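[Equations: H and L for the pendulum, each combining a squared angular-velocity term (d/dt of the angle, squared) with a cos(angle) term; surviving coefficients: 2.42847*cos( ), 3.52768*, 9.43429*cos( ). The angle symbol and the signs did not survive extraction.]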
53. Double Linear Oscillator
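$$H = 14.691\,x_1^2 \pm 15.551\,x_2^2 \pm 21.676\,x_1 x_2 \pm 8.3808\left(\frac{dx_2}{dt}\right)^2 \pm 2.6046\left(\frac{dx_1}{dt}\right)^2$$
(Signs lost in extraction are shown as ±. Slide annotation on one term: "would be plus for Lagrangian")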
64. Concluding Remarks
“Correlation is enough. Faced with massive data, [the Scientific Method] is becoming obsolete. We can stop looking for models.”
-- Chris Anderson, Wired 16.07
The data deluge accelerates our ability to
hypothesize, model, and test.
65. Theoretical physicists are not yet obsolete,
but scientists have taken steps toward
replacing themselves
66. The end of insight
I am worried that we have enjoyed a
brief window in human history where we
could actually understand things, but that
period may be coming to an end.
-- Steve Strogatz