Harmonic Analysis and Deep Learning

Sungbin Lim

In this talk…
Mathematical theory of filters, activations, and
pooling across the layers of a deep convolutional neural network (DCNN)
Encompasses general ingredients
Lipschitz continuity & deformation sensitivity
WARNING: very tough mathematics
…without non-Euclidean geometry (e.g. Geometric DL)

What is Harmonic Analysis?
$$f(x) = \sum_{n \in \mathbb{N}} a_n \psi_n(x), \qquad a_n := \langle f, \psi_n \rangle_H$$
How can a function be represented efficiently
in a Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory, Stochastic Analysis

Hilbert space & Inner product
Banach space: normed space + completeness
e.g. $C^n, L_p, W^n_p, \cdots$
Hilbert space: Banach space + inner product
e.g. $\mathbb{R}^d, L_2, W^n_2, \cdots$
$$\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k$$
$$\langle f, g \rangle_{L_2} = \int f(x) g(x) \, dx$$
$$\langle f, g \rangle_{W^n_2} = \langle f, g \rangle_{L_2} + \sum_{k=1}^{n} \langle \partial_x^k f, \partial_x^k g \rangle_{L_2}$$
© Kyung-Min Rho

Why Harmonic Analysis?
$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
Encoding: $P_n \mapsto (a_n, a_{n-1}, \ldots, a_1, a_0)$
Decoding: $(a_n, a_{n-1}, \ldots, a_1, a_0) \mapsto P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$
Why do we prefer polynomials?
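The encoding/decoding picture can be made concrete with a minimal sketch. It assumes the polynomial is stored as a plain tuple of coefficients, highest degree first; the function name `decode` and the sample coefficients are illustrative and not from the talk.

```python
from typing import Sequence


def decode(coeffs: Sequence[float], x: float) -> float:
    """Evaluate P_n(x) from (a_n, ..., a_1, a_0) using Horner's rule."""
    value = 0.0
    for a in coeffs:  # highest-degree coefficient first
        value = value * x + a
    return value


# Encoding: P(x) = 2x^2 - 3x + 1 is stored as the tuple (a_2, a_1, a_0).
coeffs = (2.0, -3.0, 1.0)

# Decoding: the tuple alone is enough to recover the value at any point.
print(decode(coeffs, 2.0))  # 2*4 - 3*2 + 1 = 3.0
```

A function that needs infinitely many values to describe is thus replaced by n + 1 numbers, which is the kind of efficient representation the talk asks for.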

Stone-Weierstrass theorem
Polynomials are universal approximators!
$$\forall f \in C(X),\ \forall \varepsilon > 0,\ \exists P_n \ \text{s.t.}\ \max_{x \in X} |f(x) - P_n(x)| < \varepsilon$$
© Wikipedia

$$\forall f \in C(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \|f - P_n\|_\infty = 0$$
© Wikipedia
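One classical way to see the approximation statement numerically is through Bernstein polynomials on [0, 1]. The sketch below is an illustration under that assumption, not the construction used in the talk; the target function and the degrees are arbitrary choices.

```python
import math


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f, n, grid_size=201):
    """Approximate max_x |f(x) - B_n f(x)| on a uniform grid of [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)


def f(x):
    # Continuous on [0, 1] but not differentiable at 0.5.
    return abs(x - 0.5)


# The sup-norm error shrinks as the degree grows.
errors = [sup_error(f, n) for n in (4, 16, 64)]
print(errors)
```

The convergence is slow for this non-smooth target, which already hints that being a universal approximator says little about efficiency.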

We can even approximate derivatives!
$$\forall f \in C^k(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \|f - P_n\|_{C^k} = 0$$
© Wikipedia

Universal approximators = {DL, polynomials, trees, …}
But why don't we use polynomials?
© Wikipedia

Local interpolation works well in low dimensions
© S. Mallat

Need $\varepsilon^{-d}$ points to cover $[0, 1]^d$ at a distance $\varepsilon$
© S. Mallat

High dimension ⇢ Curse of dimensionality!
© H. Bölcskei
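The growth of the covering count with the dimension is easy to tabulate. This is a minimal sketch assuming a regular grid with spacing ε along each axis; the helper name `grid_points` is illustrative.

```python
import math


def grid_points(eps: float, d: int) -> int:
    """Number of points in a regular grid with spacing eps on [0, 1]^d."""
    per_axis = math.ceil(1.0 / eps)
    return per_axis**d


# Same resolution, increasing dimension: the count grows like eps^(-d).
for d in (1, 2, 10, 100):
    print(d, grid_points(0.1, d))
```

At ε = 0.1 the count is already 10^10 in dimension 10, so any method that relies on sampling the cube densely stops being practical long before image-sized dimensions.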

Universal approximator
= Good feature extractor?
…in HIGH dimension!

Nonlinear Feature Extraction
© S. Mallat, © H. Bölcskei

Dimension Reduction ⇢ Invariants
How?
© S. Mallat

Main Topic in Harmonic Analysis
Linear operator ⇢ Convolution + Multiplier
Invariance vs Discriminability
$$L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\, \widehat{f}(\omega)$$
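The convolution/multiplier correspondence can be checked on a finite cyclic grid. The sketch below is a discrete stand-in for the continuous statement: it assumes a real, symmetric kernel (so that the inner-product form with the translated kernel coincides with circular convolution) and uses a naive DFT; the kernel and signal values are arbitrary.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), fine for a small demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(kernel, f):
    """L[f][x] = sum_y kernel[(x - y) mod N] * f[y]."""
    n = len(f)
    return [sum(kernel[(x - y) % n] * f[y] for y in range(n)) for x in range(n)]


# Symmetric smoothing kernel: kernel[1] == kernel[-1].
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
f = [1.0, 3.0, -2.0, 0.0, 4.0, 1.0, -1.0, 2.0]

lhs = dft(circular_convolve(kernel, f))  # transform of L[f]
rhs = [k_hat * f_hat for k_hat, f_hat in zip(dft(kernel), dft(f))]  # multiplier
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)  # zero up to floating-point rounding
```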

Discriminability vs Invariance
Littlewood-Paley Condition ⇢ Semi-discrete Frame
$$A\|f\|_H \le \|L[f]\|_H \le B\|f\|_H$$

$$\|L[f_1] - L[f_2]\|_H = \|L[f_1 - f_2]\|_H \ge A\|f_1 - f_2\|_H$$
i.e. $f_1 \neq f_2 \Rightarrow L[f_1] \neq L[f_2]$
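For a convolution operator on a finite cyclic grid, the frame bounds can be read off the Fourier symbol: by Parseval, A and B are the smallest and largest values of the modulus of the transformed kernel. The sketch below assumes that discrete setting; the kernel and the two signals are hypothetical examples.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, adequate for N = 8."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def apply_L(kernel, f):
    """L[f][x] = sum_y kernel[(x - y) mod N] * f[y] (circular convolution)."""
    n = len(f)
    return [sum(kernel[(x - y) % n] * f[y] for y in range(n)) for x in range(n)]


def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))


kernel = [1.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
symbol = [abs(v) for v in dft(kernel)]  # |1 + 0.5 cos(2 pi k / 8)|
A, B = min(symbol), max(symbol)         # lower and upper frame bounds

f1 = [1.0, 3.0, -2.0, 0.0, 4.0, 1.0, -1.0, 2.0]
f2 = [1.0, 2.5, -2.0, 0.5, 4.0, 1.0, -1.5, 2.0]
diff = [a - b for a, b in zip(f1, f2)]
gap_in = norm(diff)
gap_out = norm(apply_L(kernel, diff))   # equals ||L[f1] - L[f2]|| by linearity
print(A, B, gap_out / gap_in)           # the ratio lies between A and B
```

Because A is strictly positive for this kernel, two different inputs can never be mapped to the same output. A kernel whose symbol vanishes at some frequency would lose that guarantee.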

$$\|\underbrace{L \cdots L}_{n\text{-fold}}[f]\|_H \le B\,\|\underbrace{L \cdots L}_{(n-1)\text{-fold}}[f]\|_H \le \cdots \le B^n \|f\|_H$$

Banach fixed-point theorem

Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform
Lipschitz continuity
e.g. ReLU, tanh, sigmoid, …
$$|f(x) - f(y)| \le C\|x - y\| \iff \|\nabla f(x)\| \le C$$
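The Lipschitz constants of the three activations named above can be estimated from finite-difference slopes. This is a rough numerical sketch: the interval, the grid resolution, and the helper name `lipschitz_estimate` are assumptions, and a grid estimate only gives a lower bound on the true constant.

```python
import math


def relu(x):
    return max(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lipschitz_estimate(fn, lo=-5.0, hi=5.0, steps=2000):
    """Largest slope |fn(b) - fn(a)| / h between neighbouring grid points."""
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    return max(abs(fn(b) - fn(a)) / h for a, b in zip(xs, xs[1:]))


estimates = {
    name: lipschitz_estimate(fn)
    for name, fn in [("relu", relu), ("tanh", math.tanh), ("sigmoid", sigmoid)]
}
print(estimates)  # roughly 1 for ReLU and tanh, roughly 1/4 for sigmoid
```

These match the largest values of the derivatives: 1 for ReLU, 1 for tanh at the origin, and 1/4 for the sigmoid at the origin.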

How to control the Lipschitz constant?
Theorem
$$\|\rho(L[f])\|_H \le N(B, C)\, \|f\|_H$$
No change in invariance!

Proof)
Let $\rho = \mathrm{ReLU}$, $H = W^1_2$. Then
$$
\begin{aligned}
\|\rho(L[f])\|_{W^1_2} &= \|\max\{L[f], 0\}\|_{L_2} + \|\nabla \rho(L[f])\|_{L_2} \\
&\le \|L[f]\|_{L_2} + \|\underbrace{\rho'(L[f])}_{=1 \text{ or } 0} \nabla(L[f])\|_{L_2} \\
&\le \|L[f]\|_{L_2} + \|\nabla(L[f])\|_{L_2} \\
&= \|L[f]\|_{W^1_2} \\
&\le B\|f\|_{W^1_2}
\end{aligned}
$$
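The key step of the proof, that ReLU does not increase the Sobolev norm, has a direct discrete analogue. The sketch below assumes a one-dimensional grid on [0, 1], forward differences in place of the weak derivative, and an arbitrary test signal `u` standing in for L[f].

```python
import math


def l2_norm(values, h):
    """Discrete L2 norm with grid spacing h."""
    return math.sqrt(h * sum(v * v for v in values))


def gradient(values, h):
    """Forward differences as a stand-in for the weak derivative."""
    return [(b - a) / h for a, b in zip(values, values[1:])]


def sobolev_norm(values, h):
    """Discrete version of ||u||_{L2} + ||u'||_{L2}, the norm used in the proof."""
    return l2_norm(values, h) + l2_norm(gradient(values, h), h)


n = 400
h = 1.0 / n
# Arbitrary oscillating signal that changes sign several times.
u = [
    math.sin(6 * math.pi * i * h) + 0.3 * math.cos(14 * math.pi * i * h)
    for i in range(n + 1)
]
relu_u = [max(v, 0.0) for v in u]

print(sobolev_norm(relu_u, h), sobolev_norm(u, h))  # first value is not larger
```

The inequality holds pointwise here: clipping at zero can only shrink a value, and it can only shrink the difference between two neighbouring values.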
What about Discriminability?

Scale Invariant Feature
Translation Invariant
Stable under deformation

Scattering Network (Mallat, 2012)
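$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots |f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$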
© H. Bölcskei
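The cascade in the formula can be sketched as a toy one-dimensional feature extractor: convolve with each filter, take the modulus, repeat, and pass every intermediate signal through an output filter. Everything concrete here is an assumption made for illustration: the two difference filters are not a wavelet frame, the modulus is applied after every convolution, and a global average plays the role of the output filter.

```python
def circular_convolve(kernel, f):
    """(kernel * f)[x] = sum_y kernel[y] * f[(x - y) mod N]."""
    n = len(f)
    return [sum(k * f[(x - y) % n] for y, k in enumerate(kernel)) for x in range(n)]


def scattering(f, filters, lowpass, depth):
    """Map each path of filter indices (length <= depth) to its output feature."""
    features = {}
    layer = {(): list(f)}  # layer 0 holds the input signal
    for n in range(depth + 1):
        for path, u in layer.items():
            features[path] = circular_convolve(lowpass, u)  # output of layer n
        if n < depth:
            layer = {
                path + (j,): [abs(v) for v in circular_convolve(g, u)]
                for path, u in layer.items()
                for j, g in enumerate(filters)
            }
    return features


filters = [[1.0, -1.0], [1.0, 0.0, -1.0]]  # hypothetical toy band-pass filters
f = [0.0, 1.0, 4.0, 2.0, -1.0, 0.5, 3.0, -2.0]
lowpass = [1.0 / len(f)] * len(f)          # global average as output filter

features = scattering(f, filters, lowpass, depth=2)
shifted = scattering(f[3:] + f[:3], filters, lowpass, depth=2)
print(len(features))  # 1 + 2 + 4 paths for two filters and depth 2
```

With a global average as the output filter, the features of a circularly shifted signal coincide with those of the original up to rounding. This is the extreme case of translation invariance; a narrower output filter would trade some invariance for more retained detail.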

Generalized Scattering Network (Wiatowski, 2015)
Gabor frame
Tensor wavelet
Directional wavelet
Ridgelet frame
Curvelet frame
© H. Bölcskei

Linearize symmetries
“Space folding”, Cho (2014)
© S. Mallat

Theorem
Pooling: $f \mapsto S_n^{d/2} P_n(f)(S_n \cdot)$
$$|||\Phi^n(T_t f) - \Phi^n(f)||| = O\!\left(\frac{\|t\|}{\prod_{j=1}^{n} S_j}\right)$$
Features become more translation invariant
with increasing network depth

© Philip Scott Johnson
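Theorem
$$F_{\tau,\omega}[f](x) = e^{2\pi i \omega(x)} f(x - \tau(x))$$
$$|||\Phi(F_{\tau,\omega}[f]) - \Phi(f)||| \le C\,(\|\tau\|_\infty + \|\omega\|_\infty)\,\|f\|_{L_2}$$
Multi-layer convolutions linearize features,
i.e. they are stable to deformations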

David Hilbert
Wir müssen wissen. (We must know.)
Wir werden wissen. (We will know.)
