2. In this talk…
Mathematical theory of filters, activations, and
pooling across the layers of a DCNN
Encompasses general ingredients
Lipschitz continuity & deformation sensitivity
WARNING: Very tough mathematics
…without non-Euclidean geometry (e.g. Geometric DL)
3. What is Harmonic Analysis?
$$f(x) = \sum_{n \in \mathbb{N}} a_n \psi_n(x), \qquad a_n := \langle f, \psi_n \rangle_H$$
How to represent a function efficiently in the
sense of Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory, Stochastic Analysis
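The expansion on this slide can be tried out in finite dimensions. The sketch below assumes the Hilbert space is C^N with the orthonormal discrete Fourier basis. That basis, the test signal, and the helper names (`dft_basis`, `analyze`, `synthesize`) are illustrative choices, not part of the talk.

```python
import cmath
import math

def dft_basis(N):
    """Orthonormal Fourier basis of C^N: psi_n[k] = exp(2*pi*i*n*k/N) / sqrt(N)."""
    return [[cmath.exp(2j * math.pi * n * k / N) / math.sqrt(N) for k in range(N)]
            for n in range(N)]

def inner(f, g):
    """Inner product <f, g> on C^N, conjugate-linear in the second argument."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def analyze(f, basis):
    """Coefficients a_n = <f, psi_n>."""
    return [inner(f, psi) for psi in basis]

def synthesize(coeffs, basis):
    """Reconstruction f = sum_n a_n psi_n."""
    N = len(basis[0])
    return [sum(a * psi[k] for a, psi in zip(coeffs, basis)) for k in range(N)]

N = 16
# Hypothetical test signal: one sinusoid plus a constant offset.
f = [math.sin(2 * math.pi * 3 * k / N) + 0.5 for k in range(N)]

basis = dft_basis(N)
a = analyze(f, basis)
f_rec = synthesize(a, basis)

recon_error = max(abs(x - y) for x, y in zip(f, f_rec))
energy_signal = sum(abs(x) ** 2 for x in f)
energy_coeffs = sum(abs(c) ** 2 for c in a)  # Parseval: matches energy_signal
significant = [n for n, c in enumerate(a) if abs(c) > 1e-9]
```

Only three of the sixteen coefficients are non-negligible for this signal, which is the sense in which a well-chosen basis represents a function efficiently.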
30. Main Topic in Harmonic Analysis
$$L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\,\widehat{f}(\omega)$$
Linear operator ⇢ Convolution + Multiplier
Discriminability vs Invariance
Littlewood-Paley Condition ⇢ Semi-discrete Frame
$$A\|f\|_H \le \|L[f]\|_H \le B\|f\|_H$$
$$\|L[f_1] - L[f_2]\|_H = \|L[f_1 - f_2]\|_H \ge A\|f_1 - f_2\|_H$$
i.e. $f_1 \ne f_2 \Rightarrow L[f_1] \ne L[f_2]$
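A small numerical check of the convolution/multiplier correspondence and of the frame bounds may help here. The sketch assumes a discrete, periodic setting (circular convolution on N samples) in place of the continuous operator on the slide. The kernel `K` and the signals `f1`, `f2` are hypothetical, and A, B are taken as the minimum and maximum of |K̂|.

```python
import cmath
import math

def dft(x):
    """Unnormalized DFT: X[w] = sum_k x[k] * exp(-2*pi*i*w*k/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * w * k / N) for k in range(N))
            for w in range(N)]

def circ_conv(K, f):
    """Circular convolution L[f][x] = sum_y K[(x - y) mod N] * f[y]."""
    N = len(f)
    return [sum(K[(x - y) % N] * f[y] for y in range(N)) for x in range(N)]

def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))

N = 16
# Hypothetical kernel; its multiplier is K_hat(w) = 1 + 0.5*cos(2*pi*w/N).
K = [0.0] * N
K[0], K[1], K[N - 1] = 1.0, 0.25, 0.25
f1 = [math.sin(0.7 * k) + 0.1 * k for k in range(N)]
f2 = [math.cos(1.3 * k) for k in range(N)]

# Convolution in space corresponds to multiplication in frequency.
K_hat = dft(K)
Lf1 = circ_conv(K, f1)
Lf2 = circ_conv(K, f2)
multiplier_error = max(abs(a - b * c) for a, b, c in zip(dft(Lf1), K_hat, dft(f1)))

# Frame bounds read off the multiplier.
A = min(abs(v) for v in K_hat)
B = max(abs(v) for v in K_hat)

# Discriminability: distinct inputs stay at least A * ||f1 - f2|| apart.
gap_out = norm2([a - b for a, b in zip(Lf1, Lf2)])
gap_in = norm2([a - b for a, b in zip(f1, f2)])
```

With this kernel the multiplier never vanishes, so A > 0 and L cannot map two different signals to the same output.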
32. Main Topic in Harmonic Analysis
$$L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\,\widehat{f}(\omega)$$
Linear operator ⇢ Convolution + Multiplier
Discriminability vs Invariance
Littlewood-Paley Condition ⇢ Semi-discrete Frame
$$A\|f\|_H \le \|L[f]\|_H \le B\|f\|_H$$
$$\| \underbrace{L \cdots L}_{n\text{-fold}}[f] \|_H \le B \| \underbrace{L \cdots L}_{(n-1)\text{-fold}}[f] \|_H \le \cdots \le B^n \|f\|_H$$
Banach fixed-point theorem
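The n-fold bound can be observed by applying a convolution repeatedly. In this sketch the upper bound B is taken to be the l1 norm of the kernel (Young's inequality), which is an assumption made for simplicity rather than the construction used in the talk. The kernel and signal are hypothetical. One reading of the reference to the Banach fixed-point theorem is that B < 1 makes L a contraction, so the iterates converge to the unique fixed point, which is 0 for a linear L.

```python
import math

def circ_conv(K, f):
    """Circular convolution L[f][x] = sum_y K[(x - y) mod N] * f[y]."""
    N = len(f)
    return [sum(K[(x - y) % N] * f[y] for y in range(N)) for x in range(N)]

def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))

N = 32
# Hypothetical kernel with l1 norm 0.9.
K = [0.0] * N
K[0], K[1], K[N - 1] = 0.5, 0.2, 0.2
B = sum(abs(v) for v in K)  # Young: ||K * f||_2 <= ||K||_1 * ||f||_2

f = [math.sin(0.4 * k) + 1.0 for k in range(N)]

# norms[n] holds the norm of the n-fold application of L to f.
norms = [norm2(f)]
g = f
for n in range(1, 11):
    g = circ_conv(K, g)
    norms.append(norm2(g))

# Check ||L^n f|| <= B^n ||f|| for every depth computed.
bound_holds = all(norms[n] <= B ** n * norms[0] + 1e-12 for n in range(len(norms)))
```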
35. Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform
Lipschitz continuity
ex) ReLU, tanh, sigmoid …
$$|f(x) - f(y)| \le C\|x - y\| \iff \|\nabla f(x)\| \le C$$
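The Lipschitz constants of the activations listed above can be estimated from difference quotients. This is a rough numerical sketch: the grid range, step count, and the helper `lipschitz_estimate` are assumptions, and the estimate approaches the true constant from below.

```python
import math

def relu(x):
    return max(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lipschitz_estimate(fn, lo=-6.0, hi=6.0, steps=12000):
    """Largest difference quotient |fn(x) - fn(y)| / |x - y| over neighbouring grid points."""
    h = (hi - lo) / steps
    best = 0.0
    for i in range(steps):
        x = lo + i * h
        y = x + h
        best = max(best, abs(fn(x) - fn(y)) / h)
    return best

estimates = {
    "relu": lipschitz_estimate(relu),
    "tanh": lipschitz_estimate(math.tanh),
    "sigmoid": lipschitz_estimate(sigmoid),
}
```

The estimates come out close to 1 for ReLU and tanh and close to 1/4 for the sigmoid, matching the maxima of their derivatives.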
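43. How to control Lipschitz?
Theorem
$$\|\rho(L[f])\|_H \le N(B, C)\,\|f\|_H$$
No change in Invariance!
Proof)
Let $\rho = \mathrm{ReLU}$, $H = W^1_2$. Then
$$\begin{aligned}
\|\rho(L[f])\|_{W^1_2} &= \|\max\{L[f], 0\}\|_{L^2} + \|\nabla \rho(L[f])\|_{L^2} \\
&\le \|L[f]\|_{L^2} + \| \underbrace{\rho'(L[f])}_{=1 \text{ or } 0} \nabla(L[f]) \|_{L^2} \\
&\le \|L[f]\|_{L^2} + \|\nabla(L[f])\|_{L^2} \\
&= \|L[f]\|_{W^1_2} \le B\|f\|_{W^1_2}
\end{aligned}$$
What about Discriminability?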
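The ReLU step of the proof, namely that applying ReLU does not increase the W^1_2 norm, can be checked on a sampled function. The sketch assumes a 1-D grid with forward differences as the discrete gradient. The function `u` merely stands in for L[f] and is hypothetical.

```python
import math

def relu_vec(u):
    return [max(v, 0.0) for v in u]

def l2(u, h):
    """Discrete L^2 norm on a grid with spacing h."""
    return math.sqrt(h * sum(v * v for v in u))

def grad(u, h):
    """Forward-difference approximation of the derivative."""
    return [(u[i + 1] - u[i]) / h for i in range(len(u) - 1)]

def sobolev_norm(u, h):
    """Discrete analogue of ||u||_{L^2} + ||grad u||_{L^2}, the norm used in the proof."""
    return l2(u, h) + l2(grad(u, h), h)

h = 0.01
# Hypothetical stand-in for L[f], sampled on [-3, 3]; it takes both signs.
u = [math.sin(3 * i * h) * math.exp(-0.5 * i * h) for i in range(-300, 301)]

norm_after_relu = sobolev_norm(relu_vec(u), h)
norm_before = sobolev_norm(u, h)
```

Because ReLU satisfies |ρ(a)| ≤ |a| and |ρ(a) − ρ(b)| ≤ |a − b|, both terms of the discrete norm can only shrink, so `norm_after_relu` never exceeds `norm_before`.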
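51. Generalized Scattering Network (Wiatowski, 2015)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots |f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
$$f \mapsto S_n^{d/2} P_n(f)(S_n \cdot)$$
Theorem
$$|\!|\!| \Phi^n(T_t f) - \Phi^n(f) |\!|\!| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)$$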
52. Generalized Scattering Network (Wiatowski, 2015)
$$f \mapsto S_n^{d/2} P_n(f)(S_n \cdot)$$
Theorem
$$|\!|\!| \Phi^n(T_t f) - \Phi^n(f) |\!|\!| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)$$
Features become more translation invariant
with increasing network depth
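The mechanism behind the 1/∏S_j factor can be seen in a toy 1-D cascade: each layer subsamples by S, so an input translation by t shows up as a translation by t/(S_1⋯S_n) at depth n. The sketch below is only an illustration under strong assumptions: circular boundary conditions, a single hypothetical filter `g` per layer, a moving average as the pooling operator P, and no output filter. It is not the construction from the cited work.

```python
import math

def circ_conv(g, f):
    """Circular convolution of a short filter g with a signal f."""
    N = len(f)
    return [sum(g[j] * f[(k - j) % N] for j in range(len(g))) for k in range(N)]

def layer(f, g, S):
    """One layer: convolve, take the modulus, pool, subsample by S, scale by S^(1/2)."""
    u = [abs(v) for v in circ_conv(g, f)]
    N = len(u)
    # P: moving average of width S (one possible pooling choice).
    pooled = [sum(u[(k + j) % N] for j in range(S)) / S for k in range(N)]
    return [math.sqrt(S) * pooled[S * k] for k in range(N // S)]

def features(f, g, S, depth):
    """Apply `depth` layers in sequence."""
    for _ in range(depth):
        f = layer(f, g, S)
    return f

def shift(f, t):
    """Circular translation (T_t f)[k] = f[k - t]."""
    N = len(f)
    return [f[(k - t) % N] for k in range(N)]

N, S, depth = 64, 2, 3
g = [1.0, -1.0]  # hypothetical high-pass filter
f = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(N)]

t = S ** depth  # input translation by 8 samples
out = features(f, g, S, depth)
out_shifted = features(shift(f, t), g, S, depth)

# At depth 3 the 8-sample input shift has become a 1-sample output shift.
mismatch = max(abs(a - b) for a, b in zip(out_shifted, shift(out, 1)))
```

Here `mismatch` is zero up to rounding: the translation is divided by S at every layer, which is why deeper features react less to a fixed input shift.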