2. What is Deep Learning?
Machine learning with multi-layer neural networks
The "comeback" of neural networks
(Shallow) neural network → deep neural network
3. Deep learning example:
Unsupervised learning of image features
12-layer DNN
~10^10 parameters
Automatic feature extraction by unsupervised learning
Input: 10^8 YouTube images
16-core PCs × 10^3 machines × 3 days
Emergence of "grandmother cells"?
Le et al., ICML 2012
[Figure: preferred stimuli of higher-level cells; examples of training images]
4. Deep learning example:
General object recognition
ImageNet Large Scale Visual Recognition Challenge 2012
1,000 categories × ~1,000 training images each
Convolutional neural network
Krizhevsky et al., NIPS 2012
SIFT + FVs: 0.26 test error
CNN: 0.15 test error
5. Deep learning example:
Text mining with a deep generative model
Reuters news data represented as bag-of-words
804,414 documents
Unsupervised learning with an autoencoder
Hinton & Salakhutdinov 2006
[Figure: 2-D codes for the 804,414 Reuters newswire stories, learned without supervision by a deep generative model of P(document) over bag-of-words input (Hinton & Salakhutdinov, Science 2006). Left: 2-D LSA result. Right: deep generative model result, with clusters for topics such as Leading Economic Indicators, European Community Monetary/Economic, Accounts/Earnings, Interbank Markets, Government Borrowings, and Energy Markets.]
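As a rough illustration of the autoencoder approach, here is a minimal PyTorch sketch that compresses bag-of-words vectors to 2-D codes; the vocabulary size, layer widths, optimizer settings, and random stand-in data are all assumptions, and the original work used a deeper, layer-wise pretrained model.

```python
import torch
import torch.nn as nn

# Minimal autoencoder producing 2-D document codes from bag-of-words
# vectors. Sizes and data are illustrative stand-ins, not the original's.
vocab = 2000
enc = nn.Sequential(nn.Linear(vocab, 500), nn.Sigmoid(),
                    nn.Linear(500, 2))                 # 2-D code
dec = nn.Sequential(nn.Linear(2, 500), nn.Sigmoid(),
                    nn.Linear(500, vocab))             # reconstruction
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                       lr=1e-3)

docs = torch.rand(256, vocab)                          # fake word counts
for step in range(200):
    code = enc(docs)
    loss = ((dec(code) - docs) ** 2).mean()            # reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()

# After training, plotting the 2-D `code` vectors gives a document map
# like the figure above.
```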
17. CNN: The Importance of Architecture
Network architecture is as important as learning (Jarrett+09, Saxe+10)
Performance comparison across architectures (Caltech-101)
[Figure 1 (Jarrett+09): An example of a feature-extraction stage of the type $F_{CSG} - R_{abs} - N - P_A$. An input image (or a feature map) is passed through a non-linear filter bank, followed by rectification, local contrast normalization, and spatial pooling/sub-sampling. A pooling layer with 4×4 down-sampling is denoted $P_A^{4\times4}$.]
[Figure 4 (Jarrett+09): Left: random stage-1 filters, and corresponding optimal inputs that maximize the response of each corresponding complex cell in an $F_{CSG} - R_{abs} - N - P_A$ architecture; the small asymmetry in the random filters is sufficient to make them orientation selective. Middle: the same for PSD filters; the optimal input patterns contain several periods, since they maximize the output of a complete stage that contains rectification, local normalization, and average pooling with down-sampling, and shifted versions of each pattern yield similar activations. Right: a subset of stage-2 filters obtained after PSD and supervised refinement on Caltech-101; some structure is apparent.]
Caltech-101 accuracy by architecture and filter type (random vs. Predictive Sparse Decomp.):

Architecture      Random filter   Trained filter (PSD)
2 layer + abs     0.629           0.647
2 layer + mean    0.196           0.310
1 layer + abs     0.533           0.548
18. Properties of the Visual Cortex (Ventral Pathway)
Visual cortex: hierarchically organized; each level solves a different visual task
Early visual cortex: small receptive fields, extraction of simple features
Contains simple cells and complex cells
Higher visual cortex: large receptive fields, selective for moderately complex features
[Figure: the ventral-pathway hierarchy V1 → V2 → V4 → PIT/CIT → AIT (also labeled TEO, TE), shown within the larger cortical area graph (V3, VP, MT, LIP, MST, ...). V1: small receptive fields, edge and line-segment detectors. IT: large receptive fields, face and complex-feature detectors. Intermediate stages are marked "?".]
19. Properties of the Early Visual Cortex
Responds to components such as line segments and edges
Simple cell: sensitive to orientation and phase
Complex cell: tolerant of phase
Complex cell: a cascade of simple cells
[Figure: a simple cell (orientation selective, phase sensitive) fires only when the input stimulus matches both the orientation and the phase of its receptive field; a complex cell (phase insensitive) fires at any phase of the preferred orientation.]
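The cascade idea can be made concrete with the classic energy model: a complex cell pools the squared responses of simple cells that share an orientation but differ in phase. A minimal NumPy sketch, with illustrative (not physiological) Gabor parameters:

```python
import numpy as np

# Energy-model sketch: a complex cell pools phase-shifted simple cells.
# Gabor and grating parameters below are illustrative choices.
def gabor(size, theta, phase, freq=0.2):
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * (size / 4) ** 2))
    return envelope * np.cos(2 * np.pi * freq * u + phase)

def simple_cell(img, theta, phase):
    # Linear filtering followed by half-wave rectification:
    # sensitive to both orientation and phase.
    return max(np.sum(img * gabor(img.shape[0], theta, phase)), 0.0)

def complex_cell(img, theta):
    # Sum of squared simple-cell responses over four phases:
    # orientation selective but phase insensitive.
    return sum(simple_cell(img, theta, p) ** 2
               for p in (0, np.pi / 2, np.pi, 3 * np.pi / 2))

# A vertical grating drives the vertically tuned complex cell at any phase.
x = np.arange(16)
for shift in (0.0, np.pi / 2):
    img = np.cos(2 * np.pi * 0.2 * x + shift)[None, :] * np.ones((16, 1))
    print(complex_cell(img, theta=0.0))   # roughly constant across shifts
```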
Source: http://ohzawa-lab.bpe.es.osaka-u.ac.jp/resources/text/KisokouKoukai2009/Ohzawa2009Koukai04.pdf
35. Describing Data with Sparse Representations

$y = x_1 d_1 + x_2 d_2 + x_3 d_3 + \dots$, with as many coefficients $x_i$ as possible equal to 0:

$$H = \sum_p \Bigl\| y^p - \sum_i x^p_i d_i \Bigr\|^2 + \sum_i \| x^p_i \|_1$$

The first term represents the image as faithfully as possible; the second drives as many coefficients as possible to 0 (LASSO).
Can the dictionary $\{d_i\}$ and the coefficients $\{x^p_i\}$ be recovered from the image patches $\{y^p\}$ alone?
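One standard answer is alternating minimization: sparse-code the $x^p$ by ISTA with the dictionary fixed, then update $\{d_i\}$ by least squares. A minimal NumPy sketch, with hypothetical sizes, L1 weight, and synthetic patches in place of real image data:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink each value toward 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(Y, D, lam, n_iter=100):
    # ISTA: minimize 0.5*||Y - D X||_F^2 + lam*||X||_1 over the codes X.
    L = np.linalg.norm(D, 2) ** 2         # Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = soft_threshold(X - grad / L, lam / L)
    return X

def learn_dictionary(Y, n_atoms, lam, n_outer=20):
    # Alternate sparse coding with a least-squares dictionary update.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)        # unit-norm atoms d_i
    for _ in range(n_outer):
        X = sparse_code(Y, D, lam)
        D = Y @ X.T @ np.linalg.pinv(X @ X.T + 1e-8 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X

# Hypothetical data: 256 random "patches" of dimension 64, 128 atoms.
Y = np.random.default_rng(1).standard_normal((64, 256))
D, X = learn_dictionary(Y, n_atoms=128, lam=0.1)
print("fraction of nonzero coefficients:", np.mean(np.abs(X) > 1e-6))
```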
37. Sparse Representation for MNIST
60K training / 10K test images
Dictionary size: 512
Classification with a linear SVM on the sparse codes:

$$H = \sum_p \Bigl\| y^p - \sum_i x^p_i d_i \Bigr\|^2 + \sum_i \| x^p_i \|_1$$
Pipeline: input → sparse coding (feature) → classifier
Slide credit: Kai Yu
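A rough scikit-learn rendering of this pipeline; the MiniBatchDictionaryLearning estimator and the alpha value are stand-ins for the slide's exact setup, and running on all 70K images is slow:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

# MNIST: 60K train / 10K test in the original split.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
X_train, y_train, X_test, y_test = X[:60000], y[:60000], X[60000:], y[60000:]

# Learn a 512-atom dictionary and encode each image as a sparse code.
# alpha (the L1 weight) is a hypothetical choice, not from the slide.
dico = MiniBatchDictionaryLearning(n_components=512, alpha=1.0, random_state=0)
Z_train = dico.fit_transform(X_train)
Z_test = dico.transform(X_test)

# Classify the sparse codes with a linear SVM.
clf = LinearSVC().fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```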
41. Sparse Autoencoder
Predictive Sparse Decomposition (Ranzato+07)
Encoder: $x^p = f(W y^p)$. Decoder: $y^p \approx D x^p$.
A sparse representation $\{x^p\}$ is learned from input patches $\{y^p\}$ under an L1 constraint:

$$\min_{D,W,x} \sum_p \Bigl( \bigl\| y^p - D x^p \bigr\|^2 + \bigl\| x^p - f(W y^p) \bigr\|^2 + \sum_i \| x^p_i \|_1 \Bigr)$$
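A minimal PyTorch sketch of this objective, alternating a few inner steps that optimize the codes $x$ with an outer update of $D$ and $W$; the tanh nonlinearity, sizes, learning rates, and random patches are all assumptions:

```python
import torch

torch.manual_seed(0)
dim, n_atoms, lam = 64, 128, 0.1          # hypothetical sizes and L1 weight

D = (0.1 * torch.randn(dim, n_atoms)).requires_grad_(True)   # decoder dict.
W = (0.1 * torch.randn(n_atoms, dim)).requires_grad_(True)   # encoder weights
outer = torch.optim.SGD([D, W], lr=1e-2)

def f(Y):
    return torch.tanh(Y @ W.T)            # encoder prediction f(W y)

for step in range(200):
    Y = torch.randn(32, dim)              # stand-in batch of image patches

    # Inner loop: optimize the codes X with D and W held fixed.
    X = f(Y).detach().requires_grad_(True)
    inner = torch.optim.SGD([X], lr=0.1)
    for _ in range(20):
        loss_x = ((Y - X @ D.detach().T) ** 2).mean() \
               + ((X - f(Y).detach()) ** 2).mean() \
               + lam * X.abs().mean()
        inner.zero_grad(); loss_x.backward(); inner.step()

    # Outer step: move D and W toward the optimized codes
    # (the L1 term is constant in D and W, so it is dropped here).
    X = X.detach()
    loss = ((Y - X @ D.T) ** 2).mean() + ((X - f(Y)) ** 2).mean()
    outer.zero_grad(); loss.backward(); outer.step()
```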
42. Sparseness + Hierarchy?
Hierarchical Sparse Coding (Yu+11)
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: a hierarchical representation built from input patches $\{y^p\}$: level-1 features, then level-2 features, with an encoder/decoder pair at every level.]
43. Sparseness + Hierarchy?
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: the same hierarchy with only the encoders kept: input patches $\{y^p\}$ → level-1 features → level-2 features.]
Removing the decoders leaves a stack that operates as a feed-forward neural network, as sketched below.
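A tiny sketch of that point: keeping only the encoder weights of each level yields an ordinary feed-forward network (the weights below are random stand-ins for layer-wise trained ones):

```python
import numpy as np

# Keeping only the encoders: the stack is a plain feed-forward network.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((128, 64))    # level-1 encoder
W2 = rng.standard_normal((32, 128))    # level-2 encoder

def forward(y):
    h1 = np.tanh(W1 @ y)               # level-1 features x = f(W1 y)
    h2 = np.tanh(W2 @ h1)              # level-2 features
    return h2

print(forward(rng.standard_normal(64)).shape)   # (32,)
```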
44. Sparseness + Hierarchy?
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: the same hierarchy with only the decoders kept: level-2 features → level-1 features → input patches $\{y^p\}$.]
Running the decoders derives the optimal features, i.e. the input patterns that a given unit represents, as sketched below.
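A tiny sketch of that idea: activate a single top-level unit and run only the decoders to map it back to input space; the weights are again random stand-ins for trained ones:

```python
import numpy as np

# Keeping only the decoders: map one top-level unit back to input space
# to see the input pattern it stands for.
rng = np.random.default_rng(0)
D1 = rng.standard_normal((64, 128))    # level-1 decoder
D2 = rng.standard_normal((128, 32))    # level-2 decoder

unit = np.zeros(32)
unit[7] = 1.0                          # activate a single level-2 feature
preferred_input = D1 @ (D2 @ unit)     # its optimal input pattern
print(preferred_input.shape)           # (64,) -- an image-patch-sized vector
```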