2. What is Deep Learning?
Machine learning with multi-layer neural networks
The "comeback" of neural networks
(Shallow) neural network → deep neural network
3. Deep learning example:
Unsupervised learning of image features
12-layer DNN
~10^10 parameters
Automatic feature extraction by unsupervised learning
Input: 10^8 YouTube images
16-core PCs × 10^3 machines × 3 days
Emergence of "grandmother cells"?
Le et al., ICML 2012
[Figure: preferred stimuli of higher-level cells; examples of training images]
4. Deep learning example:
General object recognition
ImageNet Large Scale Visual Recognition Challenge 2012
1,000 categories × ~1,000 training images each
Convolutional neural network
Krizhevsky et al., NIPS 2012
SIFT + FVs: 0.26 test error
CNN: 0.15 test error
5. Deep learning example:
Text mining with a deep generative model
Reuters news data represented as bag-of-words
804,414 documents
Unsupervised learning with an autoencoder
Hinton & Salakhutdinov 2006
[Figure: 2-D codes for the 804,414 Reuters newswire stories, learned without supervision by a deep generative model of P(document) over bag-of-words input (Hinton & Salakhutdinov, Science 2006). Left: 2-D LSA result. Right: deep generative model result, with clusters for topics such as Leading Economic Indicators, European Community Monetary/Economic, Accounts/Earnings, Interbank Markets, Government Borrowings, and Energy Markets.]
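As a rough illustration of the autoencoder approach, here is a minimal PyTorch sketch that compresses bag-of-words vectors to 2-D codes; the vocabulary size, layer widths, optimizer settings, and random stand-in data are all assumptions, and the original work used a deeper, layer-wise pretrained model.

```python
import torch
import torch.nn as nn

# Minimal autoencoder producing 2-D document codes from bag-of-words
# vectors. Sizes and data are illustrative stand-ins, not the original's.
vocab = 2000
enc = nn.Sequential(nn.Linear(vocab, 500), nn.Sigmoid(),
                    nn.Linear(500, 2))                 # 2-D code
dec = nn.Sequential(nn.Linear(2, 500), nn.Sigmoid(),
                    nn.Linear(500, vocab))             # reconstruction
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                       lr=1e-3)

docs = torch.rand(256, vocab)                          # fake word counts
for step in range(200):
    code = enc(docs)
    loss = ((dec(code) - docs) ** 2).mean()            # reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()

# After training, plotting the 2-D `code` vectors gives a document map
# like the figure above.
```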
17. CNN: The Importance of Architecture
Network architecture is as important as learning (Jarrett+09, Saxe+10)
Performance comparison across architectures (Caltech-101)
[Figure 1 (Jarrett+09): An example of a feature-extraction stage of the type $F_{CSG} - R_{abs} - N - P_A$. An input image (or a feature map) is passed through a non-linear filter bank, followed by rectification, local contrast normalization, and spatial pooling/sub-sampling. A pooling layer with 4×4 down-sampling is denoted $P_A^{4\times4}$.]
[Figure 4 (Jarrett+09): Left: random stage-1 filters, and corresponding optimal inputs that maximize the response of each corresponding complex cell in an $F_{CSG} - R_{abs} - N - P_A$ architecture; the small asymmetry in the random filters is sufficient to make them orientation selective. Middle: the same for PSD filters; the optimal input patterns contain several periods, since they maximize the output of a complete stage that contains rectification, local normalization, and average pooling with down-sampling, and shifted versions of each pattern yield similar activations. Right: a subset of stage-2 filters obtained after PSD and supervised refinement on Caltech-101; some structure is apparent.]
Caltech-101 accuracy by architecture and filter type (random vs. Predictive Sparse Decomp.):

Architecture      Random filter   Trained filter (PSD)
2 layer + abs     0.629           0.647
2 layer + mean    0.196           0.310
1 layer + abs     0.533           0.548
18. Properties of the Visual Cortex (Ventral Pathway)
Visual cortex: hierarchically organized; each level solves a different visual task
Early visual cortex: small receptive fields, extraction of simple features
Contains simple cells and complex cells
Higher visual cortex: large receptive fields, selective for moderately complex features
[Figure: the ventral-pathway hierarchy V1 → V2 → V4 → PIT/CIT → AIT (also labeled TEO, TE), shown within the larger cortical area graph (V3, VP, MT, LIP, MST, ...). V1: small receptive fields, edge and line-segment detectors. IT: large receptive fields, face and complex-feature detectors. Intermediate stages are marked "?".]
19. Properties of the Early Visual Cortex
Responds to components such as line segments and edges
Simple cell: sensitive to orientation and phase
Complex cell: tolerant of phase
Complex cell: a cascade of simple cells
[Figure: a simple cell (orientation selective, phase sensitive) fires only when the input stimulus matches both the orientation and the phase of its receptive field; a complex cell (phase insensitive) fires at any phase of the preferred orientation.]
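The cascade idea can be made concrete with the classic energy model: a complex cell pools the squared responses of simple cells that share an orientation but differ in phase. A minimal NumPy sketch, with illustrative (not physiological) Gabor parameters:

```python
import numpy as np

# Energy-model sketch: a complex cell pools phase-shifted simple cells.
# Gabor and grating parameters below are illustrative choices.
def gabor(size, theta, phase, freq=0.2):
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * (size / 4) ** 2))
    return envelope * np.cos(2 * np.pi * freq * u + phase)

def simple_cell(img, theta, phase):
    # Linear filtering followed by half-wave rectification:
    # sensitive to both orientation and phase.
    return max(np.sum(img * gabor(img.shape[0], theta, phase)), 0.0)

def complex_cell(img, theta):
    # Sum of squared simple-cell responses over four phases:
    # orientation selective but phase insensitive.
    return sum(simple_cell(img, theta, p) ** 2
               for p in (0, np.pi / 2, np.pi, 3 * np.pi / 2))

# A vertical grating drives the vertically tuned complex cell at any phase.
x = np.arange(16)
for shift in (0.0, np.pi / 2):
    img = np.cos(2 * np.pi * 0.2 * x + shift)[None, :] * np.ones((16, 1))
    print(complex_cell(img, theta=0.0))   # roughly constant across shifts
```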
Source: http://ohzawa-lab.bpe.es.osaka-u.ac.jp/resources/text/KisokouKoukai2009/Ohzawa2009Koukai04.pdf
35. Describing Data with Sparse Representations

$y = x_1 d_1 + x_2 d_2 + x_3 d_3 + \dots$, with as many coefficients $x_i$ as possible equal to 0:

$$H = \sum_p \Bigl\| y^p - \sum_i x^p_i d_i \Bigr\|^2 + \sum_i \| x^p_i \|_1$$

The first term represents the image as faithfully as possible; the second drives as many coefficients as possible to 0 (LASSO).
Can the dictionary $\{d_i\}$ and the coefficients $\{x^p_i\}$ be recovered from the image patches $\{y^p\}$ alone?
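One standard answer is alternating minimization: sparse-code the $x^p$ by ISTA with the dictionary fixed, then update $\{d_i\}$ by least squares. A minimal NumPy sketch, with hypothetical sizes, L1 weight, and synthetic patches in place of real image data:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink each value toward 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(Y, D, lam, n_iter=100):
    # ISTA: minimize 0.5*||Y - D X||_F^2 + lam*||X||_1 over the codes X.
    L = np.linalg.norm(D, 2) ** 2         # Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = soft_threshold(X - grad / L, lam / L)
    return X

def learn_dictionary(Y, n_atoms, lam, n_outer=20):
    # Alternate sparse coding with a least-squares dictionary update.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)        # unit-norm atoms d_i
    for _ in range(n_outer):
        X = sparse_code(Y, D, lam)
        D = Y @ X.T @ np.linalg.pinv(X @ X.T + 1e-8 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X

# Hypothetical data: 256 random "patches" of dimension 64, 128 atoms.
Y = np.random.default_rng(1).standard_normal((64, 256))
D, X = learn_dictionary(Y, n_atoms=128, lam=0.1)
print("fraction of nonzero coefficients:", np.mean(np.abs(X) > 1e-6))
```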
37. Sparse Representation for MNIST
60K training / 10K test images
Dictionary size: 512
Classification with a linear SVM on the sparse codes:

$$H = \sum_p \Bigl\| y^p - \sum_i x^p_i d_i \Bigr\|^2 + \sum_i \| x^p_i \|_1$$
Pipeline: input → sparse coding (feature) → classifier
Slide credit: Kai Yu
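A rough scikit-learn rendering of this pipeline; the MiniBatchDictionaryLearning estimator and the alpha value are stand-ins for the slide's exact setup, and running on all 70K images is slow:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

# MNIST: 60K train / 10K test in the original split.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
X_train, y_train, X_test, y_test = X[:60000], y[:60000], X[60000:], y[60000:]

# Learn a 512-atom dictionary and encode each image as a sparse code.
# alpha (the L1 weight) is a hypothetical choice, not from the slide.
dico = MiniBatchDictionaryLearning(n_components=512, alpha=1.0, random_state=0)
Z_train = dico.fit_transform(X_train)
Z_test = dico.transform(X_test)

# Classify the sparse codes with a linear SVM.
clf = LinearSVC().fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```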
41. Sparse Autoencoder
Predictive Sparse Decomposition (Ranzato+07)
Encoder: $x^p = f(W y^p)$. Decoder: $y^p \approx D x^p$.
A sparse representation $\{x^p\}$ is learned from input patches $\{y^p\}$ under an L1 constraint:

$$\min_{D,W,x} \sum_p \Bigl( \bigl\| y^p - D x^p \bigr\|^2 + \bigl\| x^p - f(W y^p) \bigr\|^2 + \sum_i \| x^p_i \|_1 \Bigr)$$
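A minimal PyTorch sketch of this objective, alternating a few inner steps that optimize the codes $x$ with an outer update of $D$ and $W$; the tanh nonlinearity, sizes, learning rates, and random patches are all assumptions:

```python
import torch

torch.manual_seed(0)
dim, n_atoms, lam = 64, 128, 0.1          # hypothetical sizes and L1 weight

D = (0.1 * torch.randn(dim, n_atoms)).requires_grad_(True)   # decoder dict.
W = (0.1 * torch.randn(n_atoms, dim)).requires_grad_(True)   # encoder weights
outer = torch.optim.SGD([D, W], lr=1e-2)

def f(Y):
    return torch.tanh(Y @ W.T)            # encoder prediction f(W y)

for step in range(200):
    Y = torch.randn(32, dim)              # stand-in batch of image patches

    # Inner loop: optimize the codes X with D and W held fixed.
    X = f(Y).detach().requires_grad_(True)
    inner = torch.optim.SGD([X], lr=0.1)
    for _ in range(20):
        loss_x = ((Y - X @ D.detach().T) ** 2).mean() \
               + ((X - f(Y).detach()) ** 2).mean() \
               + lam * X.abs().mean()
        inner.zero_grad(); loss_x.backward(); inner.step()

    # Outer step: move D and W toward the optimized codes
    # (the L1 term is constant in D and W, so it is dropped here).
    X = X.detach()
    loss = ((Y - X @ D.T) ** 2).mean() + ((X - f(Y)) ** 2).mean()
    outer.zero_grad(); loss.backward(); outer.step()
```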
42. Sparseness + Hierarchy?
Hierarchical Sparse Coding (Yu+11)
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: a hierarchical representation built from input patches $\{y^p\}$: level-1 features, then level-2 features, with an encoder/decoder pair at every level.]
43. Sparseness + Hierarchy?
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: the same hierarchy with only the encoders kept: input patches $\{y^p\}$ → level-1 features → level-2 features.]
Removing the decoders leaves a stack that operates as a feed-forward neural network, as sketched below.
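A tiny sketch of that point: keeping only the encoder weights of each level yields an ordinary feed-forward network (the weights below are random stand-ins for layer-wise trained ones):

```python
import numpy as np

# Keeping only the encoders: the stack is a plain feed-forward network.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((128, 64))    # level-1 encoder
W2 = rng.standard_normal((32, 128))    # level-2 encoder

def forward(y):
    h1 = np.tanh(W1 @ y)               # level-1 features x = f(W1 y)
    h2 = np.tanh(W2 @ h1)              # level-2 features
    return h2

print(forward(rng.standard_normal(64)).shape)   # (32,)
```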
44. Sparseness + Hierarchy?
Deep Belief Network (DBN), Deep Boltzmann Machine (DBM) (Hinton & Salakhutdinov 06)
[Figure: the same hierarchy with only the decoders kept: level-2 features → level-1 features → input patches $\{y^p\}$.]
Running the decoders derives the optimal features, i.e. the input patterns that a given unit represents, as sketched below.
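A tiny sketch of that idea: activate a single top-level unit and run only the decoders to map it back to input space; the weights are again random stand-ins for trained ones:

```python
import numpy as np

# Keeping only the decoders: map one top-level unit back to input space
# to see the input pattern it stands for.
rng = np.random.default_rng(0)
D1 = rng.standard_normal((64, 128))    # level-1 decoder
D2 = rng.standard_normal((128, 32))    # level-2 decoder

unit = np.zeros(32)
unit[7] = 1.0                          # activate a single level-2 feature
preferred_input = D1 @ (D2 @ unit)     # its optimal input pattern
print(preferred_input.shape)           # (64,) -- an image-patch-sized vector
```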