PR-313
Sunghoon Joo, Samsung SDS
2021. 4. 18.
https://arxiv.org/pdf/2003.00152.pdf
Training BatchNorm and Only BatchNorm:
On the Expressive Power of Random Features in CNNs
Jonathan Frankle1, David J. Schwab2,3, Ari S. Morcos3
1 MIT CSAIL
2 CUNY Graduate Center, ITS
3 Facebook AI Research
Published as a conference paper at ICLR 2021
1. Research Background
Learning affine transformations of features
• Features are normalized using the mean and variance computed over the Batch, Layer, Instance, or Group dimension, and learnable scale and shift parameters are then applied
Yuxin Wu, Kaiming He, Group Normalization, ECCV, 2018
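For reference, the normalize-then-affine operation shared by these methods can be written as
y = γ · (x − μ) / √(σ² + ε) + β
where μ and σ² are computed over the corresponding batch/layer/instance/group dimension, and γ (scale) and β (shift) are the learnable per-feature parameters this paper focuses on.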
1. Research Background
Exploiting the expressive power of affine transformations
• Residual Adapter modules are designed so that a model trained on one task can be applied to other visual domains
Multi-task learning
• Per-task BatchNorm (Mudrakarta et al., ICLR, 2019)
: when a network trained on one task is applied to another task, only the BatchNorm parameters are trained
Style transfer and style generation
• Instance Normalization is used to encode content features into a variety of style features
• StyleGAN (Karras et al., CVPR, 2019), (PR-131)
• Adaptive instance normalization (AdaIN) (Huang and Belongie, ICCV, 2017) (PR-186)
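As a reminder of how style methods exploit these affine parameters: AdaIN replaces the learned scale and shift with statistics of a style feature y,
AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y)
so the scale and shift themselves carry the style information.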
1. Research Background
Batch Normalization
• BatchNorm is nearly ubiquitous in deep convolutional neural networks (CNNs) for computer vision
• Yet the role and expressive power of γ and β, the trainable parameters that determine the scale and shift, are not well understood
1. Research Background
Training only BatchNorm
• Previous attempts
• Rosenfeld & Tsotsos, CRV, 2019
Trained only the BatchNorm parameters (γ, β) while freezing various parts of the network
Reached 61% and 30% accuracy on CIFAR-10 networks (a DenseNet and an unspecified Wide ResNet)
• Mudrakarta et al., ICLR, 2019
Proposed training only the BatchNorm parameters when applying a network trained on one task to another task
• Why this study is needed
• A deeper investigation of how training only the BatchNorm parameters can yield good performance
• Compare against training an equally small number of other parameters to assess the expressive power of the BatchNorm parameters
• Check whether BatchNorm-only training works across a wide range of networks
1. Research Background
Objective & Approach
• We aim to understand the role and expressive power of the affine parameters used to transform features in this way (BatchNorm).
• We investigate the performance achieved when training only these parameters in BatchNorm and freezing all weights at their random initializations.
• Prior work has shown that high accuracy is possible while keeping most parameters of a randomly initialized network unchanged (Zhou et al., NeurIPS 2019; Zhang et al., ICML 2019).
(Diagram: two CNNs built from repeated CONV → BatchNorm → ReLU blocks between Input and Output. Left: all parameters trainable. Right: all weights frozen at their random initialization, training only the BatchNorm parameters (γ, β).)
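A minimal PyTorch sketch of this setup (my illustration, not the authors' code): freeze every weight at its random initialization and leave only the BatchNorm affine parameters trainable.

import torch
import torchvision

def freeze_all_but_batchnorm(model: torch.nn.Module) -> None:
    # Freeze everything, then re-enable only the BatchNorm affine parameters.
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)):
            if m.weight is not None:  # gamma
                m.weight.requires_grad = True
            if m.bias is not None:    # beta
                m.bias.requires_grad = True

model = torchvision.models.resnet50(weights=None)  # randomly initialized network
freeze_all_but_batchnorm(model)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1, momentum=0.9
)

For the ImageNet experiments the output layer is also trained (see the results below), which in this sketch would mean additionally setting requires_grad = True on model.fc.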
2. Methods
Architectures
• ResNet for CIFAR-10 and ResNet for ImageNet are used as the base architectures (randomly initialized)
• Experiments vary depth and width (the reason ResNet was chosen as the base architecture)
• Depth scaling: stack more layers, following Kaiming He et al., CVPR 2016
• Width scaling: increase the number of channels per layer
• BatchNorm is placed before the activation (Kaiming He et al., ECCV 2016)
• Parameter initialization: β set to 0, γ sampled uniformly between 0 and 1
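A sketch of this initialization, assuming PyTorch BatchNorm modules:

import torch.nn as nn
import torchvision

def init_batchnorm(module: nn.Module) -> None:
    # beta = 0, gamma ~ Uniform(0, 1), as described above.
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)) and module.affine:
        nn.init.zeros_(module.bias)
        nn.init.uniform_(module.weight, 0.0, 1.0)

model = torchvision.models.resnet50(weights=None)
model.apply(init_batchnorm)  # applies to every BatchNorm layer in the model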
3. Experimental Results
Accuracy when training only the BatchNorm parameters
• All weights other than the BatchNorm parameters (γ, β) are frozen at their random initialization
: the trained parameters are only 0.64% of the total (CIFAR-10 ResNet) and 0.27% (ImageNet ResNet)
(Figure: CIFAR-10 ResNet-110 reaches 93.3% test accuracy when all parameters are trained vs. 69.5% when only the BatchNorm parameters are trained; 5 runs.)
• Finding 1: Training only the parameters that rescale and shift random features already achieves high CIFAR-10 accuracy
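A quick sketch (not the authors' script) of how parameter fractions like these can be checked:

import torch
import torchvision

def batchnorm_param_fraction(model: torch.nn.Module) -> float:
    # Fraction of all parameters belonging to BatchNorm affine transforms (gamma, beta).
    bn = sum(
        p.numel()
        for m in model.modules()
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d))
        for p in (m.weight, m.bias)
        if p is not None
    )
    total = sum(p.numel() for p in model.parameters())
    return bn / total

print(f"{batchnorm_param_fraction(torchvision.models.resnet50(weights=None)):.2%}")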
3. Experimental Results
Performance vs. model architecture – Depth & Width
(Figure: widening the network raises BatchNorm-only accuracy from 48% to 73%, while deepening it raises accuracy from 48% to 82%.)
Figure 3: The relationship between BatchNorm parameter count and accuracy when scaling depth and width of CIFAR-10 ResNets.
• Accuracy increases as the network's width and depth grow
• Deepening the network benefits BatchNorm-only training more than widening it; ResNet-434 is about 7% more accurate
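A rough way to see why depth and width affect the BatchNorm parameter budget differently (my arithmetic, not from the paper): a BatchNorm layer over C channels contributes 2C parameters (γ and β), while a 3×3 convolution into those channels contributes 9·C_in·C weights. Adding blocks grows BatchNorm and convolution parameters in the same proportion, whereas widening by a factor k grows BatchNorm parameters by k but convolution weights by roughly k², so at a matched total parameter count the deeper network ends up with more trainable γ and β.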
3. Experimental Results
CNN performance by trainable parameters – ResNet for ImageNet
* For ImageNet, with its 1000 classes, the output layer must also be trained so that the fine-grained distinctions among the 1000 classes can be learned
(Figure: accuracy for different sets of trainable parameters; the annotated values are 32%, 17%, 57%, and 32%. For CIFAR-10, with only 10 classes, the gap between configurations is small; the BatchNorm parameters are what matter. 3 runs for ImageNet, 5 runs for CIFAR-10.)
3. Experimental Results
Are affine parameters special?
• Training only two arbitrary parameters per feature instead of the BatchNorm parameters (γ, β) → performance is lower
• This confirms that the BatchNorm parameters (γ, β) have a larger effect on accuracy than other kinds of parameters
• Adjusting all of the random features through scaling parameters matters more than modifying a small set of individual parameters
• Finding 2: Feature scaling and bias through γ and β alone provide considerable expressive power
3. Experimental Results
How is good performance achieved even though most parameters are frozen? – the role of the γ values
• The network appears to learn to disable roughly a quarter to a third of its features by setting γ close to 0
• The authors believe γ is pushed toward 0 to prevent exploding activations
• Finding 3: γ and β contribute to the model's high accuracy by imposing per-feature sparsity
(Figure: histograms of learned γ values; roughly 27% and 33% of them are near zero, and the deeper and wider the network, the more γ values tend toward 0.)
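A sketch of how this per-feature sparsity could be measured after training (the threshold is my choice, not from the paper):

import torch

@torch.no_grad()
def fraction_of_small_gammas(model: torch.nn.Module, threshold: float = 1e-2) -> float:
    # Collect gamma from every BatchNorm layer and count values near zero.
    gammas = torch.cat([
        m.weight.detach().flatten()
        for m in model.modules()
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d))
        and m.weight is not None
    ])
    return (gammas.abs() < threshold).float().mean().item()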
3. Experimental Results
How is good performance achieved even though most parameters are frozen? – the role of the γ values
• The small γ values are close to 0, but they do not appear to be exactly 0
3. Experimental Results
How is good performance achieved even though most parameters are frozen? – the role of the γ and β values
• Training only BatchNorm: many γ values close to 0, and the corresponding activations are disabled
• All parameters trainable: the fraction of disabled activations is lower, yet the BatchNorm parameters still play a role
• Finding 4: The BatchNorm parameter γ acts to gate and modulate the activations
(Figure: as before, γ tends toward 0 and more activations are disabled as the network gets deeper and wider.)
4. Conclusions
1. The affine parameters have considerable expressive power of their own, independent of any learned features.
• Finding 1: Training only the parameters that rescale and shift random features already achieves high CIFAR-10 accuracy
• Finding 2: Feature scaling and bias through γ and β alone provide considerable expressive power
• Finding 3: γ and β contribute to the model's high accuracy by imposing per-feature sparsity
• Finding 4: The BatchNorm parameter γ acts to gate and modulate the activations
2. A new way to train networks built from randomly initialized features
• Training only the BatchNorm parameters performed better than training only the output layer.
• This is not a way to reduce training cost, but for inference it would be enough to store only the random seed and the BatchNorm parameters (see the sketch below).
• Relatedly, it would also be worth studying random-initialization schemes that improve performance on specific tasks.
3. Limitations and future work
• Studying other architectures (Inception networks, Transformers)
• Hyperparameter tuning under the training-only-BatchNorm setting
• Whether training only affine parameters can also be applied without BatchNorm (e.g., WeightNorm (Salimans & Kingma), FixUp initialization (Zhang et al.))
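A minimal sketch of the "store only the seed and the BatchNorm parameters" idea from conclusion 2 (an assumed PyTorch workflow, not code from the paper; it relies on rebuilding the model with the same code and torch version so the seed reproduces the same random weights):

import torch
import torchvision

def batchnorm_state(model: torch.nn.Module) -> dict:
    # Extract only the BatchNorm entries (gamma, beta, running statistics).
    state = {}
    for name, m in model.named_modules():
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)):
            for key, value in m.state_dict().items():
                state[f"{name}.{key}"] = value
    return state

def save_compact(model: torch.nn.Module, seed: int, path: str) -> None:
    torch.save({"seed": seed, "bn": batchnorm_state(model)}, path)

def load_compact(path: str) -> torch.nn.Module:
    ckpt = torch.load(path)
    torch.manual_seed(ckpt["seed"])                   # reproduce the random initialization
    model = torchvision.models.resnet50(weights=None)
    model.load_state_dict(ckpt["bn"], strict=False)   # restore only the BatchNorm entries
    return model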
Thank you.