Class-Balanced Loss Based on Effective Number of Samples
PR-203
Yin Cui1,2*, Menglin Jia1, Tsung-Yi Lin3, Yang Song4, and Serge Belongie1,4
1 Cornell University
2 Cornell Tech
3 Google Brain
4 Alphabet Inc.
CVPR 2019
Sunghoon Joo, Samsung SDS AI Advanced Research Lab.
2019. 10. 27.
1. Research Background
1. Research Background
Introduction
3/20
• What & Why
• In CNN-based visual recognition tasks, handling the long-tailed data distribution problem is becoming increasingly important
• Most real-world datasets have a skewed distribution
• CNN models trained on long-tailed data perform relatively poorly on classes with few samples
Figure 1. Two classes, one from the head and one from the tail
of a long-tailed dataset (iNaturalist 2017 in this example), have
drastically different number of samples.
1. Research Background
Previous works
4/20
• Attempts to improve performance on long-tailed data
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
Chawla, N. V. et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique.
He, H. et al. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning.
• Re-sampling
• Under-sampling → degrades performance (useful data is discarded)
• Over-sampling → induces model overfitting
• Synthesizing elements for the minority class (SMOTE, ADASYN) → noise in the generated samples degrades model performance
1. Research Background
Previous works
5/20
• Attempts to improve performance on long-tailed data
[1] Wang, Y.-X. et al. (2017). Learning to Model the Tail.
[2] Mahajan, D. et al. (2018). Exploring the Limits of Weakly Supervised Pretraining.
• Cost-sensitive re-weighting
Modify the loss function so that predictions on minority classes have a larger influence
$\mathcal{L} = -\sum_{i=1}^{n} \alpha_i \, y_i \log\big(S(f_\theta(x_i))\big)$
Weighting factor $\alpha_i$: inverse class frequency [1], or a smoothed version using the square root of class frequency [2]
How can we design a better class-balanced loss that is applicable to a
diverse array of datasets?
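As a reference point, here is a minimal numpy sketch of the re-weighted cross-entropy above; the toy class counts and the normalization step are illustrative choices, not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def reweighted_cross_entropy(logits, labels, alpha):
    """Cross-entropy where each sample's loss is scaled by its class weight alpha_i."""
    probs = softmax(logits)
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha[labels] * nll))

# Hypothetical long-tailed class counts; alpha = inverse class frequency [1]
counts = np.array([1000, 100, 10])
alpha = (1.0 / counts) / (1.0 / counts).sum() * len(counts)  # normalized to sum to C

logits = np.random.randn(4, 3)
labels = np.array([0, 1, 2, 2])
print(reweighted_cross_entropy(logits, labels, alpha))
```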
1. Research Background
Objective
6/20
• Beyond a certain number of samples per class, each newly added data point brings a diminishing benefit.
• The objective is to propose a new theoretical framework that measures data overlap, and to use it to design a new loss function that resolves the performance degradation seen when training CNNs on long-tailed datasets.
Key contribution
• A new theoretical framework: the effective number of samples
• Improved CNN classification performance on long-tailed datasets
2. Methods
2. Methods
Effective number of samples
The more data we sample from S, the better the sampled data represents S.
Feature space S with volume N; each sampled data point has unit volume 1.
Only two cases are considered: a newly sampled data point either overlaps with previously sampled data (probability p) or does not (probability 1 − p).
8/20
2. Methods
Effective number of samples
$E_n = \frac{1-\beta^n}{1-\beta}$, where $\beta = \frac{N-1}{N}$; equivalently, the recurrence $E_n = 1 + \beta E_{n-1}$ with $E_1 = 1$
Hyperparameter $\beta \in [0, 1)$: determines how fast $E_n$ grows as $n$ increases
9/20
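A short sketch of the effective number defined above, checking the closed form against the recurrence (the β value is illustrative):

```python
def effective_number(n, beta):
    """E_n = (1 - beta^n) / (1 - beta); E_n -> n as beta -> 1 and E_n -> 1 as beta -> 0."""
    return (1.0 - beta ** n) / (1.0 - beta)

beta = 0.999
e = 1.0                                   # E_1 = 1
for n in range(2, 10):
    e = 1.0 + beta * e                    # recurrence: E_n = 1 + beta * E_{n-1}
    assert abs(e - effective_number(n, beta)) < 1e-9

# For small n, E_n is close to n; for large n it saturates toward 1/(1 - beta) = 1000
print(effective_number(10, beta), effective_number(100000, beta))
```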
2. Methods
Definition of the Class-Balanced Loss
10/20
• The weighting factor for each class is defined as $\alpha_i \propto 1/E_{n_i}$, where $n_i$ is the number of samples in class $i$
Figure 3. Visualization of the proposed class-balanced term. Both axes are in log scale. For a long-tailed dataset
where major classes have significantly more samples than minor classes, setting beta properly re-balances the
relative loss across classes and reduces the drastic imbalance of re-weighting by inverse class frequency.
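A minimal sketch of this class-balanced term for a set of hypothetical per-class counts; following the paper, the weights are normalized so they sum to the number of classes C:

```python
import numpy as np

def class_balanced_weights(counts, beta):
    """alpha_i proportional to 1 / E_{n_i} = (1 - beta) / (1 - beta^{n_i})."""
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    alpha = 1.0 / effective_num
    return alpha / alpha.sum() * len(counts)   # normalize so sum(alpha) = C

counts = np.array([5000, 500, 50, 5])          # hypothetical long-tailed counts
for beta in (0.9, 0.99, 0.999, 0.9999):
    print(beta, class_balanced_weights(counts, beta).round(3))
# beta -> 0 recovers uniform weighting; beta -> 1 approaches inverse class frequency
```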
2. Methods
Definition of the Class-Balanced Loss
11/20
• The class-balanced softmax cross-entropy loss: $\mathrm{CB}_{\text{softmax}}(\mathbf{z}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \log\frac{\exp(z_y)}{\sum_{j=1}^{C}\exp(z_j)}$
• The class-balanced focal loss: $\mathrm{CB}_{\text{focal}}(\mathbf{z}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C}(1-p_i^t)^\gamma \log(p_i^t)$
• The class-balanced sigmoid cross-entropy loss: $\mathrm{CB}_{\text{sigmoid}}(\mathbf{z}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C}\log\big(\sigma(z_i^t)\big)$
where $\mathbf{z}$ = predicted output (logits), $y$ = label, $z_i^t = z_i$ if $i = y$ and $-z_i$ otherwise, and $p_i^t = \sigma(z_i^t)$
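As a concrete illustration, a numpy sketch of the class-balanced focal loss for a single sample, following the sigmoid-based form above (the input values are toy numbers):

```python
import numpy as np

def cb_focal_loss(logits, label, counts, beta=0.999, gamma=0.5):
    """-(1 - beta) / (1 - beta^{n_y}) * sum_i (1 - p_i^t)^gamma * log(p_i^t),
    where z_i^t = z_i if i == y else -z_i, and p_i^t = sigmoid(z_i^t)."""
    z_t = np.where(np.arange(len(logits)) == label, logits, -logits)
    p_t = 1.0 / (1.0 + np.exp(-z_t))
    focal = -np.power(1.0 - p_t, gamma) * np.log(p_t + 1e-12)
    cb_term = (1.0 - beta) / (1.0 - beta ** counts[label])
    return cb_term * focal.sum()

counts = np.array([5000, 500, 50])             # hypothetical per-class counts
print(cb_focal_loss(np.array([2.0, -1.0, 0.5]), label=2, counts=counts))
```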
2. Methods
Dataset – Long-Tailed CIFAR
12/20
• The number of training samples per class is reduced according to the exponential function $n = n_i\mu^i$, where $i$ is the class index, $n_i$ is the original number of training images, and $\mu \in (0, 1)$
• Imbalance factor: the ratio between the sample counts of the largest and the smallest class
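A sketch of how such per-class counts can be generated; picking μ from a target imbalance factor of 100 is just an example:

```python
import numpy as np

def long_tailed_counts(n_i, num_classes, mu):
    """n = n_i * mu^i for class index i, with mu in (0, 1)."""
    return np.floor(n_i * mu ** np.arange(num_classes)).astype(int)

# Example: choose mu so the imbalance factor n_0 / n_{C-1} = mu^-(C-1) equals 100
num_classes, target_imbalance = 10, 100
mu = target_imbalance ** (-1.0 / (num_classes - 1))
counts = long_tailed_counts(5000, num_classes, mu)
print(counts, counts[0] / counts[-1])  # counts decay from 5000 down to 50 for CIFAR-10
```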
2. Methods
Implementation
13/20
• Dataset
• Model
• ResNet-32 (for CIFAR-10, CIFAR-100)
• ResNet-50, ResNet-101, ResNet-152 (for iNaturalist, ImageNet)
• Hyperparameters
• Models on CIFAR were trained with batch size of 128 on a single NVIDIA Titan X GPU for 200 epochs.
Models on iNaturalist and ILSVRC 2012 data were trained with batch size of 1024 on a single cloud TPU for
90 epochs.
• Optimizer: SGD with momentum (learning rate decay was used)
• Initial learning rate: 0.1 (CIFAR), 0.2 (ImageNet), 0.4 (iNaturalist)
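A minimal PyTorch sketch of the CIFAR training setup described above; the momentum value (0.9) and the decay milestones are assumptions on my part, since the slide only states that SGD with momentum and learning rate decay were used:

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)   # stand-in for ResNet-32 on CIFAR
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum assumed
# Assumed step-decay schedule; the slide only says learning rate decay was used
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120, 160], gamma=0.1)

for epoch in range(200):
    # ... one training epoch with batch size 128 would go here ...
    scheduler.step()
```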
3. Experimental Results
3. Experimental Results
Visual recognition on Long-Tailed CIFAR
15/20
$\beta \in \{0.9, 0.99, 0.999, 0.9999\}$, and $\gamma \in \{0.5, 1.0, 2.0\}$ for focal loss.
• Using the class-balanced loss improves performance on long-tailed CIFAR.
• In most cases, sigmoid cross-entropy (SGM) and focal loss outperform softmax cross-entropy (SM).
3. Experimental Results
Visual recognition on Long-Tailed CIFAR
16/20
• On long-tailed datasets with many classes, the class-balanced loss designed in this paper outperforms existing methods
3. Experimental Results
Visual Recognition on Large-Scale Datasets
17/20
• Using the CB focal loss yields strong performance across all large-scale datasets.
3. Experimental Results
Visual Recognition on Large-Scale Datasets
18/20
• The performance advantage of the CB focal loss emerges after about 60 epochs.
4. Conclusions
4. Conclusions 20/20
• This paper presents a theoretical framework for addressing the long-tail distribution problem in training data.
• The authors' key idea is to propose the effective number of samples, which accounts for the overlap of each class's data in feature space, and to build the class-balanced loss on top of it.
• The authors validated the benefits of the proposed class-balanced loss through experiments on datasets of various scales, including iNaturalist, ImageNet, and CIFAR.
• Since better knowledge of the data distribution would allow a better estimate of the effective number, the authors plan to improve this framework by incorporating reasonable assumptions about the data distribution.
Thank you.