PR-203: Class-Balanced Loss Based on Effective Number of Samples

Yin Cui1,2*, Menglin Jia1, Tsung-Yi Lin3, Yang Song4, and Serge Belongie1,2
1 Cornell University
2 Cornell Tech
3 Google Brain
4 Alphabet Inc.
CVPR 2019
Sunghoon Joo, Samsung SDS AI Advanced Research Lab.
2019. 10. 27.

1. Research Background
Introduction
• What & Why
• In CNN-based visual recognition tasks, handling long-tailed data distributions is becoming increasingly important
• Most real-world datasets have a skewed distribution
• CNN models trained on long-tailed data perform comparatively poorly on classes with few samples
Figure 1. Two classes, one from the head and one from the tail
of a long-tailed dataset (iNaturalist 2017 in this example), have
drastically different number of samples.

Previous works
• Attempts to improve performance on long-tailed data
• Re-sampling
• Under-sampling: degrades performance
• Over-sampling: causes the model to overfit
• Synthesizing elements for the minority class (SMOTE, ADASYN): noise in the generated samples degrades model performance
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
Chawla, N. V. et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique.
He, H. et al. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning.

Previous works
• Attempts to improve performance on long-tailed data
[1] Wang, Y.-X. et al. (2017). Learning to model the tail.
[2] Mahajan, D. et al. (2018). Exploring the Limits of Weakly Supervised Pretraining.
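• Cost-sensitive re-weighting
Modifies the loss function so that predictions on small classes have a larger influence:
$$\mathcal{L} = -\sum_{i=1}^{n} \alpha_i \, y_i \log\big(S(f_\theta(x_i))\big)$$
Weighting factor $\alpha_i$: inverse class frequency [1], or a smoothed version based on the inverse square root of class frequency [2]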
How can we design a better class-balanced loss that is applicable to a
diverse array of datasets?
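The two weighting schemes from prior work can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names, the `power` parameter, the example class sizes, and the choice to rescale the weights so they sum to the number of classes are all assumptions made here, not details taken from the slides.

```python
import math

def inverse_frequency_weights(class_counts, power=1.0):
    """Per-class weights proportional to 1 / count**power.

    power=1.0 gives inverse class frequency; power=0.5 gives the smoothed
    inverse-square-root variant. The weights are rescaled to sum to the
    number of classes (a normalization assumed here for illustration).
    """
    raw = [1.0 / (n ** power) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]

def weighted_cross_entropy(probs, labels, weights):
    """Sum over samples of -alpha_y * log(p_y); probs are softmax outputs."""
    return sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))

counts = [5000, 50]  # hypothetical head and tail class sizes
w_inv = inverse_frequency_weights(counts, power=1.0)
w_sqrt = inverse_frequency_weights(counts, power=0.5)
```

With these hypothetical counts, inverse frequency weights the tail class 100 times more than the head class, while the square-root variant reduces that ratio to 10.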

Objective
• Beyond a certain number of samples per class, the benefit of each newly added data point diminishes.
• The goal is to present a new theoretical framework that measures data overlap, and to use it to design a new loss function that addresses the performance degradation seen when training CNNs on long-tailed datasets.
Key contribution
• A new theoretical framework: the effective number of samples
• Improved CNN classification performance on long-tailed datasets

2. Methods
Effective number of samples
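The more data we sample from S, the better the sampled data represent S.
Feature space S has volume N; each sampled data point has unit volume 1.
Only two cases are considered: a newly sampled data point either overlaps with the previously sampled data (probability p) or does not overlap (probability 1 − p).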
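The two-case assumption leads to a simple recursion for the expected covered volume. The following is a sketch of that argument, using the paper's notation: $E_n$ is the expected volume covered after $n$ samples, and $\beta = (N-1)/N$.

```latex
\begin{aligned}
E_1 &= 1, \qquad p = \frac{E_{n-1}}{N}, \\
E_n &= p\,E_{n-1} + (1-p)\,(E_{n-1} + 1) = 1 + \frac{N-1}{N}\,E_{n-1}, \\
E_n &= \sum_{j=0}^{n-1} \beta^{j} = \frac{1-\beta^{n}}{1-\beta},
\qquad \beta = \frac{N-1}{N}.
\end{aligned}
```

The second line uses the fact that an overlapping sample adds no volume while a non-overlapping one adds one unit; unrolling the recursion gives the geometric sum in the third line.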

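Effective number of samples
$E_{n-1}$: expected volume covered by the previously sampled data
Effective number of samples: $E_n = \frac{1 - \beta^{n}}{1 - \beta}$, where $\beta = \frac{N-1}{N}$
Hyperparameter $\beta \in [0, 1)$: determines how fast $E_n$ grows as $n$ increases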
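To see how β controls the growth of the effective number, a small numeric check can help. The sketch below assumes the paper's closed form E_n = (1 − β^n)/(1 − β) and its recursion E_n = 1 + β·E_{n−1} with E_1 = 1; the function names are hypothetical.

```python
def effective_number(n, beta):
    """Closed form E_n = (1 - beta**n) / (1 - beta), for 0 <= beta < 1."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)

def effective_number_recursive(n, beta):
    """Same quantity via the recursion E_n = 1 + beta * E_{n-1}, E_1 = 1."""
    e = 1.0
    for _ in range(n - 1):
        e = 1.0 + beta * e
    return e

# Compare how E_n behaves for a few beta values at a fixed n.
for beta in (0.0, 0.9, 0.999, 0.9999):
    print(beta, effective_number(1000, beta))
```

Two limiting cases follow directly from the formula: β = 0 gives E_n = 1 for every n (all samples count as one), and as β approaches 1, E_n approaches n (no overlap is assumed). For fixed β, E_n saturates at 1/(1 − β) = N as n grows.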

Definition of the Class-Balanced Loss
• The weighting factor $\alpha_i$ for each class is defined as $\alpha_i \propto 1/E_{n_i}$, where $n_i$ is the number of samples in class $i$
Figure 3. Visualization of the proposed class-balanced term. Both axes are in log scale. For a long-tailed dataset
where major classes have significantly more samples than minor classes, setting beta properly re-balances the
relative loss across classes and reduces the drastic imbalance of re-weighing by inverse class frequency.
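The class-balanced term can be computed directly from the per-class sample counts. The sketch below assumes the closed form 1/E_n = (1 − β)/(1 − β^n) from the paper; the function name, the example counts, and the rescaling so that the weights sum to the number of classes are illustrative assumptions.

```python
def class_balanced_weights(class_counts, beta):
    """Weights alpha_i proportional to 1/E_{n_i} = (1 - beta) / (1 - beta**n_i).

    The weights are rescaled to sum to the number of classes so that the
    overall loss scale stays comparable across beta values (an assumption
    made for this sketch).
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]

counts = [5000, 500, 50]  # hypothetical long-tailed class sizes
for beta in (0.0, 0.9, 0.999, 0.9999):
    print(beta, class_balanced_weights(counts, beta))
```

With β = 0 every class gets the same weight (no re-weighting), and as β approaches 1 the weights approach inverse class frequency, so β interpolates between the two extremes shown in Figure 3.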

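Definition of the Class-Balanced Loss
$z$ = predicted output for all $C$ classes, $y$ = label, $n_y$ = number of training samples in the ground-truth class $y$
• The class-balanced softmax cross-entropy loss:
$$\mathrm{CB}_{\mathrm{softmax}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}} \log\left(\frac{\exp(z_y)}{\sum_{j=1}^{C} \exp(z_j)}\right)$$
• The class-balanced focal loss:
$$\mathrm{CB}_{\mathrm{focal}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} (1 - p_i^t)^{\gamma} \log(p_i^t)$$
• The class-balanced sigmoid cross-entropy loss:
$$\mathrm{CB}_{\mathrm{sigmoid}}(z, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} \log\left(\frac{1}{1 + \exp(-z_i^t)}\right)$$
where $p_i^t = \mathrm{sigmoid}(z_i^t)$, with $z_i^t = z_i$ if $i = y$ and $z_i^t = -z_i$ otherwise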
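The three class-balanced losses share the same structure: a standard loss multiplied by (1 − β)/(1 − β^{n_y}). The pure-Python sketch below illustrates this for a single sample, assuming the paper's formulation in which the sigmoid and focal losses sum over all classes with z_t = z_i for the true class and −z_i otherwise. All function names and the `kind` argument are hypothetical, and a real training setup would use a tensor library instead.

```python
import math

def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax_ce(z, y):
    """-log softmax(z)[y], computed with the log-sum-exp trick."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[y]

def focal(z, y, gamma):
    """Sum over classes of -(1 - p_t)**gamma * log(p_t), p_t = sigmoid(z_t)."""
    total = 0.0
    for i, v in enumerate(z):
        z_t = v if i == y else -v
        # -log(sigmoid(z_t)) equals softplus(-z_t)
        total += (1.0 - _sigmoid(z_t)) ** gamma * _softplus(-z_t)
    return total

def sigmoid_ce(z, y):
    """Sigmoid cross-entropy is the focal loss with gamma = 0."""
    return focal(z, y, gamma=0.0)

def class_balanced_loss(z, y, class_counts, beta, kind="softmax", gamma=2.0):
    """Scale the chosen base loss by (1 - beta) / (1 - beta**n_y)."""
    factor = (1.0 - beta) / (1.0 - beta ** class_counts[y])
    if kind == "softmax":
        base = softmax_ce(z, y)
    elif kind == "sigmoid":
        base = sigmoid_ce(z, y)
    elif kind == "focal":
        base = focal(z, y, gamma)
    else:
        raise ValueError("kind must be 'softmax', 'sigmoid', or 'focal'")
    return factor * base
```

For example, `class_balanced_loss([0.0, 0.0], 1, [5000, 50], beta=0.999)` is larger than the same call with label 0, because the tail class has a smaller effective number and therefore a larger weight.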

Dataset – Long-Tailed CIFAR
• The number of training samples per class is reduced according to the following function:
$$n = n_i \, \mu^{i}$$
$i$ = class index (0-indexed), $n_i$ = original number of training images, $\mu \in (0, 1)$
• Imbalance factor: the number of training samples in the largest class divided by the number in the smallest class
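The per-class sample counts of a long-tailed split can be generated from the target imbalance factor. The sketch below assumes that every class starts with the same number of images and takes the imbalance factor to be the ratio of the largest to the smallest class size, so that μ = r^(−1/(C−1)) for C classes and imbalance factor r. The function name and the use of `round` are choices made here, not taken from the slides.

```python
def long_tailed_counts(n_per_class, num_classes, imbalance_factor):
    """Class sizes n * mu**i with mu = imbalance_factor ** (-1 / (C - 1)).

    Assumes num_classes >= 2 and the same original size for every class.
    Class 0 keeps all samples; the last class keeps n / imbalance_factor.
    """
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)  # runs from 0.0 to 1.0
        counts.append(round(n_per_class * imbalance_factor ** (-frac)))
    return counts

# CIFAR-10 has 5000 training images per class; 100 is an example imbalance factor.
print(long_tailed_counts(5000, 10, 100))
```

An imbalance factor of 1 leaves the dataset unchanged, and larger factors shrink the tail classes exponentially in the class index.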

Implementation
• Dataset
• Model
• ResNet-32 (for CIFAR-10, CIFAR-100)
• ResNet-50, ResNet-101, ResNet-152 (for iNaturalist, ImageNet)
• Hyperparameters
• Models on CIFAR were trained with batch size of 128 on a single NVIDIA Titan X GPU for 200 epochs.
Models on iNaturalist and ILSVRC 2012 data were trained with batch size of 1024 on a single cloud TPU for
90 epochs.
• Optimizer: SGD with momentum (learning rate decay was used)
• Initial learning rate: 0.1 (CIFAR), 0.2 (ImageNet), 0.4 (iNaturalist)

3. Experimental Results
Visual recognition on Long-Tailed CIFAR
$\beta \in \{0.9, 0.99, 0.999, 0.9999\}$, and $\gamma \in \{0.5, 1.0, 2.0\}$ for focal loss.
• Using the class-balanced loss improves performance on long-tailed CIFAR.
• In most cases, sigmoid cross-entropy (SGM) and focal loss outperform softmax cross-entropy (SM).

Visual recognition on Long-Tailed CIFAR
• On long-tailed datasets with a large number of classes, the class-balanced loss designed in this paper outperforms existing methods.

Visual Recognition on Large-Scale Datasets
• CB focal loss achieved good performance on all of the large-scale datasets.

Visual Recognition on Large-Scale Datasets
• With CB focal loss, the performance advantage appears after 60 epochs.

4. Conclusions
• This paper presents a theoretical framework for addressing the long-tailed distribution of training data.
• Because a better understanding of the data distribution would allow a better estimate of the effective number, the authors plan to improve the framework by adding reasonable assumptions about the data distribution.
• The authors' key idea is to account for the fact that the data belonging to each class overlap in feature space; from this they propose the effective number of samples and, building on it, the class-balanced loss.
• The authors validate the benefits of the proposed class-balanced loss through experiments on datasets of various scales, including iNaturalist, ImageNet, and CIFAR.
Thank you.
