PR-339: Maintaining Discrimination and Fairness in Class Incremental Learning (CVPR 2020)
Sunghoon Joo, VUNO Inc.
2021. 8. 15.
1. Research Background
2. Methods
3. Experimental Results
4. Conclusions
1. Research Background
Class incremental learning
In many real-world applications, a model must learn new classes incrementally from streaming data; this setting is called class incremental learning.
(Figure: a vanilla method for class incremental learning.)
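The vanilla approach in the figure amounts to extending the classifier with output units for the new classes and then fine-tuning on the new-class data alone. A minimal PyTorch-style sketch of that baseline, under the assumption of a generic `model`, `optimizer`, and `new_loader` (all placeholder names, not from the paper):

```python
import torch
import torch.nn.functional as F

def vanilla_incremental_step(model, optimizer, new_loader, device="cuda"):
    """One incremental step of the vanilla baseline: the final FC layer has
    already been extended with units for the new classes, and the model is
    fine-tuned with plain cross-entropy on the new-class data only.
    Nothing protects the old classes, which is why forgetting occurs."""
    model.train()
    for images, labels in new_loader:            # labels index the new classes
        images, labels = images.to(device), labels.to(device)
        logits = model(images)                   # logits over old + new classes
        loss = F.cross_entropy(logits, labels)   # no constraint on old classes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```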
Motivation - the catastrophic forgetting issue
• A model trained with vanilla incremental learning tends to predict the newly added classes for most inputs.
Previous works - addressing catastrophic forgetting
Prior methods build additional memory about the old data and use it to prevent catastrophic forgetting, in one of three ways:
• Keep a subset of the old data (old exemplars)
• Use a model that generates old data (generative model)
• Use the model trained on the old data (knowledge distillation)
Previous works - addressing catastrophic forgetting (old exemplars)
Because only a small set of old exemplars is kept alongside the full new-class data, a class imbalance arises between old exemplars and new data, so exemplar-based methods add techniques to correct this imbalance.
Figure from: Belouadah, Eden, and Adrian Popescu. "IL2M: Class incremental learning with dual memory." ICCV 2019.
Previous works - addressing catastrophic forgetting (old exemplars + class imbalance)
• iCaRL (CVPR 2017): the first attempt to tackle catastrophic forgetting with old exemplars; a Nearest Class Mean (NCM) classifier built from the average feature vectors of the old exemplars compensates for the class imbalance.
• IL2M (ICCV 2019): Incremental Learning with Dual Memory; the dual memory stores old images and the class statistics of past models, which are used for probability calibration.
• BiC (CVPR 2019): corrects the model output after training with a bias correction layer (a rough sketch follows below).
[iCaRL] Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." CVPR 2017.
[IL2M] Belouadah, Eden, and Adrian Popescu. "IL2M: Class incremental learning with dual memory." ICCV 2019.
[BiC] Wu, Yue, et al. "Large scale incremental learning." CVPR 2019.
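To make the BiC bullet concrete, here is a rough sketch of the idea of a bias correction layer (not the authors' code, only a simplified illustration): two scalars, fitted on a small balanced validation set after the main training stage with the backbone frozen, are applied to the new-class logits only.

```python
import torch
import torch.nn as nn

class BiasCorrectionLayer(nn.Module):
    """Simplified BiC-style correction: scale and shift only the new-class
    logits, leaving the old-class logits untouched."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # learned on a balanced val set
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, num_old_classes):
        old = logits[:, :num_old_classes]
        new = self.alpha * logits[:, num_old_classes:] + self.beta
        return torch.cat([old, new], dim=1)
```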
Previous works - addressing catastrophic forgetting (generative models)
• Usable even when no old data can be stored.
• Heavily dependent on the quality of the generative model.
• Requires substantial time and a large dataset to train the generative model.
Ostapenko, Oleksiy, et al. "Learning to remember: A synaptic plasticity driven framework for continual learning." CVPR 2019.
Previous works - addressing catastrophic forgetting (knowledge distillation)
• When used alone, knowledge distillation brings only a marginal performance improvement.
Figure from: Ostapenko, Oleksiy, et al. "Learning to remember: A synaptic plasticity driven framework for continual learning." CVPR 2019.
Objective
• The paper proposes a simple and effective solution, motivated by the observations above, to address catastrophic forgetting.
2. Methods
Approach: apply Knowledge Distillation (KD) together with Weight Aligning (WA).
Why is knowledge distillation alone not enough?
• CIFAR-100, 5 incremental steps, 20 classes per step.
• Test set: 10,000 images; old part: 8,000 (80 classes), new part: 2,000 (20 classes).
• e(o,n): old-part samples misclassified as a new class (made worse by KD).
• e(o,o): old-part samples misclassified as a wrong old class (improved by KD).
• After revisiting the distillation loss, the authors find that the cost of misclassifying old samples into new classes is smaller than the cost of misclassifying them into other old classes.
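For reference, the distillation term discussed here can be written as follows. This is a hedged sketch rather than the paper's exact code: the KL form is a common way to express distillation on the old-class outputs, `num_old_classes` and the mixing weight `lam` are placeholder names, and the only value taken from the slides is the temperature T = 2.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, num_old_classes, T=2.0):
    """Distillation on the old-class outputs: match the current model's softened
    old-class predictions to those of the previous-step (teacher) model."""
    s = F.log_softmax(student_logits[:, :num_old_classes] / T, dim=1)
    t = F.softmax(teacher_logits[:, :num_old_classes] / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

def incremental_loss(student_logits, teacher_logits, labels, num_old_classes,
                     lam=0.5, T=2.0):
    """Cross-entropy over all classes plus the distillation term;
    lam is a placeholder mixing weight."""
    ce = F.cross_entropy(student_logits, labels)
    kd = kd_loss(student_logits, teacher_logits, num_old_classes, T)
    return lam * kd + (1.0 - lam) * ce
```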
What happens to the final FC layer in class incremental learning?
1) Inspect the norms of the weight vectors in the final FC layer.
• The weight vectors of the new classes turn out to have larger norms than those of the old classes.
• As a result, the output logits for the new classes become larger, which is a main cause of the performance drop.
• The goal is therefore to correct these biased weights.
Notation: $o(x)$ denotes the $(C^b_{old} + C^b)$-dimensional output vector of the final FC layer, $\phi(\cdot)$ the feature extraction function, and $W = \{w_c,\ 1 \le c \le C^b_{old} + C^b\}$ the FC weights, where $w_c$ is a $d$-dimensional weight vector for the $c$-th class.
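The weight-norm check above is easy to reproduce. A small sketch, assuming the final classifier is an `nn.Linear` whose rows are the per-class weight vectors w_c (the layer and variable names are illustrative):

```python
import torch
import torch.nn as nn

def fc_weight_norms(fc_layer: nn.Linear, num_old_classes: int):
    """Compare the L2 norms of old-class and new-class weight vectors in the
    final FC layer; after incremental training the new-class norms tend to be
    noticeably larger."""
    with torch.no_grad():
        norms = fc_layer.weight.norm(dim=1)          # ||w_c|| for every class c
    old_norms, new_norms = norms[:num_old_classes], norms[num_old_classes:]
    print(f"mean ||w_old|| = {old_norms.mean():.4f}, "
          f"mean ||w_new|| = {new_norms.mean():.4f}")
    return old_norms, new_norms
```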
2) Rescale the weight norms of the new classes.
Main contribution: designing a Maintaining Fairness phase based on Weight Aligning (sketched below).
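Weight Aligning itself is a one-line post-processing step: after training on the new task, rescale the new-class weight vectors so that their average norm matches that of the old-class ones. A minimal sketch, again assuming an `nn.Linear` final layer:

```python
import torch
import torch.nn as nn

def weight_aligning(fc_layer: nn.Linear, num_old_classes: int) -> float:
    """Weight Aligning (WA): w_new <- gamma * w_new with
    gamma = mean(||w_old||) / mean(||w_new||).
    Applied once after training, with no extra parameters or retraining."""
    with torch.no_grad():
        norms = fc_layer.weight.norm(dim=1)
        gamma = norms[:num_old_classes].mean() / norms[num_old_classes:].mean()
        fc_layer.weight.data[num_old_classes:] *= gamma
    return gamma.item()
```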
3. Experimental Results
Experimental setup (CIFAR-100): 32 × 32 color images; 500 images × 100 classes for training, 100 images × 100 classes for evaluation.
Model: 32-layer ResNet; optimizer: SGD; batch size: 32. The learning rate starts at 0.1 and is divided by 10 after epochs 100, 150 and 200 (250 epochs in total). The temperature scalar T is set to 2.
Data augmentation: random cropping, horizontal flipping and normalization.
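The training schedule above maps directly onto a standard SGD + MultiStepLR setup. A sketch of that configuration (the momentum and weight decay values are assumptions, since the slide does not state them):

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def make_optimizer(model):
    """SGD with lr 0.1, divided by 10 after epochs 100, 150 and 200,
    for 250 epochs in total (batch size 32 on CIFAR-100)."""
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = MultiStepLR(optimizer, milestones=[100, 150, 200], gamma=0.1)
    return optimizer, scheduler

# usage:
# optimizer, scheduler = make_optimizer(model)
# for epoch in range(250):
#     train_one_epoch(model, optimizer)   # placeholder training routine
#     scheduler.step()
```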
Applying Weight Aligning (WA) to the CIFAR-100 class incremental learning task (KD: knowledge distillation, WA: weight aligning, WNL: weight normalization layer):
1) Applying WA greatly improves class incremental learning performance.
2) Combining KD with WA is even better.
What roles do KD and WA play?
• KD reduces misclassification among the old classes, but samples are still pushed toward the newly added classes.
• WA makes the model treat new classes and old classes equally.
Comparison with other methods
1) The proposed method performs well on ImageNet.
• Compared methods: LwF.MC (2017) uses KD; iCaRL (2017) uses old exemplars with Nearest Class Mean; EEIL (2018) uses KD with balanced fine-tuning; BiC (2019) uses old exemplars with a bias correction layer; IL2M (2019) uses a dual memory (old exemplars plus prior class statistics).
• 10 incremental steps; 1.2 million images for training and 50,000 images for validation.
• 2,000 and 20,000 images are stored for old classes on ImageNet-100 and ImageNet-1000, respectively.
• Model: 18-layer ResNet; optimizer: SGD; batch size: 256.
Comparison with other methods
2) The proposed method also performs well on CIFAR-100.
• 2,000 exemplars are stored in total, the same as in previous work.
• Reported result: the average over all incremental steps except the first.
• Compared methods: the same as on the previous slide (LwF.MC, iCaRL, EEIL, BiC, IL2M).
Ablation study (ImageNet-100 with 10 incremental steps)
• Restricting the FC weights: constraining the weights to positive values further improves performance (it makes the weight norms consistent with the output logits).
• Norm selection: no difference.
• Bias term: minor effect.
• Exemplar selection strategy: minor effect.
Performance differences across weight-adjustment methods (KD: knowledge distillation, WA: weight aligning, WNL: weight normalization layer)
1) When weight normalization is applied during training, the bias toward the new data becomes even stronger.
2) Adjusting the weights as a post-processing step, as done in this work, is effective against the class imbalance.
4. Conclusions
• This paper presents a simple method for mitigating catastrophic forgetting in class incremental learning.
• Knowledge distillation and weight aligning are used together.
• Weight aligning is the main source of the performance gain, and combining it with knowledge distillation gives an additional improvement.
• Knowledge distillation keeps the old classes well separated from each other (Maintaining Discrimination), while weight aligning makes the model treat old and new classes equally (Maintaining Fairness); together they alleviate catastrophic forgetting.
• On class incremental learning tasks on ImageNet-1000, ImageNet-100 and CIFAR-100, the proposed method outperforms previous methods.
• The results suggest that more value can still be extracted from an already-trained model.
Thank you.