MFAS: Multimodal Fusion Architecture Search
PR-218
Juan-Manuel Pérez-Rúa1,3*, Valentin Vielzeuf1,2*, Stéphane Pateux1, Moez Baccouche1, and
Frédéric Jurie2
1 Orange Labs, Cesson-Sévigné, France
2 Université Caen Normandie, France
3 Samsung AI Centre, Cambridge, UK
CVPR 2019
Sunghoon Joo, Samsung SDS AI Advanced Research Lab.
2020. 1. 12.
1. Research Background
Introduction
• What & Why
• Research on how to find fusion architectures that perform well on multimodal classification problems
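[Figure: example multimodal problem, predicting a used car's price from images, sound (engine noise, exhaust noise), and other vehicle information (first registration date, model year, mileage, model, emissions, accident history, transaction history)]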
Previous works
• Multimodal fusion approaches
Sharm, A. et al. (2015). Enhancing RGB CNNs with depth.
• Building the best possible fusion architecture by finding the depths at which the unimodal layers should be fused
• Late fusion, early fusion, and approaches that take advantage of both low-level and high-level features (selected by the model designer, or via attention mechanisms)
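To make the two baseline strategies concrete, here is a minimal pure-Python sketch. It assumes toy callables stand in for the unimodal networks f and g, and it takes late fusion to mean averaging per-class scores (other variants concatenate final-layer features instead); the function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], Vector]


def early_fusion(x_raw: Vector, y_raw: Vector, joint_model: Model) -> Vector:
    # Early fusion: concatenate the raw inputs and let a single joint
    # model learn from the combined representation.
    return joint_model(x_raw + y_raw)


def late_fusion(x_raw: Vector, y_raw: Vector, f: Model, g: Model) -> Vector:
    # Late fusion: run each unimodal network to its output, then average
    # the per-class scores of the two modalities.
    fx, gy = f(x_raw), g(y_raw)
    if len(fx) != len(gy):
        raise ValueError("unimodal outputs must have the same number of classes")
    return [(a + b) / 2.0 for a, b in zip(fx, gy)]


# Toy stand-ins for real networks (hypothetical, for illustration only).
def toy_f(v: Vector) -> Vector:
    return [sum(v), 0.0]


def toy_g(v: Vector) -> Vector:
    return [0.0, sum(v)]


print(late_fusion([1.0, 2.0], [3.0, 4.0], toy_f, toy_g))          # [1.5, 3.5]
print(early_fusion([1.0, 2.0], [3.0, 4.0], lambda v: [sum(v)]))  # [10.0]
```

The fusion architectures searched for in this work sit between these two extremes: they combine intermediate features from several depths of f and g.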
Previous works
• Multimodal fusion approaches
Engilberge, M. et al. (2018). Finding Beans in Burgers: Deep Semantic-Visual Embedding with Localization. CVPR 2018
Li, Fan et al. (2017). Modout: Learning Multi-Modal Architectures by Stochastic Regularization.
• Defining constraints to control the relationship between unimodal features and/or the structure of the weights
Existing methods either require substantial expertise from the model designer or, when they are simple (late or early fusion), sometimes perform poorly.
• Maximizing the correlation between features, minimizing their cosine distance, modality dropping (defining a fusion mask)
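Two of the constraints listed above can be sketched in a few lines of pure Python: a cosine-distance term that would be minimized to align unimodal embeddings, and a simplified modality dropout. This is an illustration under stated assumptions, not the method of the cited papers (Modout, for instance, learns which connections to keep rather than dropping a whole modality uniformly); `drop_modality` and its `p_drop` parameter are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(a, b): 0 when the two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def drop_modality(x: Vector, y: Vector, p_drop: float,
                  rng: random.Random) -> Tuple[Vector, Vector]:
    """With probability p_drop, zero out one modality chosen uniformly.

    Forces a fused model not to rely exclusively on a single input.
    """
    if rng.random() < p_drop:
        if rng.random() < 0.5:
            return [0.0] * len(x), y
        return x, [0.0] * len(y)
    return x, y


rng = random.Random(0)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal features)
print(drop_modality([1.0, 2.0], [3.0, 4.0], p_drop=0.5, rng=rng))
```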
Problem statement
• Find the best fusion architecture with as few evaluations as possible
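$h_l = \sigma_{\gamma_l^p}\left(W_l \left[x_{\gamma_l^m};\, y_{\gamma_l^n};\, h_{l-1}\right]\right)$
$x_{\gamma_l^m}$: the feature of the first modality entering the l-th fusion layer (output of layer $\gamma_l^m$ of f).
$y_{\gamma_l^n}$: the feature of the second modality entering the l-th fusion layer (output of layer $\gamma_l^n$ of g).
$\sigma_{\gamma_l^p}$: the activation function of the l-th fusion layer.
Search space: the set of possible fusion architectures, of size $(M \times N \times P)^L$, where M and N are the numbers of candidate layers of f and g, P is the number of candidate activations, and L is the number of fusion layers.
A composed fusion scheme is represented by the following vector of triples: $\left[(\gamma_l^m, \gamma_l^n, \gamma_l^p)\right]_{l=1}^{L}$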
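Each fusion layer combines the selected features as $h_l = \sigma_{\gamma_l^p}(W_l [x_{\gamma_l^m}; y_{\gamma_l^n}; h_{l-1}])$. The sketch below runs a chain of such layers from a list of (m, n, p) triples. Assumptions: indices are 1-based (consistent with the AV-MNIST triple (5,3,1), where p = 1 selects ReLU from P = (ReLU, Sigmoid)), fusion is plain concatenation followed by a dense layer without bias, and the weights are given rather than learned. `fuse` and `ACTIVATIONS` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical activation table, P = (ReLU, Sigmoid) as in the AV-MNIST setup.
ACTIVATIONS: Dict[int, Callable[[float], float]] = {
    1: lambda v: max(0.0, v),                 # ReLU
    2: lambda v: 1.0 / (1.0 + math.exp(-v)),  # Sigmoid
}


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense layer without bias: returns W @ v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def fuse(triples: Sequence[Tuple[int, int, int]],
         x_feats: Sequence[Vector], y_feats: Sequence[Vector],
         weights: Sequence[Matrix]) -> Vector:
    """Apply h_l = act_p(W_l [x_m; y_n; h_{l-1}]) for each triple, h_0 empty.

    x_feats[m - 1] is the output of layer m of f and y_feats[n - 1] the
    output of layer n of g.
    """
    if len(triples) != len(weights):
        raise ValueError("one weight matrix is needed per fusion layer")
    h: Vector = []
    for (m, n, p), w in zip(triples, weights):
        z = x_feats[m - 1] + y_feats[n - 1] + h  # concatenation
        act = ACTIVATIONS[p]
        h = [act(v) for v in matvec(w, z)]
    return h


# Two fusion layers with 1-dimensional toy features.
out = fuse([(2, 1, 1), (1, 1, 1)],
           x_feats=[[1.0], [2.0]], y_feats=[[3.0]],
           weights=[[[1.0, -1.0]], [[1.0, 1.0, 1.0]]])
print(out)  # [4.0]
```

Passing $h_{l-1}$ into each layer is what lets a fusion scheme mix features from several depths of f and g instead of fusing once.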
Objective
• Propose a method that uses sequential model-based optimization (SMBO) to find fusion architectures that outperform late fusion, the conventional multimodal feature fusion approach
Key contributions
• Validation of optimal multimodal feature fusion on several datasets
• Definition of a search space for multimodal fusion problems
• Confirmation that an automatic search approach can be applied to the problem of finding architectures that fuse deep multimodal networks with high accuracy
• Three models found through automatic fusion architecture search
2. Methods
Multimodal fusion architecture search
Goal: apply sequential model-based optimization (SMBO) to find the best model without searching every possible combination [1]
Sampled fusion networks are trained sequentially for a small number of epochs (training epochs = 2) [2]
[1] Chenxi Liu et al. (2018). Progressive Neural Architecture Search. ECCV 2018
[2] Juan-Manuel Pérez-Rúa et al. (2018). Efficient Progressive Neural Architecture Search
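A heavily simplified sketch of a progressive SMBO loop in the spirit of [1]: evaluate every one-layer fusion scheme, then repeatedly grow the best ones by one fusion layer, rank the expansions with a cheap surrogate, and train only the top k. Assumptions: `evaluate` stands in for "train for a few epochs and return validation accuracy"; the surrogate is a toy per-triple average, not the learned surrogate used by the authors; and selection is a plain top-k rather than any sampling-based scheme the paper may use. All names are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Triple = Tuple[int, int, int]
Arch = Tuple[Triple, ...]


def all_triples(M: int, N: int, P: int) -> List[Triple]:
    """Every 1-based (m, n, p) choice for a single fusion layer."""
    return list(itertools.product(range(1, M + 1), range(1, N + 1), range(1, P + 1)))


class TripleMeanSurrogate:
    """Toy surrogate: averages the mean observed score of each triple."""

    def __init__(self) -> None:
        self.history: Dict[Triple, List[float]] = {}

    def update(self, arch: Arch, score: float) -> None:
        for t in arch:
            self.history.setdefault(t, []).append(score)

    def predict(self, arch: Arch) -> float:
        means = [sum(self.history[t]) / len(self.history[t])
                 for t in arch if t in self.history]
        return sum(means) / len(means) if means else 0.0


def smbo_search(evaluate: Callable[[Arch], float], M: int, N: int, P: int,
                max_layers: int, k: int) -> Tuple[Arch, float]:
    """Grow fusion schemes one layer at a time, training only k per depth."""
    triples = all_triples(M, N, P)
    surrogate = TripleMeanSurrogate()
    best_arch: Arch = ()
    best_score = float("-inf")
    frontier: List[Arch] = [(t,) for t in triples]  # all 1-layer schemes
    for depth in range(1, max_layers + 1):
        if depth > 1:
            # Expand each kept scheme by one layer; let the surrogate rank
            # the expansions so that only the k most promising are trained.
            candidates = [arch + (t,) for arch in frontier for t in triples]
            candidates.sort(key=surrogate.predict, reverse=True)
            frontier = candidates[:k]
        scored = [(arch, evaluate(arch)) for arch in frontier]
        for arch, score in scored:
            surrogate.update(arch, score)
            if score > best_score:
                best_arch, best_score = arch, score
        scored.sort(key=lambda item: item[1], reverse=True)
        frontier = [arch for arch, _ in scored[:k]]
    return best_arch, best_score


def toy_score(arch: Arch) -> float:
    # Hypothetical stand-in for validation accuracy after a short training run.
    return sum(m + n - p for m, n, p in arch) / len(arch)


print(smbo_search(toy_score, M=5, N=3, P=2, max_layers=3, k=4))
```

With M = 5, N = 3, P = 2 and L up to 3, this loop trains 30 + 2k networks instead of enumerating every scheme; the surrogate is what makes skipping the rest possible.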
3. Experimental Results
Experiments on the AV-MNIST dataset
• x: the audio modality (112 × 112 spectrograms of each spoken digit)
• y: 28 × 28 MNIST images (with 75% of their energy removed by PCA)
f: LeNet-5 (M = 5)
g: LeNet-3 (N = 3)
P=(ReLU, Sigmoid) (P = 2)
[Figure: spectrogram (time vs. frequency) of a voice saying "five"]
Experiments on the AV-MNIST dataset: results
f, g: LeNet; P = (ReLU, Sigmoid)
[Figure: structure of the [(5,3,1), (4,2,1), (5,3,1)] architecture]
[Figure: validation accuracy; test set accuracy]
• Better than random search in terms of accuracy and standard deviation
• The multimodal fusion network performs better than the unimodal networks
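To read a triple vector such as [(5,3,1), (4,2,1), (5,3,1)], each (m, n, p) can be decoded into "which layer of f, which layer of g, which activation". This small sketch assumes 1-based indices (consistent with M = 5, N = 3 and P = (ReLU, Sigmoid) on this dataset) and also evaluates the search-space size $(M \times N \times P)^L$ for L = 3 fusion layers; `describe` and `search_space_size` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]


def describe(arch: Sequence[Triple], activations: Sequence[str]) -> List[str]:
    """Turn 1-based (m, n, p) triples into readable fusion-layer descriptions."""
    return [
        f"fusion layer {l}: f layer {m} + g layer {n} -> {activations[p - 1]}"
        for l, (m, n, p) in enumerate(arch, start=1)
    ]


def search_space_size(M: int, N: int, P: int, L: int) -> int:
    """Number of fusion schemes with exactly L fusion layers."""
    return (M * N * P) ** L


for line in describe([(5, 3, 1), (4, 2, 1), (5, 3, 1)], ("ReLU", "Sigmoid")):
    print(line)
print(search_space_size(M=5, N=3, P=2, L=3))  # 27000
```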
Experiments on the MM-IMDB dataset
• A dataset of movie posters (image, x) and movie descriptions (text, y)
• Classification task: predicting movie genres (23 classes in total: drama, comedy, documentary, sport, western, film noir, etc.)
[Figure: MM-IMDB dataset]
• The multimodal fusion network found by this method outperforms both the unimodal networks and previous fusion methods
f=VGG-19 (8 layers)
g=Maxout-MLP (2 layers)
P=(ReLU, Sigmoid, LeakyReLU)
Experiments on the NTU RGB+D dataset
Chao Li et al. (2018). Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. IJCAI 2018
NTU RGB+D dataset
• Shahroudy et al., CVPR 2016
• With 56,880 samples, it was the largest color-and-depth multimodal dataset for action recognition at the time of its release.
• Classification task: predicting 60 action classes (drinking, eating, falling down, hugging, shaking hands, punching, etc.)
f=deep co-occurrence network (4 layers)
g=Inflated ResNet-50 (4 layers)
P=(ReLU, Sigmoid, LeakyReLU)
[Figure: video and skeleton sequences]
Experiments on the NTU RGB+D dataset: results
• The multimodal fusion network found by this method outperforms both the unimodal networks and previous fusion methods
Relationship between performance and network architecture
• The best-performing architecture is not the largest one
Network performance as the search progresses
• Sampled architectures become increasingly stable in terms of error as the search progresses
• The shared fusion weights are progressively refined, and the surrogate function works well
Architecture search time
2. Methods
Definition of the class-balanced loss
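• A weighting factor $\alpha_i$ is defined for each class i as $\alpha_i \propto 1/E_{n_i}$, where $E_{n_i}$ is the effective number of samples of class i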
Figure 3. Visualization of the proposed class-balanced term. Both axes are in log scale. For a long-tailed dataset where major classes have significantly more samples than minor classes, setting β properly re-balances the relative loss across classes and reduces the drastic imbalance of re-weighting by inverse class frequency.
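The slide states only the proportionality. A minimal sketch follows, using the effective number of samples $E_n = (1 - \beta^n)/(1 - \beta)$ from the Class-Balanced Loss paper and, as an assumption, rescaling the weights so they sum to the number of classes. With β = 0 every class gets the same weight; as β approaches 1, $E_n$ approaches n and the weights approach inverse class frequency.

```python
from typing import List, Sequence


def effective_number(n: int, beta: float) -> float:
    """E_n = (1 - beta**n) / (1 - beta), the effective number of samples."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if n < 1:
        raise ValueError("each class needs at least one sample")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts: Sequence[int], beta: float) -> List[float]:
    """Per-class weights alpha_i proportional to 1 / E_{n_i}.

    Rescaled (an assumption of this sketch) so the weights sum to the
    number of classes.
    """
    inv = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(inv)
    return [w * scale for w in inv]


print(class_balanced_weights([100, 10], beta=0.0))     # [1.0, 1.0]
print(class_balanced_weights([1000, 10], beta=0.999))  # minor class weighted more
```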
4. Conclusions
• This paper presents a theoretical framework for the problem of finding fusion architectures for multimodal classification.
• Future work will extend the current search space to allow more flexible combinations of fusion layers.
• The multimodal search space proposed by the authors is rich enough to yield a wide variety of architectures, yet constrained enough to keep the problem's complexity reasonable.
• The authors validated the proposed architecture search method on three multimodal datasets.
Thank you.

Contenu connexe

Tendances

[한국어] Neural Architecture Search with Reinforcement Learning
[한국어] Neural Architecture Search with Reinforcement Learning[한국어] Neural Architecture Search with Reinforcement Learning
[한국어] Neural Architecture Search with Reinforcement LearningKiho Suh
 
Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks Korea, Sejong University.
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...LEE HOSEONG
 
I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)Susang Kim
 
Survey on Monocular Depth Estimation
Survey on Monocular Depth EstimationSurvey on Monocular Depth Estimation
Survey on Monocular Depth Estimation범준 김
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationLEE HOSEONG
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper ReviewLEE HOSEONG
 
Designing more efficient convolution neural network
Designing more efficient convolution neural networkDesigning more efficient convolution neural network
Designing more efficient convolution neural networkDongyi Kim
 
네트워크 경량화 이모저모 @ 2020 DLD
네트워크 경량화 이모저모 @ 2020 DLD네트워크 경량화 이모저모 @ 2020 DLD
네트워크 경량화 이모저모 @ 2020 DLDKim Junghoon
 
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...MYEONGGYU LEE
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Jinwon Lee
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkrlawjdgns
 
deep encoder, shallow decoder reevaluating non-autoregressive machine transl...
deep encoder, shallow decoder  reevaluating non-autoregressive machine transl...deep encoder, shallow decoder  reevaluating non-autoregressive machine transl...
deep encoder, shallow decoder reevaluating non-autoregressive machine transl...taeseon ryu
 
AlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetAlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetJungwon Kim
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural NetworksSanghoon Yoon
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Networkagdatalab
 
스마트폰 위의 딥러닝
스마트폰 위의 딥러닝스마트폰 위의 딥러닝
스마트폰 위의 딥러닝NAVER Engineering
 
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...jungminchung
 
Paper Reading : Learning to compose neural networks for question answering
Paper Reading : Learning to compose neural networks for question answeringPaper Reading : Learning to compose neural networks for question answering
Paper Reading : Learning to compose neural networks for question answeringSean Park
 

Tendances (20)

[한국어] Neural Architecture Search with Reinforcement Learning
[한국어] Neural Architecture Search with Reinforcement Learning[한국어] Neural Architecture Search with Reinforcement Learning
[한국어] Neural Architecture Search with Reinforcement Learning
 
Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...
 
I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)
 
Survey on Monocular Depth Estimation
Survey on Monocular Depth EstimationSurvey on Monocular Depth Estimation
Survey on Monocular Depth Estimation
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classification
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
 
Designing more efficient convolution neural network
Designing more efficient convolution neural networkDesigning more efficient convolution neural network
Designing more efficient convolution neural network
 
네트워크 경량화 이모저모 @ 2020 DLD
네트워크 경량화 이모저모 @ 2020 DLD네트워크 경량화 이모저모 @ 2020 DLD
네트워크 경량화 이모저모 @ 2020 DLD
 
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...
(Paper Review)Few-Shot Adversarial Learning of Realistic Neural Talking Head ...
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
CNN
CNNCNN
CNN
 
deep encoder, shallow decoder reevaluating non-autoregressive machine transl...
deep encoder, shallow decoder  reevaluating non-autoregressive machine transl...deep encoder, shallow decoder  reevaluating non-autoregressive machine transl...
deep encoder, shallow decoder reevaluating non-autoregressive machine transl...
 
AlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetAlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, Resnet
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Network
 
스마트폰 위의 딥러닝
스마트폰 위의 딥러닝스마트폰 위의 딥러닝
스마트폰 위의 딥러닝
 
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-In...
 
Paper Reading : Learning to compose neural networks for question answering
Paper Reading : Learning to compose neural networks for question answeringPaper Reading : Learning to compose neural networks for question answering
Paper Reading : Learning to compose neural networks for question answering
 

Similaire à PR-218: MFAS: Multimodal Fusion Architecture Search

20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성
20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성
20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성Hye-rim Jang
 
180624 mobile visionnet_baeksucon_jwkang_pub
180624 mobile visionnet_baeksucon_jwkang_pub180624 mobile visionnet_baeksucon_jwkang_pub
180624 mobile visionnet_baeksucon_jwkang_pubJaewook. Kang
 
[Graduation Project] 전자석을 이용한 타자 연습기
[Graduation Project] 전자석을 이용한 타자 연습기[Graduation Project] 전자석을 이용한 타자 연습기
[Graduation Project] 전자석을 이용한 타자 연습기Junyoung Jung
 
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기ksdc2019
 
머신러닝의 개념과 실습
머신러닝의 개념과 실습머신러닝의 개념과 실습
머신러닝의 개념과 실습Byoung-Hee Kim
 
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Mad Scientists
 
Deep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsDeep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsseungwoo kim
 
앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발Jungkyu Lee
 
순환신경망(Recurrent neural networks) 개요
순환신경망(Recurrent neural networks) 개요순환신경망(Recurrent neural networks) 개요
순환신경망(Recurrent neural networks) 개요Byoung-Hee Kim
 
Designing more efficient convolution neural network
Designing more efficient convolution neural networkDesigning more efficient convolution neural network
Designing more efficient convolution neural networkNAVER Engineering
 
Image Deep Learning 실무적용
Image Deep Learning 실무적용Image Deep Learning 실무적용
Image Deep Learning 실무적용Youngjae Kim
 
황승원 포항공대 교수
황승원 포항공대 교수황승원 포항공대 교수
황승원 포항공대 교수Hye-rim Jang
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to ZLEE HOSEONG
 
[PR12] image super resolution using deep convolutional networks
[PR12] image super resolution using deep convolutional networks[PR12] image super resolution using deep convolutional networks
[PR12] image super resolution using deep convolutional networksTaegyun Jeon
 
Campus Network Analysis
Campus Network AnalysisCampus Network Analysis
Campus Network AnalysisEugine Kang
 
생체 광학 데이터 분석 AI 경진대회 2위 수상작
생체 광학 데이터 분석 AI 경진대회 2위 수상작생체 광학 데이터 분석 AI 경진대회 2위 수상작
생체 광학 데이터 분석 AI 경진대회 2위 수상작DACON AI 데이콘
 
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...Gyubin Son
 
Src슬라이드(1총괄1세부) 임요한
Src슬라이드(1총괄1세부) 임요한Src슬라이드(1총괄1세부) 임요한
Src슬라이드(1총괄1세부) 임요한SRCDSC
 
Summary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detectionSummary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detection창기 문
 
Summary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detectionSummary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detection창기 문
 

Similaire à PR-218: MFAS: Multimodal Fusion Architecture Search (20)

20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성
20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성
20150331 msr outreach media_roundtable_deck_연세대강홍구교수_음성합성
 
180624 mobile visionnet_baeksucon_jwkang_pub
180624 mobile visionnet_baeksucon_jwkang_pub180624 mobile visionnet_baeksucon_jwkang_pub
180624 mobile visionnet_baeksucon_jwkang_pub
 
[Graduation Project] 전자석을 이용한 타자 연습기
[Graduation Project] 전자석을 이용한 타자 연습기[Graduation Project] 전자석을 이용한 타자 연습기
[Graduation Project] 전자석을 이용한 타자 연습기
 
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기
초초초 (초고속 초저지연 초연결) 5G IoT 플랫폼 개발 이야기
 
머신러닝의 개념과 실습
머신러닝의 개념과 실습머신러닝의 개념과 실습
머신러닝의 개념과 실습
 
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T...
 
Deep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsDeep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendations
 
앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발
 
순환신경망(Recurrent neural networks) 개요
순환신경망(Recurrent neural networks) 개요순환신경망(Recurrent neural networks) 개요
순환신경망(Recurrent neural networks) 개요
 
Designing more efficient convolution neural network
Designing more efficient convolution neural networkDesigning more efficient convolution neural network
Designing more efficient convolution neural network
 
Image Deep Learning 실무적용
Image Deep Learning 실무적용Image Deep Learning 실무적용
Image Deep Learning 실무적용
 
황승원 포항공대 교수
황승원 포항공대 교수황승원 포항공대 교수
황승원 포항공대 교수
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to Z
 
[PR12] image super resolution using deep convolutional networks
[PR12] image super resolution using deep convolutional networks[PR12] image super resolution using deep convolutional networks
[PR12] image super resolution using deep convolutional networks
 
Campus Network Analysis
Campus Network AnalysisCampus Network Analysis
Campus Network Analysis
 
생체 광학 데이터 분석 AI 경진대회 2위 수상작
생체 광학 데이터 분석 AI 경진대회 2위 수상작생체 광학 데이터 분석 AI 경진대회 2위 수상작
생체 광학 데이터 분석 AI 경진대회 2위 수상작
 
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...
[paper review] 손규빈 - Eye in the sky & 3D human pose estimation in video with ...
 
Src슬라이드(1총괄1세부) 임요한
Src슬라이드(1총괄1세부) 임요한Src슬라이드(1총괄1세부) 임요한
Src슬라이드(1총괄1세부) 임요한
 
Summary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detectionSummary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detection
 
Summary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detectionSummary in recent advances in deep learning for object detection
Summary in recent advances in deep learning for object detection
 

Plus de Sunghoon Joo

PR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But FasterPR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But FasterSunghoon Joo
 
PR-433: Test-time Training with Masked Autoencoders
PR-433: Test-time Training with Masked AutoencodersPR-433: Test-time Training with Masked Autoencoders
PR-433: Test-time Training with Masked AutoencodersSunghoon Joo
 
PR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdfPR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdfSunghoon Joo
 
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...Sunghoon Joo
 
PR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionPR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionSunghoon Joo
 
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...Sunghoon Joo
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.Sunghoon Joo
 
PR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental LearningPR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental LearningSunghoon Joo
 
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...Sunghoon Joo
 
PR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document rerankingPR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document rerankingSunghoon Joo
 
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...Sunghoon Joo
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationSunghoon Joo
 
PR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseasesPR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseasesSunghoon Joo
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From ScratchSunghoon Joo
 
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...Sunghoon Joo
 
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...Sunghoon Joo
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...Sunghoon Joo
 

Plus de Sunghoon Joo (17)

PR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But FasterPR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But Faster
 
PR-433: Test-time Training with Masked Autoencoders
PR-433: Test-time Training with Masked AutoencodersPR-433: Test-time Training with Masked Autoencoders
PR-433: Test-time Training with Masked Autoencoders
 
PR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdfPR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdf
 
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
 
PR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionPR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed Recognition
 
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
PR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental LearningPR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental Learning
 
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
 
PR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document rerankingPR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document reranking
 
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
 
PR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseasesPR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseases
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
 
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
 
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
 

PR-218: MFAS: Multimodal Fusion Architecture Search

  • 1. MFAS: Multimodal Fusion Architecture Search PR-218 Juan-Manuel Pérez-Rúa1,3*, Valentin Vielzeuf1,2*, Stéphane Pateux1, Moez Baccouche1, and Frédéric Jurie2 1 Orange Labs, Cesson-Sévigné, France 2 Université Caen Normandie, France 3 Samsung AI Centre, Cambridge, UK CVPR 2019 주성훈, 삼성SDS AI선행연구Lab. 2020. 1. 12.
  • 3. 1. Research Background Introduction 3/20 • What & Why • Multimodal classification problems 을 잘 해결하기 위한 fusion architecture를 찾는 방법에 관한 연구 최초등록일 이미지 기타 차량 정보 소리 엔진음 배기음 연식 주행거리 모델 배출가스 사고이력 거래이력 중고차 가격
  • 4. 1. Research Background Previous works 4/20 • Multimodal fusion approaches Sharm, A. et al. (2015). EnhancingRGB CNNswith depth. • Building best possible fusion architectures by finding at which depths the unimodal layers should be fused. • Late fusion, early fusion, take advantage of both low-level and high-level features (model designer selection, attention mechanisms)
  • 5. 1. Research Background Previous works 5/20 • Multimodal fusion approaches Engilberge,M et al., (2018). Finding Beans in Burgers: Deep Semantic-VisualEmbedding with Localization. CVPR2018 Li, Fan el al. (2017). Modout: Learning Multi-Modal Architecturesby StochasticRegularization. • To define constraints in order to control the relationship between unimodal features and/or the structure of the weights. 기존 방법들은 Model designer의 높은 전문지식이 필요하거나, 아니면 간단한 방법 (late, early fusion)은 성능이 떨어지는 경우가 있다. • Maximizing correlation between features, minimize their cosine distance, modality dropping (defining fusion mask)
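  • 6. 1. Research Background: Problem statement. • Find the best fusion architecture with as little computation as possible. The l-th fusion layer takes a feature of the first modality (from one of the M layers of the first unimodal network), a feature of the second modality (from one of the N layers of the second unimodal network), and one of P candidate activation functions. A composed fusion scheme with L fusion layers is represented as a vector of such triples, so the search space, i.e., the set of possible fusion architectures, has size (M × N × P)^L.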
  • 7. 1. Research Background: Objective. • Propose a method based on sequential model-based optimization (SMBO) that finds fusion architectures better than late fusion, the conventional multimodal feature fusion approach. Key contributions • Validated optimal multimodal feature fusion on several datasets. • Defined a search space for multimodal fusion problems. • Showed that an automatic search approach can be applied to finding architectures that fuse deep multimodal features with high accuracy. • Three models found through automatic fusion architecture search.
  • 9. 2. Methods: Multimodal fusion architecture search. Goal: apply sequential model-based optimization (SMBO) [1] to propose the best model without exhaustively searching every possible combination. Sampled fusion networks are trained sequentially for a small number of epochs (2 training epochs) [2]. [1] Chenxi Liu et al. (2018). Progressive Neural Architecture Search. ECCV 2018. [2] Juan-Manuel Pérez-Rúa et al. (2018). Efficient Progressive Neural Architecture Search.
  • 11. 3. Experimental Results: Experiments on the AV-MNIST dataset. • x: the audio modality (112 × 112 spectrograms of spoken digits). • y: 28 × 28 MNIST images (with 75% of their energy removed by PCA). f: LeNet-5 (M = 5); g: LeNet-3 (N = 3); activations (ReLU, Sigmoid), so P = 2. Figure: spectrogram (time vs. frequency) of the spoken word "five".
  • 12. 3. Experimental Results: Experiments on the AV-MNIST dataset (continued). f, g: LeNet; P = (ReLU, Sigmoid). Figures: validation accuracy, test set accuracy, and the structure of the found [(5,3,1), (4,2,1), (5,3,1)] architecture. • Outperforms random search (in accuracy and standard deviation). • The multimodal fusion network improves on the unimodal networks.
  • 13. 3. Experimental Results: Experiments on the MM-IMDB dataset. • A dataset of movie posters (image, x) and movie descriptions (text, y). • Classification task: predict movie genres (23 classes in total, e.g., drama, comedy, documentary, sport, western, film noir; a movie can belong to several genres). • The multimodal fusion network found by this method outperforms the unimodal networks and previous fusion methods. f = VGG-19 (8 layers), g = Maxout-MLP (2 layers), P = (ReLU, Sigmoid, LeakyReLU).
  • 14. 3. Experimental Results: Experiments on the NTU RGB+D dataset. • Introduced by Shahroudy et al. at CVPR 2016. • With 56,880 samples, it is the largest color and depth multimodal dataset. • Classification task: predict 60 action classes (drinking, eating, falling down, hugging, shaking hands, punching, etc.). Modalities: video and skeleton sequences. f = deep co-occurrence network (4 layers), from Chao Li et al. (2018). Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. IJCAI 2018. g = Inflated ResNet-50 (4 layers), P = (ReLU, Sigmoid, LeakyReLU).
  • 15. 3. Experimental Results: Experiments on the NTU RGB+D dataset (continued). • The multimodal fusion network found by this method outperforms the unimodal networks and previous fusion methods. f = deep co-occurrence network (4 layers), g = Inflated ResNet-50 (4 layers), P = (ReLU, Sigmoid, LeakyReLU).
  • 16. 3. Experimental Results: Relationship between performance and network architecture. • The best-performing architecture is not the largest one.
  • 17. 3. Experimental Results: Network performance as the search progresses. • Sampled architectures become more and more stable in terms of error as the search progresses. • The shared fusion weights become increasingly refined, and the surrogate function works well.
  • 18. 3. Experimental Results: Architecture search time.
  • 19. 2. Methods: Definition of the Class-Balanced Loss. • The weighting factor for each class i is defined as a_i ∝ 1/E_{n_i}, where E_{n_i} is the effective number of samples of class i. Figure 3. Visualization of the proposed class-balanced term. Both axes are in log scale. For a long-tailed dataset where major classes have significantly more samples than minor classes, setting β properly re-balances the relative loss across classes and reduces the drastic imbalance of re-weighting by inverse class frequency.
  • 21. 4. Conclusions. Thank you. • This paper presented a theoretical framework for the problem of finding fusion architectures for multimodal classification. • Future work will extend the current search space so that fusion layers can be combined more flexibly. • The proposed multimodal search space is rich enough to produce a wide variety of architectures, yet constrained enough to keep the problem complexity reasonable. • The authors validated the proposed architecture search method on three multimodal datasets.
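Slide 6's search space can be made concrete with a short Python sketch. It assumes 1-based indices, so a triple (m, n, p) picks layer m ≤ M of the first unimodal network f, layer n ≤ N of the second network g, and activation p ≤ P. It also assumes that each fusion layer receives the previous fusion layer's output, so the fusion layers form a chain. Plain lists stand in for tensors, and the names `search_space_size`, `fuse` and `ACTIVATIONS` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One fusion layer is a triple (m, n, p), 1-based (assumption):
#   m: layer of network f (first modality) whose features are fused
#   n: layer of network g (second modality) whose features are fused
#   p: index into the list of candidate activation functions
Triple = Tuple[int, int, int]
Matrix = List[List[float]]

# Candidate activations; P = len(ACTIVATIONS) = 2 in this sketch
ACTIVATIONS: List[Callable[[float], float]] = [
    lambda v: max(0.0, v),                 # p = 1: ReLU
    lambda v: 1.0 / (1.0 + math.exp(-v)),  # p = 2: sigmoid
]


def search_space_size(M: int, N: int, P: int, L: int) -> int:
    """Number of fusion schemes with exactly L fusion layers: (M * N * P) ** L."""
    return (M * N * P) ** L


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fuse(x_feats: Sequence[List[float]], y_feats: Sequence[List[float]],
         scheme: Sequence[Triple], weights: Sequence[Matrix]) -> List[float]:
    """Hypothetical forward pass: h_l = act_p(W_l [x_m ; y_n ; h_{l-1}])."""
    h: List[float] = []  # empty h_0, so layer 1 sees only [x_m ; y_n]
    for (m, n, p), W in zip(scheme, weights):
        z = list(x_feats[m - 1]) + list(y_feats[n - 1]) + h  # concatenation
        h = [ACTIVATIONS[p - 1](v) for v in matvec(W, z)]
    return h
```

With the AV-MNIST setup of slide 11 (M = 5, N = 3, P = 2) there are 30 choices per fusion layer, so `search_space_size(5, 3, 2, 3)` returns 27,000.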
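The search on slide 9 can be sketched as a progressive SMBO loop in the spirit of Progressive Neural Architecture Search: evaluate every one-layer scheme, then repeatedly extend the kept schemes by one fusion layer, rank the extensions with a cheap surrogate, and train only the top K. This is a simplified sketch under stated assumptions. `evaluate` stands in for training a sampled network for about 2 epochs and returning its validation accuracy. The averaging `TripleSurrogate` is a hypothetical stand-in for the learned surrogate. Keeping the top K predictions is a simplification, and the paper's exact selection rule may differ.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Triple = Tuple[int, int, int]
Scheme = Tuple[Triple, ...]


class TripleSurrogate:
    """Hypothetical surrogate: predicts a scheme's accuracy as the mean, over its
    triples, of the average accuracy observed for schemes containing that triple."""

    def __init__(self) -> None:
        self.sums: Dict[Triple, float] = {}
        self.counts: Dict[Triple, int] = {}

    def update(self, scheme: Scheme, score: float) -> None:
        for t in scheme:
            self.sums[t] = self.sums.get(t, 0.0) + score
            self.counts[t] = self.counts.get(t, 0) + 1

    def predict(self, scheme: Scheme) -> float:
        known = [self.sums[t] / self.counts[t] for t in scheme if t in self.counts]
        return sum(known) / len(known) if known else 0.0


def progressive_search(M: int, N: int, P: int, L: int, K: int,
                       evaluate: Callable[[Scheme], float]) -> Tuple[Scheme, float, int]:
    """Progressive SMBO over fusion schemes of up to L layers.

    Returns (best scheme, its score, number of schemes actually evaluated)."""
    triples = list(itertools.product(range(1, M + 1), range(1, N + 1), range(1, P + 1)))
    surrogate = TripleSurrogate()
    scores: Dict[Scheme, float] = {}
    frontier: List[Scheme] = [(t,) for t in triples]  # all M*N*P one-layer schemes
    for depth in range(1, L + 1):
        if depth > 1:
            # Extend each kept scheme by one fusion layer; rank with the surrogate
            candidates = [s + (t,) for s in frontier for t in triples]
            candidates.sort(key=surrogate.predict, reverse=True)
            frontier = candidates[:K]
        for s in frontier:
            scores[s] = evaluate(s)  # stands in for a short (~2 epoch) training run
            surrogate.update(s, scores[s])
        # Keep the K schemes with the best measured score for the next step
        frontier = sorted(frontier, key=lambda s: scores[s], reverse=True)[:K]
    best = max(scores, key=lambda s: scores[s])
    return best, scores[best], len(scores)


def toy_objective(scheme: Scheme) -> float:
    """Hypothetical objective: number of fusion layers that use the triple (2, 1, 1)."""
    return float(sum(t == (2, 1, 1) for t in scheme))


if __name__ == "__main__":
    print(progressive_search(M=2, N=2, P=2, L=3, K=3, evaluate=toy_objective))
    # -> (((2, 1, 1), (2, 1, 1), (2, 1, 1)), 3.0, 14)
```

Only M·N·P + (L − 1)·K networks are trained (14 in the toy run), instead of all 8 + 64 + 512 = 584 schemes with up to three fusion layers; the surrogate is what makes skipping the rest reasonable.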
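The AV-MNIST architecture on slide 12 can be read against the setup from slide 11 (f = LeNet-5 with M = 5, g = LeNet-3 with N = 3, P = (ReLU, Sigmoid)). This sketch assumes 1-based indices, which is consistent with m = 5 when M = 5, so p = 1 is read as ReLU. The helper `describe_scheme` is hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]


def describe_scheme(scheme: Sequence[Triple], M: int, N: int,
                    activations: Sequence[str]) -> List[str]:
    """Turn (m, n, p) triples into readable lines (1-based indices, an assumption)."""
    lines = []
    for l, (m, n, p) in enumerate(scheme, start=1):
        # Reject triples that fall outside the M x N x P search space
        if not (1 <= m <= M and 1 <= n <= N and 1 <= p <= len(activations)):
            raise ValueError(f"triple {(m, n, p)} is outside the search space")
        lines.append(f"fusion layer {l}: f layer {m} + g layer {n} -> {activations[p - 1]}")
    return lines


found = [(5, 3, 1), (4, 2, 1), (5, 3, 1)]
for line in describe_scheme(found, M=5, N=3, activations=["ReLU", "Sigmoid"]):
    print(line)
```

Under these assumptions the found scheme fuses the deepest layers of both networks twice and an intermediate pair once, all through ReLU.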
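The weighting factor on slide 19 can be computed directly. This sketch assumes the effective-number definition E_n = (1 − β^n)/(1 − β) from the Class-Balanced Loss paper and rescales the weights so they sum to the number of classes. `effective_number` and `class_balanced_weights` are hypothetical helper names.

```python
from typing import List, Sequence


def effective_number(n: int, beta: float) -> float:
    """E_n = (1 - beta**n) / (1 - beta), assuming 0 <= beta < 1 and n >= 1."""
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts: Sequence[int], beta: float) -> List[float]:
    """Weights a_i proportional to 1 / E_{n_i}, rescaled to sum to the number of classes."""
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# beta = 0 gives equal weights; beta close to 1 approaches inverse class frequency
print(class_balanced_weights([1000, 100, 10], beta=0.0))  # [1.0, 1.0, 1.0]
print(class_balanced_weights([1000, 100, 10], beta=0.999))
```

β interpolates between no re-weighting (β = 0) and inverse-frequency re-weighting (β → 1). That is the trade-off the figure visualizes.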