Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain
Masaya Okamoto and Hideki Nakayama
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
© The University of Tokyo 1
Outline
• Background
• Related work
• Proposed method
• Experiments
• Conclusion
• Future work
Background
• A large amount of hand-labeled data is necessary for image recognition
  – PASCAL VOC2012: 11,530 labeled images
• Labeling images by hand is laborious work
  – Hence the lack of hand-labeled data
• Many labeled (tagged) images exist on the web
  – But we cannot use web images directly
(Figure: example images from PASCAL VOC2012)
Domain Adaptation
(Figure: classifiers learned in one domain and tested in another)
• Learning from another domain
*From the CVPR 2012 Tutorial on Domain Transfer Learning for Vision Applications
© The University of Tokyo 4
Source and Target
(Figure: “Cup” images in the source and target domains)
• Source domain: many labeled samples
• Target domain: few labeled samples
Difficulty of domain adaptation
• Simple methods do not work well outside the domain they were trained in
(Accuracy averaged over 31 classes; from “Adapting visual category models to new domains,” K. Saenko et al. [1])
Related work
• Semi-supervised domain adaptation
  – Assumes a few labeled examples in the target domain
  – Saenko et al. [1] [ECCV 2010]
    • First work on visual domain adaptation
• Unsupervised domain adaptation
  – No labeled examples are used in the target domain
  – Preferable, but quite difficult
  – Gong et al. [4] [CVPR 2012]
  – Fernando et al. [5] [ICCV 2013]
Subspace-based methods
• Generate “virtual” domains that blend the properties of source and target
• Geodesic Flow Sampling (GFS) by Gopalan et al. [3]
  – Generates multiple subspaces by sampling points along the geodesic flow on the Grassmann manifold
(From “Domain Adaptation for Object Recognition: An Unsupervised Approach,” R. Gopalan et al. [3])
Subspace-based methods
• Geodesic Flow Kernel (GFK) by Gong et al. [4]
  – A closed-form (analytic) solution of the sampling-based approach
• Subspace-based approaches are arguably the most successful to date
(From “Geodesic Flow Kernel for Unsupervised Domain Adaptation,” B. Gong et al. [4])
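Both GFS and GFK start from a low-dimensional subspace per domain, typically the top PCA directions. A minimal sketch of extracting these bases (synthetic data and illustrative dimensions; scikit-learn's PCA is an assumption, not the authors' exact toolchain):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_source = rng.normal(size=(200, 50))  # hypothetical source features
X_target = rng.normal(size=(150, 50))  # hypothetical target features

dim = 10  # subspace dimensionality (the paper tries 10-50)
P_s = PCA(n_components=dim).fit(X_source).components_.T  # (50, 10) basis
P_t = PCA(n_components=dim).fit(X_target).components_.T  # (50, 10) basis

# Each column is an orthonormal basis vector; GFK interpolates between
# span(P_s) and span(P_t) along the geodesic on the Grassmann manifold.
```

The bases themselves are all the later steps need; GFK then derives a kernel between points projected along the geodesic connecting the two subspaces.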
Subspace-based methods
• To give the source subspace a semantic (class-aware) distribution, apply PLS with the labels
• [Problem] PLS cannot be applied to the target domain, because it lacks cues such as labels
(Figure: source and target subspaces with “Cup” and “Monitor” clusters)
Our core idea
• Previous work on domain adaptation uses only visual information in the target domain
  – This leads to a lack of semantic information in the target subspace
• Use subsidiary non-visual data as semantic cues in subspace-based methods
  – e.g. depth, location data (GPS), gyroscopes, …
Proposed Method
• Using PLS instead of PCA to generate the source subspace improves performance [4]
• We propose using PLS to generate the target subspace as well
  – Subsidiary information serves as the predicted variables
  – Our method improves the distribution of the data in the target subspace
Difference between ours and others
(Figure: Original GFK or SA — source: many labeled images; target: many unlabeled images.
Our work — source: many labeled images; target: many unlabeled images plus a subsidiary signal.
Both diagrams show “Cup” and “Monitor” clusters in the source and target subspaces.)
(Figure sequence, repeated over three slides, showing “Cup” and “Monitor” clusters in the source and target subspaces: source images with labels, target images with subsidiary information)
1. PLS in the source subspace
2. PLS in the target subspace
3. Subspace-based domain adaptation
Experimental settings
• Distance (depth) features as subsidiary information
  – Extracted by applying depth kernel descriptors (Bo et al.) [10]
  – Obtained a 14000-dim distance feature for each image
• Varied the number of source samples
  – 120, 300, 600, 1800, and 3000 samples
• Chose the best subspace dimensionality from {10, 20, 30, 40, 50} for each case
Experimental settings
• B3DO [8] as the target-domain data
  – Evaluate classification accuracy on 6 classes
(Figure: an RGB image and the corresponding depth image used as subsidiary information)
Number of samples
• Source: ImageNet; Target: B3DO [8]

Class    | ImageNet (Source) | B3DO (Target)
Bottle   |   920  | 238
Bowl     |   919  | 142
Cup      |   919  | 258
Keyboard |  1512  | 129
Monitor  |  1134  | 243
Sofa     |   982  | 109
SUM      |  6386  | 1119
AVG      | 1064.3 | 186.5

(ImageNet: a large-scale hierarchical image database, J. Deng et al., CVPR 2009 [7])
Difference in datasets
(Figure: “Cup” images from the source (ImageNet) and the target (B3DO))
Experimental settings
• Tested two subspace-based methods to show that our method improves performance consistently
  ① Geodesic Flow Kernel (GFK) [4]
  ② Subspace Alignment (SA) [5]
• Compared 4 methods
  1. Our method 1 (Source: PCA → Target: PLS)
  2. Baseline 1 (Source: PCA → Target: PCA)
  3. Our method 2 (Source: PLS → Target: PLS)
  4. Baseline 2 (Source: PLS → Target: PCA)
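Of the two back-ends, Subspace Alignment is simple enough to sketch end to end: it maps the source basis onto the target basis with a single closed-form matrix. The synthetic data and PCA bases below are illustrative; in the proposed variants either basis may instead come from PLS:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_s = rng.normal(size=(200, 50))  # hypothetical source features
X_t = rng.normal(size=(150, 50))  # hypothetical target features

d = 10
P_s = PCA(n_components=d).fit(X_s).components_.T  # (50, d) source basis
P_t = PCA(n_components=d).fit(X_t).components_.T  # (50, d) target basis

M = P_s.T @ P_t        # closed-form alignment matrix of Fernando et al. [5]
Z_s = X_s @ (P_s @ M)  # source data in the target-aligned subspace
Z_t = X_t @ P_t        # target data in its own subspace
# Z_s and Z_t are now comparable, e.g. with a nearest-neighbour classifier.
```

The entire adaptation step is one matrix product, which is why SA is far cheaper than sampling-based alternatives.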
Experimental results (GFK)
• Geodesic Flow Kernel (GFK) [4] as the subspace-based method

Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
 120  | 28.33 | 28.95 | 32.35 | 31.64
 300  | 29.31 | 29.85 | 32.71 | 31.55
 600  | 29.04 | 28.60 | 32.53 | 28.87
1800  | 32.17 | 30.92 | 34.32 | 31.81
3000  | 33.42 | 31.72 | 34.94 | 33.92
Result graph of GFK
(Figure: accuracy vs. number of source samples for the four methods, using GFK [4])
Experimental results (SA)
• Subspace Alignment (SA) [5] as the subspace-based method

Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
 120  | 34.05 | 29.85 | 34.23 | 30.83
 300  | 33.15 | 30.21 | 32.17 | 31.90
 600  | 33.78 | 33.15 | 33.33 | 32.71
1800  | 33.15 | 30.21 | 32.17 | 31.90
3000  | 34.85 | 32.44 | 33.69 | 32.89
Result graph of SA
(Figure: accuracy vs. number of source samples for the four methods, using SA [5])
Accuracy and exec. time
• Classification accuracy and average execution time when using 20 source images per class
• The proposed methods incur slightly higher computational cost

               | OURS1  | Baseline1 | OURS2    | Baseline2
GFK accuracy   | 28.33  | 28.95     | 32.35    | 31.64
GFK exec. time | 3.83 s | 2.26 s    | 135.17 s | 128.03 s
SA accuracy    | 34.05  | 29.85     | 34.23    | 30.83
SA exec. time  | 3.07 s | 0.98 s    | 130.90 s | 120.30 s
Conclusion
• The proposed methods outperform previous ones that use only visual information
• Subsidiary information can improve domain adaptation accuracy
  – Consistent improvements on two independent methods
• To the best of our knowledge, this is the first visual domain adaptation method to use non-visual information in the target domain
Future work
• Handling and testing other multimedia information such as gyroscope or sound data
• More extensive experiments
  – Currently focused on only 6 classes
  – Testing other classes and other subspace-based methods
Contacts
• Masaya Okamoto
• Nakayama Lab., the University of Tokyo
• e-mail: okamoto@nlab.ci.i.u-tokyo.ac.jp
Thank you!
References (1/2)
[1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. of ECCV, 2010.
[2] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proc. of ICML, 2007.
[3] R. Gopalan, R. Li, and R. Chellappa, “Domain adaptation for object recognition: an unsupervised approach,” in Proc. of ICCV, 2011.
[4] B. Gong, Y. Shi, and F. Sha, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. of CVPR, 2012.
[5] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in Proc. of ICCV, 2013.
References (2/2)
[6] H. Wold, S. Kotz, and N. L. Johnson, “Partial least squares,” in Encyclopedia of Statistical Sciences, 1985.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in Proc. of CVPR, 2009.
[8] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell, “A category-level 3-D object dataset: putting the Kinect to work,” in Proc. of ICCV Workshop on Consumer Depth Cameras in Computer Vision, 2011.
[9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[10] L. Bo, X. Ren, and D. Fox, “Depth kernel descriptors for object recognition,” in Proc. of IROS, 2011.
Why use depth as subsidiary information?
① Easy to collect
  • Several publicly available datasets exist (such as B3DO)
② Presumably an easier setting
  • Depth information may correlate strongly with the classes
③ Depth sensors will appear in wearable devices
  • “Project Tango” by Google (a smartphone with a Kinect-like camera)
    https://www.google.com/atap/projecttango/
System overview
• The system does not need labeled samples from the user
• Better than using only visual information
  – Using subsidiary information improves the results
(Figure: labeled source images from the web and target distance features (depth images) feed into recognition, e.g. class “Chair”)
Life logging
• Life-logging systems are spreading
• They provide much subsidiary information (sound, gyroscope, …)
  → A different situation from previous work
• In the near future, this situation is expected to become common
Experimental process flow
• PLS on the source (jack-knifing variant)
  – Because the predicted signals (labels) are low-dimensional
  – An iterative process with high computational cost
• PLS on the target (traditional)
  – Because the predicted signals have enough dimensions (14000-dim)
• Subspace-based domain adaptation
  – GFK or SA
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 

Dernier (20)

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 

ISM2014
  • 1. Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain Masaya Okamoto and Hideki Nakayama Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan © The University of Tokyo 1
  • 2. Outline • Background • Related work • Proposed method • Experiments • Conclusion • Future work © The University of Tokyo 2
  • 3. Background • A lot of hand-labeled data is necessary for image recognition – PASCAL VOC2012: 11,530 labeled images • It is tough work to label images – Lack of hand-labeled data • Many labeled (tagged) images on the web – We can't use web images directly Example images of PASCAL VOC2012 © The University of Tokyo Domain Adaptation 3
  • 4. Domain Adaptation Learn Test TestLearn Learn Test TestLearn Learning from another domain ※From CVPR 2012 Tutorial on Domain Transfer Learning for Vision Applications © The University of Tokyo 4
  • 5. Source and Target Source Domain Target Domain Learn Test (figure: cup images in both domains) Many labeled samples Few labeled samples © The University of Tokyo 5
  • 6. Difficulty of domain adaptation • Simple methods don't work in other situations © The University of Tokyo 6 (average of 31 classes) From "Adapting visual category models to new domains", K. Saenko et al.
  • 7. Related work • Semi-supervised domain adaptation – It assumes a few labeled examples in the target domain – Saenko et al. [1] [ECCV 2010] • First work on visual domain adaptation • Unsupervised domain adaptation – No labeled examples are used in the target domain – Preferable but quite difficult – Gong et al. [4] [CVPR 2012] – Fernando et al. [5] [ICCV 2013] © The University of Tokyo 7
  • 8. Subspace based method • Generate "virtual" domains that blend the properties of source and target • Geodesic flow sampling (GFS) by Gopalan et al. – Generates multiple subspaces by sampling points from the geodesic flow on the Grassmann manifold © The University of Tokyo 8 From "Domain Adaptation for Object Recognition: An Unsupervised Approach", R. Gopalan et al.
  • 9. Subspace based method • Geodesic Flow Kernel (GFK) by Gong et al. – Analytic solution of the sampling-based approach • The subspace-based approach is probably the most successful current approach © The University of Tokyo 9 From "Geodesic Flow Kernel for Unsupervised Domain Adaptation", B. Gong et al.
  • 10. © The University of Tokyo 10 • To give the source subspace a semantic distribution, PLS is applied with labels • [Problem] PLS can't be applied to the target because of the lack of cues such as labels Subspace based method Target subspace Source subspace Cup Monitor
  • 11. Our core idea • Previous works on domain adaptation use only visual information in the target domain • Use subsidiary non-visual data as semantic cues in subspace based methods – Such as depth, location data (GPS), gyroscopes … © The University of Tokyo 11 Lack of semantic information in the target subspace
  • 12. Proposed Method • Using PLS instead of PCA to generate the source subspace improved performance [4] • We propose a method that uses PLS to generate the target subspace – Uses subsidiary information as predicted variables – Our method improves the distribution of data in the target subspace © The University of Tokyo 12
  • 13. Difference between ours and others © The University of Tokyo Target subspace Source subspace Source: a lot of labeled images Target: a lot of unlabeled images Source: a lot of labeled images Target: a lot of unlabeled images and a subsidiary signal Target subspace Source subspace Original GFK or SA Our work Cup Monitor Cup Monitor 13
  • 14. © The University of Tokyo Target subspace Source subspace Cup Cup Monitor Monitor Monitor Cup Source images with labels Target images with subsidiary info. 14
  • 15. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 1. PLS in source subspace 15
  • 16. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 2. PLS in target subspace 16
  • 17. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 3. Subspace based domain adaptation 17
  • 18. Experiment Settings • Use distance features as subsidiary information – Extract depth features by applying depth kernel descriptors (Bo et al.) [10] – Obtained 14000-dim distance features for each image • Change the number of source samples – 120, 300, 600, 1800 and 3000 samples • Chose the best subspace dimension from 10, 20, 30, 40 or 50 for each case © The University of Tokyo 18
  • 19. Experiment Settings • B3DO [8] as the target domain data – Evaluate classification accuracy on 6 classes © The University of Tokyo 19 RGB Image Depth Image (Subsidiary information)
  • 20. Number of samples • Source: ImageNet  Target: B3DO [8]
  Class    | ImageNet (Source) | B3DO (Target)
  Bottle   |  920 | 238
  Bowl     |  919 | 142
  Cup      |  919 | 258
  Keyboard | 1512 | 129
  Monitor  | 1134 | 243
  Sofa     |  982 | 109
  SUM      | 6386 | 1119
  AVG      | 1064.3 | 186.5
  ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, J. Deng et al. © The University of Tokyo 20
  • 21. Difference in dataset Class: Cup Source: ImageNet Target: B3DO © The University of Tokyo 21
  • 22. Experiment settings • Test 2 subspace based methods to show that our method improves performance consistently ① Geodesic Flow Kernel (GFK) [4] ② Subspace Alignment (SA) [5] • Compare 4 methods 1. Our method 1 (Source: PCA -> Target: PLS) 2. Baseline 1 (Source: PCA -> Target: PCA) 3. Our method 2 (Source: PLS -> Target: PLS) 4. Baseline 2 (Source: PLS -> Target: PCA) © The University of Tokyo 22
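Of the two base methods, Subspace Alignment is simple enough to sketch: following Fernando et al. [5], a linear map M = Bs^T Bt aligns the source basis onto the target basis. A minimal numpy illustration with synthetic data and a nearest-neighbour classifier (names and data are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 64, 10
Bs, _ = np.linalg.qr(rng.normal(size=(D, d)))  # source subspace basis
Bt, _ = np.linalg.qr(rng.normal(size=(D, d)))  # target subspace basis
Xs = rng.normal(size=(300, D))                 # source data
ys = rng.integers(0, 6, size=300)              # source labels (6 classes)
Xt = rng.normal(size=(200, D))                 # target data (unlabeled)

# Align the source basis onto the target basis
M = Bs.T @ Bt                  # (d, d) alignment matrix
Zs = Xs @ (Bs @ M)             # source data in the aligned subspace
Zt = Xt @ Bt                   # target data in its own subspace

# Classify each target sample by its nearest source neighbour
dists = ((Zt[:, None, :] - Zs[None, :, :]) ** 2).sum(-1)   # (200, 300)
pred = ys[dists.argmin(axis=1)]
print(pred.shape)
```

In the paper's setting the bases Bs and Bt would come from PLS (or PCA for the baselines) rather than random matrices.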
  • 23. Experimental result (GFK) • Geodesic Flow Kernel (GFK) [4] as the subspace based method
  Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
  120  | 28.33 | 28.95 | 32.35 | 31.64
  300  | 29.31 | 29.85 | 32.71 | 31.55
  600  | 29.04 | 28.60 | 32.53 | 28.87
  1800 | 32.17 | 30.92 | 34.32 | 31.81
  3000 | 33.42 | 31.72 | 34.94 | 33.92
  © The University of Tokyo 23
  • 24. Result graph of GFK [4] © The University of Tokyo 24
  • 25. Experimental result (SA) • Subspace Alignment (SA) [5] as the subspace based method
  Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
  120  | 34.05 | 29.85 | 34.23 | 30.83
  300  | 33.15 | 30.21 | 32.17 | 31.90
  600  | 33.78 | 33.15 | 33.33 | 32.71
  1800 | 33.15 | 30.21 | 32.17 | 31.90
  3000 | 34.85 | 32.44 | 33.69 | 32.89
  © The University of Tokyo 25
  • 26. Result graph of SA [5] © The University of Tokyo 26
  • 27. Accuracy and exec. time • Classification accuracy and average execution time when using 20 source images per class • Proposed methods take slightly more calculation cost
  Method     | OURS1  | Baseline1 | OURS2    | Baseline2
  GFK        | 28.33  | 28.95     | 32.35    | 31.64
  Exec. time | 3.83 s | 2.26 s    | 135.17 s | 128.03 s
  SA         | 34.05  | 29.85     | 34.23    | 30.83
  Exec. time | 3.07 s | 0.98 s    | 130.90 s | 120.30 s
  © The University of Tokyo 27
  • 28. Conclusion • Proposed methods are better than previous ones that use only visual information • Subsidiary information can improve domain adaptation accuracy – Consistently improved on two independent methods • As far as we know, this is the first visual domain adaptation method using non-visual information in the target domain © The University of Tokyo 28
  • 29. Future work • Handling and testing other multimedia information such as gyroscope or sound • Extensive experiments – Currently focused on only 6 classes – Testing other classes and other subspace based methods © The University of Tokyo 29
  • 30. Contacts • Masaya Okamoto • Nakayama Lab., the University of Tokyo • e-mail: okamoto@nlab.ci.i.u-tokyo.ac.jp Thank you! © The University of Tokyo 30
  • 31. Reference (1/2) [1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. of ECCV, 2010. [2] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proc. of ICML, 2007. [3] R. Gopalan, R. Li, and R. Chellappa, “Domain adaptation for object recognition: an unsupervised approach,” in Proc. of ICCV, 2011. [4] B. Gong, Y. Shi, and F. Sha, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. of CVPR, 2012. [5] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in Proc. of ICCV, 2013. © The University of Tokyo 31
  • 32. Reference (2/2) [6] H. Wold, S. Kotz, and N. L. Johnson, “Partial least squares,” in Encyclopedia of Statistical Sciences, 1985. [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: a large-scale hierarchical image database,” in Proc. of CVPR, 2009. [8] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell, “A category-level 3-d object dataset: putting the kinect to work,” in Proc. of ICCV Workshop on Consumer Depth Cameras in Computer Vision, 2011. [9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [10] L. Bo, X. Ren, and D. Fox, “Depth kernel descriptors for object recognition,” in Proc. of IROS, 2011. © The University of Tokyo 32
  • 33. Why use depth as subsidiary info? ① Easy to collect • Some publicly-available datasets (like B3DO) ② Easier situation (we guess) • Depth information may have a strong correlation with classes ③ Depth sensors will be used in wearable devices • “Project Tango” by Google (smartphones with a Kinect-like camera) https://www.google.com/atap/projecttango/ © The University of Tokyo 33
  • 34. • The system doesn't need labeled samples from the user • Better than using only visual information – Using subsidiary info makes the result better System Overview Recognition Target Distance features (Depth Images) WEB Class: Chair Source © The University of Tokyo 34
  • 35. Life logging • Life-logging systems are spreading • Much subsidiary information (sound, gyro …) • → A different situation from previous works • In the near future, this situation is expected to become common © The University of Tokyo 35
  • 36. Experimental process flow • PLS on source (jack-knifing) – Because the dimension of the predicted signal is low – Iterative process, high computational cost • PLS on target (traditional) – Because the predicted signal has enough dimensions (14000-dim) • Subspace based method – GFK or SA © The University of Tokyo 36

Speaker notes

  1. Hello, everyone. My name is Masaya Okamoto. I'm from the University of Tokyo, Japan. I'm glad to be here. I'll talk about “Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain”.
  2. This is the outline of my talk. First, I'll describe the background of our research. Second, I'll discuss previous visual domain adaptation works and the differences between them and ours. Next, I'll explain the core idea and details of the proposed method. Then, I'll talk about the experiments and their results. Finally, I'll give the conclusion and future work.
  3. Recently, image recognition systems need many hand-labeled images for training. For example, PASCAL VOC 2012 used over 10 thousand labeled images. We suffer from a lack of hand-labeled images because labeling by hand is tough work. On the other hand, there are many labeled images on the web, but we can't use these web images directly. Therefore, domain adaptation techniques have gathered more and more attention.
  4. (Click) This figure shows an overview of domain adaptation. (Click) Domain adaptation is learning on one domain's images and testing on another domain's images. As you see, it is learning from images that have different characteristics.
  5. The domain where a classifier is trained is called the “source domain” and is expected to provide a lot of labeled data. The domain in which the classifier is actually tested is called the “target domain” and is assumed to have different characteristics, such as illumination and resolution, from the source domain. This figure shows an example of the difference between the two domains.
  6. I'll explain the difficulty of domain adaptation. This is the result from a previous work. This table shows the classification scores averaged over 31 classes. If the classifier is trained and tested in the same domain, as in the upper side of the table, it achieves a good score. But if the classifier is trained in one domain and tested in another, as in the lower side, classifiers like support vector machines or Naive-Bayes Nearest Neighbor don't work well.
  7. There have been many visual domain adaptation works so far. Saenko et al. proposed the first work on domain adaptation for image recognition in 2010. It was semi-supervised domain adaptation, which assumes a few labeled examples in the target domain. (Click) After that, Gong et al., Fernando et al. and others proposed several works on unsupervised visual domain adaptation. These don't need labeled samples in the target domain. Considering that our objective is to reduce the cost of manual labeling, the unsupervised setting is the ultimate goal of domain adaptation, but it is a very difficult task. (Click) We focus on the unsupervised domain adaptation setting.
  8. In the following slides, I will explain the previous works on subspace based domain adaptation methods. Currently, subspace based approaches like these have been known to be a promising strategy for unsupervised domain adaptation. Subspace based methods generate “virtual” domains that blend the properties of source and target. The first subspace based method was proposed by Gopalan et al. as Geodesic flow sampling, GFS for short. First of all, GFS generates subspaces for the source and target domains respectively. Next, it generates multiple intermediate subspaces between the source and target ones by sampling points from the geodesic flow on the Grassmann manifold. One problem of GFS is the trade-off between performance and the dimension of the feature vectors, which depends on the number of sampled intermediate subspaces. In other words, to improve performance we need to take more intermediate subspaces, but this results in higher computational cost. Some methods relax this problem.
  9. One of these methods is the Geodesic flow kernel, GFK for short, proposed by Gong et al. GFK is an analytic solution of the sampling based approach. Currently, subspace based approaches like these have been known to be a promising strategy for unsupervised domain adaptation.
  10. The first step of subspace based methods is generating the source and target subspaces. Considering the following processes and the “virtual” intermediate domains, each subspace has to have a semantic distribution. In previous works, to give the source subspace a semantic distribution, partial least squares analysis is applied with labels. But we can't generate a semantic distribution in the target subspace, because the target domain doesn't have semantic cues like labels.
  11. This slide shows the core idea of our method. Previous works on visual domain adaptation use only visual information in the target domain. (Click) So, we suffered from a lack of semantic information in the target subspace. In our opinion, we have to exploit subsidiary data for further improvement. (Click) Thus, we propose a method using non-visual data, such as distance, location or gyroscope information, as semantic cues.
  12. Actually, from previous work we knew that applying partial least squares instead of principal components analysis for generating the source subspace improved domain adaptation performance. From now on, PLS means partial least squares and PCA means principal components analysis. Based on that knowledge, the proposed method applies PLS instead of PCA to the target subspace. Our method improves the distribution of data in the target subspace using subsidiary information as cues.
  13. The figures show the difference between ours and other unsupervised domain adaptation methods. The source domain has a large number of labeled images. Our work assumes no labeling on the target domain, like other works, but subsidiary signals are provided. We emphasize that subsidiary signals are provided only in the target domain. Thus, our method doesn't do simple feature expansion for performance. (Memo: review of subspace based methods — the sampling one, then GFK as the analytic solution; explain the latest methods; the target is not semantic, right?)
  14. Let me talk about the process flow of the proposed method. This picture is an illustration of our method. The left side of this figure shows the source subspace; all source images have class labels. The right side shows the target subspace; all target images have no labels, but do have subsidiary information.
  15. First, we apply partial least squares analysis to the source domain, using class labels as predicted variables.
  16. Second, we apply partial least squares analysis to the target domain, using subsidiary information as predicted variables. Thus, we also give the target domain a semantic distribution. Subsidiary information is used only for this process.
  17. Finally, we apply subspace based domain adaptation. We improve on previous methods by creating semantic distributions in both the source and target domains.
  18. Let me mention the experiments. We used distance features as subsidiary information. The features were extracted by the depth kernel descriptors proposed by Bo et al. Actually, we obtained a 14000-dimensional feature from each depth image. We changed the number of source samples from 20 to 500 per class, 120 to 3000 samples in total. We experimentally chose the dimension of the subspaces among 10, 20, 30, 40, and 50 to maximize the classification accuracy for each case, because fixed dimensions may bias a particular method to work better.
  19. We used the B3DO dataset from “A category-level 3-d object dataset: putting the kinect to work”. B3DO is a publicly available RGB-D dataset proposed by Janoch et al. This figure shows examples from the B3DO dataset; RGB and depth image pairs are provided.
  20. This table shows the number of source and target images. Source images were obtained from ImageNet and target images from the B3DO dataset. All images were cropped.
  21. This figure shows the actual difference in the experiment datasets. This is the cup class. As you see, there are many differences, such as lighting, resolution, and background.
  22. As base methods, to prove that the proposed method improves performance consistently, we exploit 2 independent state-of-the-art subspace based domain adaptation methods. The first one is the Geodesic flow kernel; the second is subspace alignment. To evaluate the performance of our method, we compared 4 kinds of methods. The first one is proposed method 1, applying PCA to the source and PLS to the target. The second is baseline 1, PCA to both source and target. The third one is proposed method 2, applying PLS to both source and target. The fourth is baseline 2, PLS to the source and PCA to the target. (Click) The comparison of our method 1 and baseline 1 illustrates the effectiveness of our approach when PCA was used for building the source subspace. (Click) Similarly, our method 2 and baseline 2 are comparable when PLS was used in the source domain. We expected to observe the respective improvements in each case.
  23. This table shows the results when using the GFK method as base. OURS2 was the best in every case. (Memo: maybe the graph alone is enough — delete this if running over time; split at the center for comparison.)
  24. This figure shows the result of experiments on the Geodesic flow kernel method. Red and blue lines are the proposed methods. In this case, the blue line, our method 2, which applies PLS to both the source and target subspaces, was the best.
  25. This table shows the results when using the subspace alignment method as base. Our method 1 was the best in every case.
  26. This figure shows the result of experiments on the subspace alignment method. In this case, the blue line, our method 1, which applies PLS to the target and PCA to the source subspace, was the best.
  27. In this slide, we mention the execution time of each method. Exec time in the table shows the average execution time. The proposed methods take more calculation time than the baselines: about 2 seconds in the cases that applied PCA to the source, and about 10 seconds in the cases that applied PLS. But we think this is acceptable, because the extra calculation time was negligible, especially in the case where PLS was applied to the source domain.
  28. Let me talk about the conclusion. The proposed methods, which additionally use non-visual info in the target space, are better than previous ones. We emphasize again that subsidiary signals are provided only in the target domain, and our method doesn't do simple feature expansion for performance. We showed that subsidiary information can improve domain adaptation accuracy. The results of the experiments show that our method is effective and valid, because it consistently improved the performance on two independent state-of-the-art subspace based methods. Next, we proposed a new domain adaptation task assuming the target domain has some subsidiary non-visual information, and this is the first method using non-visual information.
  29. For future work, the first item is handling and testing other multimodal information, such as gyroscope or sound data obtained when a picture was taken. The second is expanding the experiments: we have to test more classes and more subspace based methods. I think you are right about that (topic/information); it is a problem of our method, and it is future work.
  30. Thank you very much. I'm sorry, I don't have that information now, but I guess… Is your question about the ~ (section/figure)? Actually, I can't answer your question, but I guess that ~. This is difficult to explain, but I'd be pleased to talk about it later. Sorry, but that is outside the area of this study. Does that answer your question?
  31. There are three reasons. First, it is easy to collect: there are some publicly available datasets like B3DO. Second, we think distance information makes the problem easier, because distance features may have a stronger correlation with classes than location or sounds. Third, depth sensors will be used in wearable devices; Google announced Project Tango, which makes a smartphone have a built-in Kinect-like camera. That's why we chose distance information as the subsidiary information.
  32. The first step is applying jack-knifing PLS to the source domain, because the labels used as the predicted signal in the source domain don't have enough dimensions; it is an iterative process with high computational cost. The second is applying normal PLS to the target space by solving an eigenvalue problem, which has low computational cost; the distance features used as predicted signals in the target domain have enough dimensions. The third is applying subspace based methods, experimentally GFK or SA.
  33. Our objective is summarizing egocentric moving videos for generating walking route guidance videos. A raw video is too long to watch, because it is as long as the walk along the route, and it's difficult to use the route guide when off course. To make the route guidance usable, our system summarizes it automatically.
  34. This slide shows the overview of our method. Our system consists of 3 steps. The first step is generating the source and target subspaces for dimensionality reduction. The second is … The third step is … From the next slide, I'll explain the details of each step.
  35. In this study, we focus on the unsupervised domain adaptation setting. Previous works used only visual information in the target domain for domain adaptation. In our opinion, this is a cause of the difficulty of domain adaptation. So, we propose a new domain adaptation task with subsidiary information, and propose the first method for it.