Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain
Masaya Okamoto and Hideki Nakayama
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
© The University of Tokyo 1
Outline
• Background
• Related work
• Proposed method
• Experiments
• Conclusion
• Future work
Background
• A large amount of hand-labeled data is necessary for image recognition
  – PASCAL VOC2012: 11,530 labeled images
• Labeling images by hand is laborious work
  – Hence the lack of hand-labeled data
• Many labeled (tagged) images exist on the web
  – But we cannot use web images directly
(Figure: example images from PASCAL VOC2012)
Domain Adaptation
(Figure: classifiers learned in one domain and tested in another)
• Learning from another domain
*From the CVPR 2012 Tutorial on Domain Transfer Learning for Vision Applications
© The University of Tokyo 4
Source and Target
(Figure: “Cup” images in the source and target domains)
• Source domain: many labeled samples
• Target domain: few labeled samples
Difficulty of domain adaptation
• Simple methods do not work well outside the domain they were trained in
(Accuracy averaged over 31 classes; from “Adapting visual category models to new domains,” K. Saenko et al. [1])
Related work
• Semi-supervised domain adaptation
  – Assumes a few labeled examples in the target domain
  – Saenko et al. [1] [ECCV 2010]
    • First work on visual domain adaptation
• Unsupervised domain adaptation
  – No labeled examples are used in the target domain
  – Preferable, but quite difficult
  – Gong et al. [4] [CVPR 2012]
  – Fernando et al. [5] [ICCV 2013]
Subspace-based methods
• Generate “virtual” domains that blend the properties of source and target
• Geodesic Flow Sampling (GFS) by Gopalan et al. [3]
  – Generates multiple subspaces by sampling points along the geodesic flow on the Grassmann manifold
(From “Domain Adaptation for Object Recognition: An Unsupervised Approach,” R. Gopalan et al. [3])
Subspace-based methods
• Geodesic Flow Kernel (GFK) by Gong et al. [4]
  – A closed-form (analytic) solution of the sampling-based approach
• Subspace-based approaches are arguably the most successful to date
(From “Geodesic Flow Kernel for Unsupervised Domain Adaptation,” B. Gong et al. [4])
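Both GFS and GFK start from a low-dimensional subspace per domain, typically the top PCA directions. A minimal sketch of extracting these bases (synthetic data and illustrative dimensions; scikit-learn's PCA is an assumption, not the authors' exact toolchain):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_source = rng.normal(size=(200, 50))  # hypothetical source features
X_target = rng.normal(size=(150, 50))  # hypothetical target features

dim = 10  # subspace dimensionality (the paper tries 10-50)
P_s = PCA(n_components=dim).fit(X_source).components_.T  # (50, 10) basis
P_t = PCA(n_components=dim).fit(X_target).components_.T  # (50, 10) basis

# Each column is an orthonormal basis vector; GFK interpolates between
# span(P_s) and span(P_t) along the geodesic on the Grassmann manifold.
```

The bases themselves are all the later steps need; GFK then derives a kernel between points projected along the geodesic connecting the two subspaces.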
Subspace-based methods
• To give the source subspace a semantic (class-aware) distribution, apply PLS with the labels
• [Problem] PLS cannot be applied to the target domain, because it lacks cues such as labels
(Figure: source and target subspaces with “Cup” and “Monitor” clusters)
Our core idea
• Previous work on domain adaptation uses only visual information in the target domain
  – This leads to a lack of semantic information in the target subspace
• Use subsidiary non-visual data as semantic cues in subspace-based methods
  – e.g. depth, location data (GPS), gyroscopes, …
Proposed Method
• Using PLS instead of PCA to generate the source subspace improves performance [4]
• We propose using PLS to generate the target subspace as well
  – Subsidiary information serves as the predicted variables
  – Our method improves the distribution of the data in the target subspace
Difference between ours and others
(Figure: Original GFK or SA — source: many labeled images; target: many unlabeled images.
Our work — source: many labeled images; target: many unlabeled images plus a subsidiary signal.
Both diagrams show “Cup” and “Monitor” clusters in the source and target subspaces.)
(Figure sequence, repeated over three slides, showing “Cup” and “Monitor” clusters in the source and target subspaces: source images with labels, target images with subsidiary information)
1. PLS in the source subspace
2. PLS in the target subspace
3. Subspace-based domain adaptation
Experimental settings
• Distance (depth) features as subsidiary information
  – Extracted by applying depth kernel descriptors (Bo et al.) [10]
  – Obtained a 14000-dim distance feature for each image
• Varied the number of source samples
  – 120, 300, 600, 1800, and 3000 samples
• Chose the best subspace dimensionality from {10, 20, 30, 40, 50} for each case
Experimental settings
• B3DO [8] as the target-domain data
  – Evaluate classification accuracy on 6 classes
(Figure: an RGB image and the corresponding depth image used as subsidiary information)
Number of samples
• Source: ImageNet; Target: B3DO [8]

Class    | ImageNet (Source) | B3DO (Target)
Bottle   |   920  | 238
Bowl     |   919  | 142
Cup      |   919  | 258
Keyboard |  1512  | 129
Monitor  |  1134  | 243
Sofa     |   982  | 109
SUM      |  6386  | 1119
AVG      | 1064.3 | 186.5

(ImageNet: a large-scale hierarchical image database, J. Deng et al., CVPR 2009 [7])
Difference in datasets
(Figure: “Cup” images from the source (ImageNet) and the target (B3DO))
Experimental settings
• Tested two subspace-based methods to show that our method improves performance consistently
  ① Geodesic Flow Kernel (GFK) [4]
  ② Subspace Alignment (SA) [5]
• Compared 4 methods
  1. Our method 1 (Source: PCA → Target: PLS)
  2. Baseline 1 (Source: PCA → Target: PCA)
  3. Our method 2 (Source: PLS → Target: PLS)
  4. Baseline 2 (Source: PLS → Target: PCA)
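Of the two back-ends, Subspace Alignment is simple enough to sketch end to end: it maps the source basis onto the target basis with a single closed-form matrix. The synthetic data and PCA bases below are illustrative; in the proposed variants either basis may instead come from PLS:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_s = rng.normal(size=(200, 50))  # hypothetical source features
X_t = rng.normal(size=(150, 50))  # hypothetical target features

d = 10
P_s = PCA(n_components=d).fit(X_s).components_.T  # (50, d) source basis
P_t = PCA(n_components=d).fit(X_t).components_.T  # (50, d) target basis

M = P_s.T @ P_t        # closed-form alignment matrix of Fernando et al. [5]
Z_s = X_s @ (P_s @ M)  # source data in the target-aligned subspace
Z_t = X_t @ P_t        # target data in its own subspace
# Z_s and Z_t are now comparable, e.g. with a nearest-neighbour classifier.
```

The entire adaptation step is one matrix product, which is why SA is far cheaper than sampling-based alternatives.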
Experimental results (GFK)
• Geodesic Flow Kernel (GFK) [4] as the subspace-based method

Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
 120  | 28.33 | 28.95 | 32.35 | 31.64
 300  | 29.31 | 29.85 | 32.71 | 31.55
 600  | 29.04 | 28.60 | 32.53 | 28.87
1800  | 32.17 | 30.92 | 34.32 | 31.81
3000  | 33.42 | 31.72 | 34.94 | 33.92
Result graph of GFK
(Figure: accuracy vs. number of source samples for the four methods, using GFK [4])
Experimental results (SA)
• Subspace Alignment (SA) [5] as the subspace-based method

Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
 120  | 34.05 | 29.85 | 34.23 | 30.83
 300  | 33.15 | 30.21 | 32.17 | 31.90
 600  | 33.78 | 33.15 | 33.33 | 32.71
1800  | 33.15 | 30.21 | 32.17 | 31.90
3000  | 34.85 | 32.44 | 33.69 | 32.89
Result graph of SA
(Figure: accuracy vs. number of source samples for the four methods, using SA [5])
Accuracy and exec. time
• Classification accuracy and average execution time when using 20 source images per class
• The proposed methods incur slightly higher computational cost

               | OURS1  | Baseline1 | OURS2    | Baseline2
GFK accuracy   | 28.33  | 28.95     | 32.35    | 31.64
GFK exec. time | 3.83 s | 2.26 s    | 135.17 s | 128.03 s
SA accuracy    | 34.05  | 29.85     | 34.23    | 30.83
SA exec. time  | 3.07 s | 0.98 s    | 130.90 s | 120.30 s
Conclusion
• The proposed methods outperform previous ones that use only visual information
• Subsidiary information can improve domain adaptation accuracy
  – Consistent improvements on two independent methods
• To the best of our knowledge, this is the first visual domain adaptation method to use non-visual information in the target domain
Future work
• Handling and testing other multimedia information such as gyroscope or sound data
• More extensive experiments
  – Currently focused on only 6 classes
  – Testing other classes and other subspace-based methods
Contacts
• Masaya Okamoto
• Nakayama Lab., the University of Tokyo
• e-mail: okamoto@nlab.ci.i.u-tokyo.ac.jp
Thank you!
References (1/2)
[1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. of ECCV, 2010.
[2] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proc. of ICML, 2007.
[3] R. Gopalan, R. Li, and R. Chellappa, “Domain adaptation for object recognition: an unsupervised approach,” in Proc. of ICCV, 2011.
[4] B. Gong, Y. Shi, and F. Sha, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. of CVPR, 2012.
[5] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in Proc. of ICCV, 2013.
References (2/2)
[6] H. Wold, S. Kotz, and N. L. Johnson, “Partial least squares,” in Encyclopedia of Statistical Sciences, 1985.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in Proc. of CVPR, 2009.
[8] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell, “A category-level 3-D object dataset: putting the Kinect to work,” in Proc. of ICCV Workshop on Consumer Depth Cameras in Computer Vision, 2011.
[9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[10] L. Bo, X. Ren, and D. Fox, “Depth kernel descriptors for object recognition,” in Proc. of IROS, 2011.
Why use depth as subsidiary information?
① Easy to collect
  • Several publicly available datasets exist (such as B3DO)
② Presumably an easier setting
  • Depth information may correlate strongly with the classes
③ Depth sensors will appear in wearable devices
  • “Project Tango” by Google (a smartphone with a Kinect-like camera)
    https://www.google.com/atap/projecttango/
System overview
• The system does not need labeled samples from the user
• Better than using only visual information
  – Using subsidiary information improves the results
(Figure: labeled source images from the web and target distance features (depth images) feed into recognition, e.g. class “Chair”)
Life logging
• Life-logging systems are spreading
• They provide much subsidiary information (sound, gyroscope, …)
  → A different situation from previous work
• In the near future, this situation is expected to become common
Experimental process flow
• PLS on the source (jack-knifing variant)
  – Because the predicted signals (labels) are low-dimensional
  – An iterative process with high computational cost
• PLS on the target (traditional)
  – Because the predicted signals have enough dimensions (14000-dim)
• Subspace-based domain adaptation
  – GFK or SA
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 

Dernier (20)

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 

ISM2014
  • 1. Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain Masaya Okamoto and Hideki Nakayama Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan © The University of Tokyo 1
  • 2. Outline • Background • Related work • Proposed method • Experiments • Conclusion • Future work © The University of Tokyo 2
  • 3. Background • A lot of hand-labeled data is necessary for image recognition – PASCAL VOC2012: 11,530 labeled images • It is tough work to label images – Lack of hand-labeled data • Many labeled (tagged) images on the web – We can't use web images directly Example images of PASCAL VOC2012 © The University of Tokyo Domain Adaptation 3
  • 4. Domain Adaptation Learn Test TestLearn Learn Test TestLearn Learning from another domain ※From CVPR 2012 Tutorial on Domain Transfer Learning for Vision Applications © The University of Tokyo 4
  • 5. Source and Target Source Domain Target Domain Learn Test (figure: cup images in both domains) Many labeled samples Few labeled samples © The University of Tokyo 5
  • 6. Difficulty of domain adaptation • Simple methods don't work in other situations © The University of Tokyo 6 (average of 31 classes) From "Adapting visual category models to new domains", K. Saenko et al.
  • 7. Related work • Semi-supervised domain adaptation – It assumes a few labeled examples in the target domain – Saenko et al. [1] [ECCV 2010] • First work on visual domain adaptation • Unsupervised domain adaptation – No labeled examples are used in the target domain – Preferable but quite difficult – Gong et al. [4] [CVPR 2012] – Fernando et al. [5] [ICCV 2013] © The University of Tokyo 7
  • 8. Subspace based method • Generate "virtual" domains that blend the properties of source and target • Geodesic flow sampling (GFS) by Gopalan et al. – Generates multiple subspaces by sampling points from the geodesic flow on the Grassmann manifold © The University of Tokyo 8 From "Domain Adaptation for Object Recognition: An Unsupervised Approach", R. Gopalan et al.
  • 9. Subspace based method • Geodesic Flow Kernel (GFK) by Gong et al. – Analytic solution of the sampling-based approach • The subspace-based approach is probably the most successful current approach © The University of Tokyo 9 From "Geodesic Flow Kernel for Unsupervised Domain Adaptation", B. Gong et al.
  • 10. © The University of Tokyo 10 • To give the source subspace a semantic distribution, PLS is applied with labels • [Problem] PLS can't be applied to the target because of the lack of cues such as labels Subspace based method Target subspace Source subspace Cup Monitor
  • 11. Our core idea • Previous works on domain adaptation use only visual information in the target domain • Use subsidiary non-visual data as semantic cues in subspace based methods – Such as depth, location data (GPS), gyroscopes … © The University of Tokyo 11 Lack of semantic information in the target subspace
  • 12. Proposed Method • Using PLS instead of PCA to generate the source subspace improved performance [4] • We propose a method that uses PLS to generate the target subspace – Uses subsidiary information as predicted variables – Our method improves the distribution of data in the target subspace © The University of Tokyo 12
  • 13. Difference between ours and others © The University of Tokyo Target subspace Source subspace Source: a lot of labeled images Target: a lot of unlabeled images Source: a lot of labeled images Target: a lot of unlabeled images and a subsidiary signal Target subspace Source subspace Original GFK or SA Our work Cup Monitor Cup Monitor 13
  • 14. © The University of Tokyo Target subspace Source subspace Cup Cup Monitor Monitor Monitor Cup Source images with labels Target images with subsidiary info. 14
  • 15. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 1. PLS in source subspace 15
  • 16. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 2. PLS in target subspace 16
  • 17. © The University of Tokyo Source subspace Cup Cup Monitor Monitor Monitor Cup Target subspace 3. Subspace based domain adaptation 17
  • 18. Experiment Settings • Use distance features as subsidiary information – Extract depth features by applying depth kernel descriptors (Bo et al.) [10] – Obtained 14000-dim distance features for each image • Change the number of source samples – 120, 300, 600, 1800 and 3000 samples • Chose the best subspace dimension from 10, 20, 30, 40 or 50 for each case © The University of Tokyo 18
  • 19. Experiment Settings • B3DO [8] as the target domain data – Evaluate classification accuracy on 6 classes © The University of Tokyo 19 RGB Image Depth Image (Subsidiary information)
  • 20. Number of samples • Source: ImageNet  Target: B3DO [8]
  Class    | ImageNet (Source) | B3DO (Target)
  Bottle   |  920 | 238
  Bowl     |  919 | 142
  Cup      |  919 | 258
  Keyboard | 1512 | 129
  Monitor  | 1134 | 243
  Sofa     |  982 | 109
  SUM      | 6386 | 1119
  AVG      | 1064.3 | 186.5
  ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, J. Deng et al. © The University of Tokyo 20
  • 21. Difference in dataset Class: Cup Source: ImageNet Target: B3DO © The University of Tokyo 21
  • 22. Experiment settings • Test 2 subspace based methods to show that our method improves performance consistently ① Geodesic Flow Kernel (GFK) [4] ② Subspace Alignment (SA) [5] • Compare 4 methods 1. Our method 1 (Source: PCA -> Target: PLS) 2. Baseline 1 (Source: PCA -> Target: PCA) 3. Our method 2 (Source: PLS -> Target: PLS) 4. Baseline 2 (Source: PLS -> Target: PCA) © The University of Tokyo 22
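Of the two base methods, Subspace Alignment is simple enough to sketch: following Fernando et al. [5], a linear map M = Bs^T Bt aligns the source basis onto the target basis. A minimal numpy illustration with synthetic data and a nearest-neighbour classifier (names and data are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 64, 10
Bs, _ = np.linalg.qr(rng.normal(size=(D, d)))  # source subspace basis
Bt, _ = np.linalg.qr(rng.normal(size=(D, d)))  # target subspace basis
Xs = rng.normal(size=(300, D))                 # source data
ys = rng.integers(0, 6, size=300)              # source labels (6 classes)
Xt = rng.normal(size=(200, D))                 # target data (unlabeled)

# Align the source basis onto the target basis
M = Bs.T @ Bt                  # (d, d) alignment matrix
Zs = Xs @ (Bs @ M)             # source data in the aligned subspace
Zt = Xt @ Bt                   # target data in its own subspace

# Classify each target sample by its nearest source neighbour
dists = ((Zt[:, None, :] - Zs[None, :, :]) ** 2).sum(-1)   # (200, 300)
pred = ys[dists.argmin(axis=1)]
print(pred.shape)
```

In the paper's setting the bases Bs and Bt would come from PLS (or PCA for the baselines) rather than random matrices.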
  • 23. Experimental result (GFK) • Geodesic Flow Kernel (GFK) [4] as the subspace based method
  Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
  120  | 28.33 | 28.95 | 32.35 | 31.64
  300  | 29.31 | 29.85 | 32.71 | 31.55
  600  | 29.04 | 28.60 | 32.53 | 28.87
  1800 | 32.17 | 30.92 | 34.32 | 31.81
  3000 | 33.42 | 31.72 | 34.94 | 33.92
  © The University of Tokyo 23
  • 24. Result graph of GFK [4] © The University of Tokyo 24
  • 25. Experimental result (SA) • Subspace Alignment (SA) [5] as the subspace based method
  Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2
  120  | 34.05 | 29.85 | 34.23 | 30.83
  300  | 33.15 | 30.21 | 32.17 | 31.90
  600  | 33.78 | 33.15 | 33.33 | 32.71
  1800 | 33.15 | 30.21 | 32.17 | 31.90
  3000 | 34.85 | 32.44 | 33.69 | 32.89
  © The University of Tokyo 25
  • 26. Result graph of SA [5] © The University of Tokyo 26
  • 27. Accuracy and exec. time • Classification accuracy and average execution time when using 20 source images per class • Proposed methods take slightly more calculation cost
  Method     | OURS1  | Baseline1 | OURS2    | Baseline2
  GFK        | 28.33  | 28.95     | 32.35    | 31.64
  Exec. time | 3.83 s | 2.26 s    | 135.17 s | 128.03 s
  SA         | 34.05  | 29.85     | 34.23    | 30.83
  Exec. time | 3.07 s | 0.98 s    | 130.90 s | 120.30 s
  © The University of Tokyo 27
  • 28. Conclusion • Proposed methods are better than previous ones that use only visual information • Subsidiary information can improve domain adaptation accuracy – Consistently improved on two independent methods • As far as we know, this is the first visual domain adaptation method using non-visual information in the target domain © The University of Tokyo 28
  • 29. Future work • Handling and testing other multimedia information such as gyroscope or sound • Extensive experiments – Currently focused on only 6 classes – Testing other classes and other subspace based methods © The University of Tokyo 29
  • 30. Contacts • Masaya Okamoto • Nakayama Lab., the University of Tokyo • e-mail: okamoto@nlab.ci.i.u-tokyo.ac.jp Thank you! © The University of Tokyo 30
  • 31. Reference (1/2) [1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. of ECCV, 2010. [2] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proc. of ICML, 2007. [3] R. Gopalan, R. Li, and R. Chellappa, “Domain adaptation for object recognition: an unsupervised approach,” in Proc. of ICCV, 2011. [4] B. Gong, Y. Shi, and F. Sha, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. of CVPR, 2012. [5] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in Proc. of ICCV, 2013. © The University of Tokyo 31
  • 32. Reference (2/2) [6] H. Wold, S. Kotz, and N. L. Johnson, “Partial least squares,” in Encyclopedia of Statistical Sciences, 1985. [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: a large-scale hierarchical image database,” in Proc. of CVPR, 2009. [8] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell, “A category-level 3-d object dataset: putting the kinect to work,” in Proc. of ICCV Workshop on Consumer Depth Cameras in Computer Vision, 2011. [9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [10] L. Bo, X. Ren, and D. Fox, “Depth kernel descriptors for object recognition,” in Proc. of IROS, 2011. © The University of Tokyo 32
  • 33. Why use depth as subsidiary info? ① Easy to collect • Some publicly-available datasets (like B3DO) ② Easier situation (we guess) • Depth information may have a strong correlation with classes ③ Depth sensors will be used in wearable devices • “Project Tango” by Google (smartphones with a Kinect-like camera) https://www.google.com/atap/projecttango/ © The University of Tokyo 33
  • 34. • The system doesn't need labeled samples from the user • Better than using only visual information – Using subsidiary info makes the result better System Overview Recognition Target Distance features (Depth Images) WEB Class: Chair Source © The University of Tokyo 34
  • 35. Life logging • Life-logging systems are spreading • Much subsidiary information (sound, gyro …) • → A different situation from previous works • In the near future, this situation is expected to become common © The University of Tokyo 35
  • 36. Experimental process flow • PLS on source (jack-knifing) – Because the dimension of the predicted signal is low – Iterative process, high computational cost • PLS on target (traditional) – Because the predicted signal has enough dimensions (14000-dim) • Subspace based method – GFK or SA © The University of Tokyo 36

Speaker notes

  1. Hello, everyone. My name is Masaya Okamoto. I'm from the University of Tokyo, Japan. I'm glad to be here. I'll talk about “Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain”.
  2. This is the outline of my talk. First, I'll describe the background of our research. Second, I'll discuss previous visual domain adaptation works and the differences between them and ours. Next, I'll explain the core idea and details of the proposed method. Then, I'll talk about the experiments and their results. Finally, I'll give the conclusion and future work.
  3. Recently, image recognition systems need many hand-labeled images for training. For example, PASCAL VOC 2012 used over 10 thousand labeled images. We suffer from a lack of hand-labeled images because labeling by hand is tough work. On the other hand, there are many labeled images on the web, but we can't use these web images directly. Therefore, domain adaptation techniques have gathered more and more attention.
  4. (Click) This figure shows an overview of domain adaptation. (Click) Domain adaptation is learning on one domain's images and testing on another domain's images. As you see, it is learning from images that have different characteristics.
  5. The domain where a classifier is trained is called the “source domain” and is expected to provide a lot of labeled data. The domain in which the classifier is actually tested is called the “target domain” and is assumed to have different characteristics, such as illumination and resolution, from the source domain. This figure shows an example of the difference between the two domains.
  6. I'll explain the difficulty of domain adaptation. This is the result from a previous work. This table shows the classification scores averaged over 31 classes. If the classifier is trained and tested in the same domain, as in the upper side of the table, it achieves a good score. But if the classifier is trained in one domain and tested in another, as in the lower side, classifiers like support vector machines or Naive-Bayes Nearest Neighbor don't work well.
  7. There have been many visual domain adaptation works so far. Saenko et al. proposed the first work on domain adaptation for image recognition in 2010. It was semi-supervised domain adaptation, which assumes a few labeled examples in the target domain. (Click) After that, Gong et al., Fernando et al. and others proposed several works on unsupervised visual domain adaptation. These don't need labeled samples in the target domain. Considering that our objective is to reduce the cost of manual labeling, the unsupervised setting is the ultimate goal of domain adaptation, but it is a very difficult task. (Click) We focus on the unsupervised domain adaptation setting.
  8. In the following slides, I will explain the previous works on subspace based domain adaptation methods. Currently, subspace based approaches like these have been known to be a promising strategy for unsupervised domain adaptation. Subspace based methods generate “virtual” domains that blend the properties of source and target. The first subspace based method was proposed by Gopalan et al. as Geodesic flow sampling, GFS for short. First of all, GFS generates subspaces for the source and target domains respectively. Next, it generates multiple intermediate subspaces between the source and target ones by sampling points from the geodesic flow on the Grassmann manifold. One problem of GFS is the trade-off between performance and the dimension of the feature vectors, which depends on the number of sampled intermediate subspaces. In other words, to improve performance we need to take more intermediate subspaces, but this results in higher computational cost. Some methods relax this problem.
  9. One of these methods is the Geodesic flow kernel, GFK for short, proposed by Gong et al. GFK is an analytic solution of the sampling based approach. Currently, subspace based approaches like these have been known to be a promising strategy for unsupervised domain adaptation.
  10. The first step of subspace based methods is generating the source and target subspaces. Considering the following processes and the “virtual” intermediate domains, each subspace has to have a semantic distribution. In previous works, to give the source subspace a semantic distribution, partial least squares analysis is applied with labels. But we can't generate a semantic distribution in the target subspace, because the target domain doesn't have semantic cues like labels.
  11. This slide shows the core idea of our method. Previous works on visual domain adaptation use only visual information in the target domain. (Click) So, we suffered from a lack of semantic information in the target subspace. In our opinion, we have to exploit subsidiary data for further improvement. (Click) Thus, we propose a method using non-visual data, such as distance, location or gyroscope information, as semantic cues.
  12. Actually, from previous work we knew that applying partial least squares instead of principal components analysis for generating the source subspace improved domain adaptation performance. From now on, PLS means partial least squares and PCA means principal components analysis. Based on that knowledge, the proposed method applies PLS instead of PCA to the target subspace. Our method improves the distribution of data in the target subspace using subsidiary information as cues.
  13. The figures show the difference between ours and other unsupervised domain adaptation methods. The source domain has a large number of labeled images. Our work assumes no labeling on the target domain, like other works, but subsidiary signals are provided. We emphasize that subsidiary signals are provided only in the target domain. Thus, our method doesn't do simple feature expansion for performance. (Memo: review of subspace based methods — the sampling one, then GFK as the analytic solution; explain the latest methods; the target is not semantic, right?)
  14. Let me talk about the process flow of the proposed method. This picture is an illustration of our method. The left side of this figure shows the source subspace; all source images have class labels. The right side shows the target subspace; all target images have no labels, but do have subsidiary information.
  15. First, we apply partial least squares analysis to the source domain, using class labels as predicted variables.
  16. Second, we apply partial least squares analysis to the target domain, using subsidiary information as predicted variables. Thus, we also give the target domain a semantic distribution. Subsidiary information is used only for this process.
  17. Finally, we apply subspace based domain adaptation. We improve on previous methods by creating semantic distributions in both the source and target domains.
  18. Let me mention the experiments. We used distance features as subsidiary information. The features were extracted by the depth kernel descriptors proposed by Bo et al. Actually, we obtained a 14000-dimensional feature from each depth image. We changed the number of source samples from 20 to 500 per class, 120 to 3000 samples in total. We experimentally chose the dimension of the subspaces among 10, 20, 30, 40, and 50 to maximize the classification accuracy for each case, because fixed dimensions may bias a particular method to work better.
  19. We used the B3DO dataset from “A category-level 3-d object dataset: putting the kinect to work”. B3DO is a publicly available RGB-D dataset proposed by Janoch et al. This figure shows examples from the B3DO dataset; RGB and depth image pairs are provided.
  20. This table shows the number of source and target images. Source images were obtained from ImageNet and target images from the B3DO dataset. All images were cropped.
  21. This figure shows the actual difference in the experiment datasets. This is the cup class. As you see, there are many differences, such as lighting, resolution, and background.
  22. As base methods, to prove that the proposed method improves performance consistently, we exploit 2 independent state-of-the-art subspace based domain adaptation methods. The first one is the Geodesic flow kernel; the second is subspace alignment. To evaluate the performance of our method, we compared 4 kinds of methods. The first one is proposed method 1, applying PCA to the source and PLS to the target. The second is baseline 1, PCA to both source and target. The third one is proposed method 2, applying PLS to both source and target. The fourth is baseline 2, PLS to the source and PCA to the target. (Click) The comparison of our method 1 and baseline 1 illustrates the effectiveness of our approach when PCA was used for building the source subspace. (Click) Similarly, our method 2 and baseline 2 are comparable when PLS was used in the source domain. We expected to observe the respective improvements in each case.
  23. This table shows the results when using the GFK method as base. OURS2 was the best in every case. (Memo: maybe the graph alone is enough — delete this if running over time; split at the center for comparison.)
  24. This figure shows the result of experiments on the Geodesic flow kernel method. Red and blue lines are the proposed methods. In this case, the blue line, our method 2, which applies PLS to both the source and target subspaces, was the best.
  25. This table shows the results when using the subspace alignment method as base. Our method 1 was the best in every case.
  26. This figure shows the result of experiments on the subspace alignment method. In this case, the blue line, our method 1, which applies PLS to the target and PCA to the source subspace, was the best.
  27. In this slide, we mention the execution time of each method. Exec time in the table shows the average execution time. The proposed methods take more calculation time than the baselines: about 2 seconds in the cases that applied PCA to the source, and about 10 seconds in the cases that applied PLS. But we think this is acceptable, because the extra calculation time was negligible, especially in the case where PLS was applied to the source domain.
  28. Let me talk about the conclusion. The proposed methods, which additionally use non-visual info in the target space, are better than previous ones. We emphasize again that subsidiary signals are provided only in the target domain, and our method doesn't do simple feature expansion for performance. We showed that subsidiary information can improve domain adaptation accuracy. The results of the experiments show that our method is effective and valid, because it consistently improved the performance on two independent state-of-the-art subspace based methods. Next, we proposed a new domain adaptation task assuming the target domain has some subsidiary non-visual information, and this is the first method using non-visual information.
  29. For future work, the first item is handling and testing other multimodal information, such as gyroscope or sound data obtained when a picture was taken. The second is expanding the experiments: we have to test more classes and more subspace based methods. I think you are right about that (topic/information); it is a problem of our method, and it is future work.
  30. Thank you very much. I'm sorry, I don't have that information now, but I guess… Is your question about the ~ (section/figure)? Actually, I can't answer your question, but I guess that ~. This is difficult to explain, but I'd be pleased to talk about it later. Sorry, but that is outside the area of this study. Does that answer your question?
  31. There are three reasons. First, it is easy to collect: there are some publicly available datasets like B3DO. Second, we think distance information makes the problem easier, because distance features may have a stronger correlation with classes than location or sounds. Third, depth sensors will be used in wearable devices; Google announced Project Tango, which makes a smartphone have a built-in Kinect-like camera. That's why we chose distance information as the subsidiary information.
  32. The first step is applying jack-knifing PLS to the source domain, because the labels used as the predicted signal in the source domain don't have enough dimensions; it is an iterative process with high computational cost. The second is applying normal PLS to the target space by solving an eigenvalue problem, which has low computational cost; the distance features used as predicted signals in the target domain have enough dimensions. The third is applying subspace based methods, experimentally GFK or SA.
  33. Our objective is summarizing egocentric moving videos for generating walking route guidance videos. A raw video is too long to watch, because it is as long as the walk along the route, and it's difficult to use the route guide when off course. To make the route guidance usable, our system summarizes it automatically.
  34. This slide shows the overview of our method. Our system consists of 3 steps. The first step is generating the source and target subspaces for dimensionality reduction. The second is … The third step is … From the next slide, I'll explain the details of each step.
  35. In this study, we focus on the unsupervised domain adaptation setting. Previous works used only visual information in the target domain for domain adaptation. In our opinion, this is a cause of the difficulty of domain adaptation. So, we propose a new domain adaptation task with subsidiary information, and propose the first method for it.