Faces in Places: Compound Query Retrieval

Slides by Eva Mohedano and Andrea Calafell for the UPC Computer Vision Reading Group about the paper: Zhong, Yujie, Relja Arandjelović, and Andrew Zisserman. "Faces In Places: compound query retrieval." In BMVC 2016. The goal of this work is to retrieve images containing both a target person and a target scene type from a large dataset of images. At run time this compound query is handled using a face classifier trained for the person, and an image classifier trained for the scene type. We make three contributions: first, we propose a hybrid convolutional neural network architecture that produces place-descriptors that are aware of faces and their corresponding descriptors. The network is trained to correctly classify a combination of face and scene classifier scores. Second, we propose an image synthesis system to render high quality fully-labelled face-and-place images, and train the network only from these synthetic images. Last, but not least, we collect and annotate a dataset of real images containing celebrities in different places, and use this dataset to evaluate the retrieval system. We demonstrate significantly improved retrieval performance for compound queries using the new face-aware place-descriptors.
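The run-time handling of a compound query described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every dataset image has already been scored by the person's face classifier (one score per detected face) and by the scene classifier (one score per image), and that the two are fused by adding the best face score to the scene score. The function names and the additive fusion rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def compound_score(face_scores: Sequence[float], scene_score: float) -> float:
    """Score one image for a (person, scene) query.

    Hypothetical rule: the target person may be any detected face, so the
    best face score is used, and it is added to the scene score.
    """
    if not face_scores:
        # No detected face: the image cannot contain the target person.
        return float("-inf")
    return max(face_scores) + scene_score


def rank_images(images: Dict[str, Tuple[List[float], float]]) -> List[str]:
    """Return image ids sorted from best to worst compound score.

    `images` maps an image id to (per-face classifier scores, scene score).
    """
    return sorted(images, key=lambda k: compound_score(*images[k]), reverse=True)
```

Taking the maximum over faces reflects that only one of the detected faces needs to be the target person; an image with no detected face can never satisfy the person part of the query, however well it matches the scene.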

Faces in Places: compound query retrieval
Y. Zhong, R. Arandjelović and A. Zisserman: Paper Link
BMVC 2016
Slides by Eva Mohedano and Andrea Calafell [GDoc]
UPC Computer Vision Reading Group (14/10/2016)

Outline
1. Introduction
2. Hybrid Network
3. The “Celebrity in Places” Dataset
4. Synthetic Training Images
5. Experiments and Results
6. Summary

Introduction
[Figure: compound query → system → large image dataset]

Introduction
Three contributions:
1. A hybrid CNN that produces place descriptors that are aware of faces and their descriptors.
2. A dataset of real images containing celebrities in different places, collected and annotated by the authors.
3. An image synthesis system that renders high-quality, fully-labelled face-and-place images to train the network.

Hybrid Network
[Figure: hybrid network with two branches, AlexNet pre-trained on Places205 and VGG-16 trained on the VGG Face Dataset, each contributing its FC7 features]
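A minimal sketch of how a face-aware place descriptor could be formed from the two FC7 feature vectors in the figure. It assumes a single fully connected ReLU layer over the concatenated place and face features, followed by L2 normalization; the layer, the toy 2-dimensional vectors (the real FC7 layers of AlexNet and VGG-16 are 4096-dimensional) and all function names are hypothetical. The paper's actual fusion layers and its training objective (classifying combinations of face and scene classifier scores) are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dense_relu(x: Sequence[float], weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> Vector:
    """Fully connected layer followed by ReLU: max(0, W x + b), row by row."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def face_aware_place_descriptor(place_fc7: Sequence[float],
                                face_fc7: Sequence[float],
                                weights: Sequence[Sequence[float]],
                                bias: Sequence[float]) -> Vector:
    """Hypothetical fusion of the place branch (AlexNet / Places205 FC7) with
    the face branch (VGG-16 / VGG Face FC7) for one detected face."""
    joint = list(place_fc7) + list(face_fc7)  # concatenate [place | face]
    return l2_normalize(dense_relu(joint, weights, bias))
```

With the same place features, different face features yield different descriptors, which is the sense in which such a descriptor is "face-aware": for example, with weights `[[1, 0, 1, 0], [0, 1, 0, 1]]`, zero bias and place features `[0.5, 1.0]`, the faces `[1, 0]` and `[0, 1]` produce two distinct unit-length descriptors.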

The “Celebrity in Places” Dataset
[Figure: example images from the CIP dataset]
Includes:
● 4611 celebrities
● 16 places
Query texts in Google Image Search → 2.5M images → duplicate removal → 170K images → Mechanical Turk annotation → 38K images

Problems with this approach:
● It is difficult to obtain high-quality images with image search engines.
● The images obtained are highly unbalanced across classes.
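The collection funnel on the slide can be checked with simple arithmetic. Using the rounded counts shown (2.5M, 170K, 38K), duplicate removal keeps about 6.8% of the downloaded images, the Mechanical Turk annotation stage retains about 22% of the remainder, and roughly 1.5% of the original downloads end up in the dataset. The helper below is a hypothetical sketch of that calculation; the exact, unrounded counts are not given on the slide.

```python
from typing import List, Sequence, Tuple

# Rounded image counts read off the slide, in pipeline order.
CIP_FUNNEL: List[Tuple[str, int]] = [
    ("Google Image Search results", 2_500_000),
    ("after duplicate removal", 170_000),
    ("after Mechanical Turk annotation", 38_000),
]


def funnel_retention(stages: Sequence[Tuple[str, int]]) -> List[Tuple[str, float, float]]:
    """For each stage return (name, fraction of the previous stage kept,
    fraction of the first stage kept)."""
    first = stages[0][1]
    previous = first
    report = []
    for name, count in stages:
        report.append((name, count / previous, count / first))
        previous = count
    return report


for name, step, overall in funnel_retention(CIP_FUNNEL):
    print(f"{name:35s} kept {step:6.1%} of previous stage, {overall:6.2%} overall")
```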

Synthetic Training Images
178k training images
8.7k validation images
Includes:
● 500 faces
● 16 places
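If the 178k training images were spread evenly over all 500 × 16 = 8,000 face-place combinations, each combination would appear about 22 times (178,000 / 8,000 = 22.25). The slides do not say how the renderer decides which face goes into which scene; the sketch below is a hypothetical balanced sampling plan, one way synthetic data can sidestep the class imbalance seen in the real web images. It only plans labels and does not render any images.

```python
import itertools
import random
from typing import List, Tuple


def plan_synthetic_labels(n_images: int, n_faces: int, n_places: int,
                          seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical plan assigning a (face_id, place_id) label to every
    synthetic image so that all combinations are used as evenly as possible."""
    if n_faces <= 0 or n_places <= 0:
        raise ValueError("need at least one face and one place")
    pairs = list(itertools.product(range(n_faces), range(n_places)))
    rng = random.Random(seed)  # seeded for a reproducible plan
    labels: List[Tuple[int, int]] = []
    while len(labels) < n_images:
        rng.shuffle(pairs)  # new random order for each pass over all pairs
        labels.extend(pairs[: n_images - len(labels)])
    return labels
```

Because each full pass uses every pair exactly once, the per-pair counts differ by at most one: with the slide's numbers, 6,000 pairs appear 22 times and 2,000 pairs appear 23 times.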

Experiments and Results
Comparison with three late-fusion baselines:
● FC7 VGG Face + FC7 Places205 + L2 normalization
● FC7 VGG Face + FC7 Places205 fine-tuned on the 16 places + L2 normalization
● FC7 VGG Face + FC7 Places205 fine-tuned on the 16 places + Platt scaling
[Table: test set statistics]
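The late-fusion baselines combine a face classifier score and a place classifier score that are computed independently. The sketch below shows one plausible reading of the ingredients named on the slide: L2 normalization of a descriptor before a linear classifier, a simplified Platt scaling that maps raw scores to probabilities by fitting sigmoid(a·s + b) on labelled validation scores, and fusion of the two calibrated probabilities by multiplication. The fusion rule, the optimizer and all names are assumptions for illustration; the calibration is plain gradient-descent logistic regression on the scores, without the smoothed targets of Platt's original method.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor (e.g. an FC7 vector) to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def linear_score(weights: Sequence[float], x: Sequence[float], bias: float = 0.0) -> float:
    """Raw score of a linear classifier (e.g. a linear SVM) on descriptor x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Fit p(y=1 | s) = sigmoid(a*s + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def late_fusion(face_prob: float, place_prob: float) -> float:
    """Assumed fusion rule: treat the two calibrated probabilities as independent."""
    return face_prob * place_prob
```

Calibration matters in this setting because raw scores from two separately trained classifiers are not on a common scale, so combining them directly can let one classifier dominate the compound score.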

Summary
● The authors present a hybrid network for compound queries in which place descriptors are aware of faces and face descriptors. This network outperforms the late-fusion baselines.
● They designed an automatic pipeline to synthesize training images.
● They collected a new dataset of real images to evaluate their methods.
