QUICKAR-ASE2016-Singapore
1. QUICKAR: AUTOMATIC QUERY
REFORMULATION FOR CONCEPT
LOCATION USING CROWDSOURCED
KNOWLEDGE
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
31st IEEE/ACM International Conference on
Automated Software Engineering (ASE 2016), Singapore
3. CONCEPT LOCATION: REFORMULATION OF
CODE SEARCH QUERIES
[Figure: concept location workflow. A software user submits a change request to the issue database; a software developer forms an initial query and runs it through a code search engine over the code repository; if the search results are satisfactory she stops, otherwise she reformulates the query.]
Only 12.2% of the queries by developers were useful (Kevic and Fritz, ICSE 2014).
Vocabulary matched only 10%-15% of the time (Furnas et al., Commun. ACM, 1987).
Our research problem
4. QUERY REFORMULATION LITERATURE
Relevance Feedback
Gay et al, ICSM 2009
Haiduc et al, ICSE 2013
Query Quality Analysis
Haiduc et al, ASE 2012
Haiduc et al, ICSE 2012
Haiduc and Marcus, ICPC 2011
Query Context Analysis
Howard et al, MSR 2013
Yang and Tan, MSR 2012
Kevic and Fritz, MSR 2014
Our work:
QUICKAR
5. SEMANTIC SIMILARITY: USING ADJACENCY
TERM LIST
ID Stack Overflow Question Title
6470651 Creating a memory leak with Java
4948521 Easiest way to cause memory leak in Java?
1071631 Tracking down a memory leak/ garbage-collection issue in Java
Table: Duplicate questions from Stack Overflow

Term Adjacency List
Create (T1): {Memory, leak, Java}
Cause (T2): {Easiest, way, memory, leak, Java}
Track (T3): {Down, memory, leak, garbage, collection, issues, Java}
T1 ∩ T2 ≠ ∅, T2 ∩ T3 ≠ ∅, T1 ∩ T3 ≠ ∅ ⇒ T1 ≡ T2 ≡ T3
6. QUICKAR: PROPOSED TECHNIQUE FOR
QUERY REFORMULATION
QUICKAR
Construction of
adjacency list database
Reformulation of
initial search query
7. CONSTRUCTION OF
ADJACENCY LIST DATABASE
Stack Overflow questions [input] → Question title → Natural language preprocessing → Natural language tokens → Term adjacency analysis → Adjacency list database [output]
Easiest way to cause memory leak in Java → Easiest way cause memory leak Java

Term  Adjacent list
T1    {T2, T3, T6}
T2    {T1, T4}
T3    {T1, T5, T6}
11. EXPERIMENTAL RESULTS
Technique                 Improved   Worsened   Preserved
Baseline (preprocessed)   17.84%     9.90%      72.27%
QUICKAR_P                 49.15%     48.41%     2.44%
QUICKAR_SO                47.83%     49.91%     2.27%
QUICKAR_red               55.55%     24.46%     19.99%
QUICKAR_ALL               66.54%     23.65%     9.81%
Both QUICKAR_P and QUICKAR_SO perform almost equally.
QUICKAR_red clearly dominates the others.
The results from QUICKAR_P, QUICKAR_SO, and QUICKAR_red overlap, but each succeeds for 21%-42% of the queries uniquely.
Combining all three maximizes the results.
14. TAKE-HOME MESSAGE
Only 12.2% of search queries from the developers
are relevant for code change tasks, i.e., vocabulary
mismatch is a great concern.
Automatic query reformulation is essential.
Relevance feedback from the project source might not always be sufficient.
Queries can be reformulated effectively using
Stack Overflow information.
QUICKAR combines vocabulary from project
source and Stack Overflow questions.
Empirical evaluation and validation demonstrate the potential of our technique.
Hello everyone.
My name is Mohammad Masudur Rahman.
I am a PhD student from University of Saskatchewan, Canada.
I work with Dr. Chanchal Roy.
Today, I am going to talk on query reformulation for concept location where we used crowdsourced knowledge.
Concept location is a systematic process that maps concepts from a natural language text to a software engineering artifact.
For example, if the natural language item is a software change request, then the software engineering artifact will be the source code.
During software maintenance, such as feature implementation or bug fixing, such mapping is frequently done by the developers.
We provide automatic support in such mapping task.
Once a software user submits a change request (which could be a feature request or a bug report) to the issue database, a developer is assigned to that change task.
Now, what does a developer do? She selects some initial keywords to find the relevant source code that must be changed to resolve the submitted issue.
If the search does not work with that query, she reformulates that query.
Now existing studies have shown that developers face difficulties in selecting appropriate search terms. In fact, only 12.2% of the search terms were relevant according to an existing study.
This is mostly because of the vocabulary mismatch problem. That is, the concept expressed in the change request is also expressed in the source code, but using a different vocabulary.
Therefore, reformulation is always challenging; we tackle that research problem in this ongoing work.
Now, there are several works that use relevance feedback for query reformulation.
One limitation of these works is that collecting relevance feedback from developers is time-consuming and not always possible.
Haiduc et al. focus on query-quality-based reformulation, where they coin the term "query difficulty."
That is, they determine the quality of a query using linguistic and statistical analysis.
There is another group of studies that consider context of a query term in the source code.
However, this is also limited since it depends on the documentation quality of the code: if the code does not contain enough comments, these techniques will not work.
Our work is basically a combination of relevance feedback and query context.
More importantly, we add another data source, Stack Overflow, to overcome the limitations of both groups of techniques.
In Stack Overflow, there are many questions that are duplicates or deal with similar types of issues.
Developers often volunteer in marking such duplicate questions.
For example, these are duplicate or very closely related questions, and they all focus on the memory leak problem.
Now, if we consider three terms, Create, Cause, and Track, we will get these adjacent word lists.
Now, as the proverb says, "A person is known by the friends he or she keeps."
We apply that idea here. That is, we determine the semantic similarity or relatedness between any two terms by comparing their adjacent word lists.
For example, Create, Cause, and Track all share some adjacent words collected from those questions.
That means these words are semantically related to one another.
Since we are interested in query reformulation, we can apply such semantically similar/related terms for query reformulation.
This can help us beat the vocabulary mismatch problem.
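The idea can be sketched with a small example. The Dice-style overlap used here is an illustrative choice of relatedness measure, not necessarily the exact one from the paper; the adjacent-word sets come from the duplicate question titles shown earlier:

```python
# Adjacent-word lists for Create (T1), Cause (T2), and Track (T3),
# taken from the duplicate Stack Overflow question titles.
create = {"memory", "leak", "java"}
cause = {"easiest", "way", "memory", "leak", "java"}
track = {"down", "memory", "leak", "garbage", "collection", "issues", "java"}

def adjacency_similarity(a, b):
    """Relatedness of two terms as the normalized overlap
    (Dice coefficient) of their adjacent-word sets."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

print(adjacency_similarity(create, cause))  # 0.75: strongly related
print(adjacency_similarity(create, track))  # 0.6: strongly related
```

Terms that never co-occur with the query's vocabulary score zero, which is how the measure separates related terms from unrelated ones.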
So, we propose our query reformulation technique, QUICKAR, which uses Stack Overflow questions for query reformulation.
It has two major steps:
1. Construction of the adjacency list database.
2. Reformulation of an initial search query.
First, the construction of the adjacency list database.
We select 500K questions from Stack Overflow, and collect their titles.
We perform natural language preprocessing, and consider each title as an ordered list of tokens.
Then we consider a window size of two, and capture an adjacent word list for each term from each question.
Finally, we get a database like this, where each term has an adjacent word list from Stack Overflow.
We use this database for determining the semantic similarity between any two terms.
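A minimal sketch of this construction step, assuming a simplified preprocessing (lowercasing, alphabetic tokens, a toy stop-word list) in place of the paper's actual pipeline:

```python
import re
from collections import defaultdict

# Toy stop-word list; the real pipeline would use a full one.
STOP_WORDS = {"a", "an", "the", "to", "in", "of", "with"}

def preprocess(title):
    """Minimal stand-in for the natural language preprocessing:
    lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_adjacency_db(titles, window=2):
    """For each term, record the words occurring within `window`
    positions of it across all question titles."""
    db = defaultdict(set)
    for title in titles:
        tokens = preprocess(title)
        for i, term in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    db[term].add(tokens[j])
    return db

titles = [
    "Creating a memory leak with Java",
    "Easiest way to cause memory leak in Java?",
]
db = build_adjacency_db(titles)
print(sorted(db["leak"]))  # adjacent words of "leak" across both titles
```

Built over 500K titles, this yields the term-to-adjacent-words database used for the similarity computation.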
In the query reformulation task, we collect Top 5 results returned by the initial query.
Then, we collect the source code tokens from those top results, and consider them as the candidates for reformulation.
Now, the existing studies apply different strategies, such as TF-IDF or the Dice coefficient, to extract appropriate candidates from such tokens.
However, they are subject to the vocabulary mismatch issue.
What did we do?
We determine the semantic similarity between the initial query and those candidates using the adjacency lists developed from Stack Overflow questions.
We select the most semantically related code tokens as the appropriate candidates.
We also collect the words from Stack Overflow questions that most frequently co-occur with keywords from the initial query.
Once reformulation candidates are selected from both the source code and Stack Overflow, we combine them selectively and reformulate the initial query.
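The candidate-ranking step could look roughly like this; the Dice overlap and the summing over query terms are illustrative assumptions, not the paper's exact scoring:

```python
def dice(a, b):
    """Dice overlap of two adjacent-word sets (0 when either is empty)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

def rank_candidates(query_terms, candidates, adjacency_db, top_k=5):
    """Rank candidate terms by their total adjacency-list similarity
    to the initial query terms."""
    scored = [(sum(dice(adjacency_db.get(c, set()),
                        adjacency_db.get(q, set())) for q in query_terms), c)
              for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored[:top_k]]

def reformulate(query_terms, code_tokens, so_tokens, adjacency_db, top_k=5):
    """Expand the initial query with the top-ranked candidates drawn
    from source code tokens and Stack Overflow co-occurring words."""
    candidates = (set(code_tokens) | set(so_tokens)) - set(query_terms)
    return list(query_terms) + rank_candidates(query_terms, candidates,
                                               adjacency_db, top_k)

# Tiny hypothetical adjacency database for illustration.
db = {"memory": {"leak", "java", "cause"},
      "heap":   {"leak", "java", "memory"},
      "button": {"click", "ui"}}
print(reformulate(["memory"], ["heap"], ["button"], db))
# -> ['memory', 'heap', 'button']: 'heap' ranks above the unrelated 'button'
```

Because scoring goes through the adjacency lists rather than exact token matching, a candidate can rank highly even when it shares no literal word with the query.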
Now let's take a look at an example.
Suppose this is the raw query, and this is the preprocessed version.
This query returns the first correct result at the 65th position, which is not very good.
Now, we first perform some reduction on the query and use that to collect reformulation candidates from both the source code and Stack Overflow.
Then, in the combination, we applied an ad-hoc strategy: we count the nominal (noun) terms among the candidates.
For example, in this case, the candidates from Stack Overflow tokens are selected for query expansion.
So, we append those candidates to the reduced version of the query, and this expanded query returns the first correct result at the top position.
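The nominal-term counting could be sketched as below; the `pos_tags` dictionary is a hypothetical stand-in for the output of a real part-of-speech tagger:

```python
def count_nominals(terms, pos_tags):
    """Count candidate terms tagged as nouns. `pos_tags` maps each
    term to a Penn-Treebank-style tag (hypothetical input here)."""
    return sum(1 for t in terms if pos_tags.get(t, "").startswith("NN"))

def select_expansion(code_candidates, so_candidates, pos_tags):
    """Ad-hoc combination: pick the candidate set containing more
    nominal (noun) terms for query expansion."""
    if count_nominals(so_candidates, pos_tags) >= count_nominals(code_candidates, pos_tags):
        return so_candidates
    return code_candidates

tags = {"leak": "NN", "memory": "NN", "iterate": "VB"}
print(select_expansion(["iterate"], ["leak", "memory"], tags))
# -> ['leak', 'memory']: the Stack Overflow candidates win here
```

In the walked-through example, this is how the Stack Overflow candidates end up being chosen over the source code candidates.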
In order to test our hypothesis, we conducted preliminary experiments with 510 queries from two subject systems.
In the experiment, we compared our reformulated query with the baseline query in terms of their returned results.
If our query returns the result at a better position than the baseline, then the reformulation is considered useful.
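The comparison criterion amounts to checking rank positions query by query; a minimal sketch (the rank lists here are hypothetical):

```python
def classify_reformulations(baseline_ranks, reformulated_ranks):
    """Compare the rank of the first relevant result before and after
    reformulation for each query; a lower rank is better."""
    counts = {"improved": 0, "worsened": 0, "preserved": 0}
    for base, ref in zip(baseline_ranks, reformulated_ranks):
        if ref < base:
            counts["improved"] += 1
        elif ref > base:
            counts["worsened"] += 1
        else:
            counts["preserved"] += 1
    return counts

# e.g., one query moved from the 65th position to the 1st,
# one got worse, and one stayed unchanged.
print(classify_reformulations([65, 10, 3], [1, 20, 3]))
# -> {'improved': 1, 'worsened': 1, 'preserved': 1}
```

The Improved/Worsened/Preserved percentages in the results table are these counts normalized over all 510 queries.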
We also compare with a baseline technique called Rocchio’s expansion.
These are the findings from our preliminary experiments.
We first determine the effect of preprocessing on the baseline queries, which is not much.
Then we examine the performance of the reformulation candidates from source code tokens and Stack Overflow tokens.
Both of them improve close to 50% of the queries, but also worsen an almost equal amount.
We also notice an interesting effect of the query reduction phase, which improves a significant number of queries.
However, when we combine both query reduction and query expansion using candidate tokens from the source code and Stack Overflow, the performance improves considerably.
At this point, the performance improvement might not be statistically significant, but this is what interested us in this approach.
We see that many of the improvement cases overlap among the three aspects of our proposed reformulation strategy.
However, there exists a significant number of cases where improvement is not possible unless all three aspects are considered properly.
For example, about 30% of the improvements come from only one of the three variants of QUICKAR.
So, we combine all three aspects for maximized performance.
We compared with the baseline technique and found that our performance is significantly higher.
Even when we compare an equivalent variant of our technique with the baseline, the performance is still higher.
This is because the existing techniques cannot survive the vocabulary mismatch problem.
But, our technique provides a way out of that by computing semantic similarity using adjacency list.
So, these are the take-home messages.
Developers often face difficulties in choosing appropriate query terms during concept location.
So, we provide automatic support in their query reformulation.
In our work, we apply semantic similarity rather than lexical similarity to suggest query reformulations, and we use Stack Overflow questions for that purpose.
Our experimental findings also show that the reformulated queries actually improve on the baseline queries, and also perform better than a baseline technique.
This is ongoing work, and we are still working with a larger dataset.
Thanks for your time.
Now, I am ready to take your questions.