1. IMPROVED QUERY REFORMULATION FOR
CONCEPT LOCATION USING CODERANK AND
DOCUMENT STRUCTURES
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
International Conference on Automated Software Engineering
(ASE 2017), Urbana-Champaign, IL, USA
2. AN EXAMPLE CHANGE REQUEST
Field        Content
Issue ID     31110
Product      eclipse.jdt.debug
Title        Debugger Source Lookup does not work with variables
Description  In the Debugger Source Lookup dialog I can also select variables for source lookup (Advanced... > Add Variables). I selected the variable which points to the archive containing the source file for the type, but the debugger still claims that it cannot find the source.
3. SEARCH KEYWORD SELECTION
(The same change request as above, with the candidate search keywords highlighted.)
4. CHANGE REQUEST TO CODE MAPPING
(The same change request as above, illustrating how its concepts map to the relevant source code.)
13. CODERANK CALCULATION: STEP III
13
)(
)10(
|)(|
)(
)1()(
iVInj j
j
i
VOut
VS
VS
Most important face in this crowd:
1. resolve
2. required
3. launch
4. classpath
5. runtime
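The formula above is the standard PageRank (TextRank) score update that CodeRank applies to the term graph. Below is a minimal sketch of that recursive scoring under assumed settings: the damping factor 0.85 and the toy graph are illustrative choices, not the paper's configuration.

```python
def coderank(graph, damping=0.85, iterations=30):
    """Iteratively apply the PageRank update:
    S(Vi) = (1 - d) + d * sum over Vj in In(Vi) of S(Vj) / |Out(Vj)|."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new_scores = {}
        for v in graph:
            incoming = (u for u in graph if v in graph[u])
            rank = sum(scores[u] / len(graph[u]) for u in incoming)
            new_scores[v] = (1.0 - damping) + damping * rank
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)

# Toy term graph (adjacency: term -> set of outgoing neighbours).
graph = {
    "resolve":   {"required", "classpath"},
    "required":  {"resolve", "launch"},
    "launch":    {"classpath", "runtime"},
    "classpath": {"resolve", "launch", "runtime"},
    "runtime":   {"classpath"},
}
print(coderank(graph))  # Ranked list of candidate reformulation terms.
```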
16. ACER: SELECTION OF THE BEST QUERY
REFORMULATION
(Schematic of ACER: three reformulation candidates, built from method signatures, field signatures, and method + field signatures, pass through data re-sampling and machine learning (ensemble learning); the best reformulation is selected and emitted as the reformulated query.)
17. ACER: QUERY REFORMULATIONS
Technique           Query                                                                               QE
Baseline            debugger source lookup                                                              79
Baseline            debugger source lookup work variables                                               77
Refoqus (2013)      debugger source lookup work variables + launch jdt configuration classpath project  12
CodeRank (method)   debugger source lookup work variables + launch debug resolve required classpath      2
CodeRank (field)    debugger source lookup work variables + label classpath system resolution launch     6
CodeRank (both)     debugger source lookup work variables + java type launch classpath label            16
ACER                debugger source lookup work variables + launch debug resolve required classpath      2

(QE = Query Effectiveness: the rank of the first correct result; lower is better.)
20. RESEARCH QUESTIONS (5)
RQ1: Does ACER improve baseline queries
significantly?
RQ2: Does CodeRank perform better than the
traditional term weights (e.g., TF-IDF)?
RQ3: Does document structure make a
difference in query reformulation?
RQ4: How do stemming, query length, and relevance feedback size affect our performance?
RQ5: Does ACER outperform the state-of-the-art
in query reformulation for concept location?
21. ANSWERING RQ1: QUERY EFFECTIVENESS OVER
BASELINE
Query Pairs                      Improved (MRD)   Worsened (MRD)   P-value    Preserved
CodeRank (method) vs. Baseline   58.93% (-61)     37.99% (+131)    0.007*     3.08%
CodeRank (field) vs. Baseline    52.51% (-51)     44.57% (+151)    0.063      2.91%
CodeRank (both) vs. Baseline     58.62% (-51)     38.19% (+136)    0.018*     3.20%
ACER vs. Baseline                71.05% (-81)     2.51% (+104)     <0.001*    26.44%

* = significant difference between improvement and worsening; MRD = Mean Rank Difference
26. RQ5: COMPARISON WITH EXISTING METHODS
Our performance is significantly higher for each metric than the state-of-the-art, owing to:
1. CodeRank
2. Document contexts
3. Data re-sampling
27. TAKE-HOME MESSAGES
Reformulating a search query is highly challenging for developers and costs significant effort.
Traditional term weights are not sufficient.
We provide CodeRank, which exploits source term semantics and source document contexts.
We provide ACER, which selects the best from a set of reformulation candidates prepared by CodeRank.
Experiments with 1,675 change requests from 8 OSS systems of Apache & Eclipse.
71% of queries improved, only 3% worsened by ACER.
Comparison with five methods including the state-of-the-art validates our approach.
28. THANK YOU !!! QUESTIONS?
More details on CodeRank & ACER:
http://www.usask.ca/~masud.rahman/acer/
Contact: masud.rahman@usask.ca
Masud Rahman
29. RQ5: COMPARISON WITH EXISTING METHODS
Our Top-K accuracy is clearly higher, for various K values, than the state-of-the-art.
Editor's notes
Good morning, everyone.
Introduce yourself.
Today, I am going to talk about a query reformulation technique for concept location where
we used an advanced term weighting method and performed machine learning.
Now, this is a real software change request.
Here, these two sections are important: they contain the information about the requested change.
Now, when a request like this is submitted, a developer tries to find the important terms.
Then they use those terms to find the source code to change, probably using a search engine like Lucene.
That is, they try to map the concepts discussed in the change request to the appropriate source code sections, like this.
This is how the term "concept location" comes about, if you want me to define it.
But this concept location is NOT an easy task.
For example, these two very reasonable queries from the change request do not perform well.
The second one returns the correct result at the 77th position, which is of course not acceptable.
So, what is needed here is a reformulation of the query for the better.
Now, there is traditional tool support for doing that.
What most of these tools do is throw the initial query to the search engine, collect the results, and then collect the most important terms from those results for the reformulation of the initial, poor query.
These are the reformulated queries from three such existing methods.
They do improve the ranking somewhat, and return results a bit closer to the top.
But as you can see, they are clearly not enough.
Developers want the results at the top positions, so these queries are still costly for practical use.
Now, we investigated this part of the reformulation process, and found that most of the existing techniques use this equation for determining the importance of a term.
That is, they rely on TF-IDF to find the words for query reformulation.
In other words, they rely on the frequency of a term as a proxy for its importance.
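For reference, the TF-IDF weight being referred to is commonly written as follows (a standard textbook formulation; the exact variant used by each existing technique may differ):

$$ w(t, d) = tf(t, d) \times \log \frac{N}{df(t)} $$

where tf(t, d) is the frequency of term t in document d, N is the total number of documents in the corpus, and df(t) is the number of documents containing t.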
Now, this is a metric that has been in play since the last century; it was proposed in the 1970s.
It is a good metric, but it was actually proposed for regular texts such as news articles.
On the other hand, we are dealing with source code here.
Regular texts and source code have different semantics and different structures; they are not the same.
So, metrics for regular texts are not appropriate for source code. This is our hypothesis.
So, we made two contributions here.
We propose CodeRank– a novel and appropriate term weighting method for source code.
We propose ACER -- a novel query reformulation technique that uses this term weight.
First comes CodeRank.
Now, what did we do?
We extract important artifacts from source code, such as method signatures, formal parameters, and field signatures.
We mostly used AST parsing and regular expressions for this.
The idea is that signatures capture richer intent than other texts.
For example, a method signature states the intent, whereas the method body implements the intent with lots of noise.
Now, once such items are extracted, we split them.
As we see, these single terms share semantics to convey a broader meaning; that is, they complement each other in this context.
We capture such semantic dependencies in the source code, and develop a term graph like this.
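As a minimal sketch of this step (not the authors' implementation; the identifier-splitting rule and the co-occurrence criterion are assumptions for illustration), one might split signature tokens on camelCase boundaries and link terms that co-occur in the same signature:

```python
import re
from collections import defaultdict

def split_identifier(identifier):
    """Split a code identifier into lowercase terms (camelCase, snake_case)."""
    terms = []
    for part in re.split(r"[_\W]+", identifier):
        # Split camelCase/PascalCase boundaries, e.g. "resolveRuntimeClasspath".
        terms += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [t.lower() for t in terms if t]

def build_term_graph(signatures):
    """Connect terms that co-occur in the same signature (undirected edges)."""
    graph = defaultdict(set)
    for sig in signatures:
        terms = split_identifier(sig)
        for i, t in enumerate(terms):
            for u in terms[i + 1:]:
                if t != u:
                    graph[t].add(u)
                    graph[u].add(t)
    return graph

# Example: two hypothetical method signatures from the buggy module.
sigs = ["resolveRuntimeClasspath", "launchRequiredClasspathEntries"]
print(dict(build_term_graph(sigs)))
```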
Now, once the graph is developed, we use a popular graph-based algorithm called PageRank for determining node importance.
OK, let's go visual.
In a crowd, the most important person is the one whom everybody is looking at.
It can also be seen as votes: the person who is voted for the most is the leader.
We follow the same concept in the context of our term graph; that is, the term that is connected the most with other terms is an important term.
Since this scoring is a recursive process, we finally get a ranked list of important terms which can be used as reformulation terms.
Now comes ACER, the second contribution.
This is the schematic diagram of our approach.
So far we have talked about these parts of the approach.
Now we will zoom in on this part.
Once the CodeRank is calculated, we collect multiple reformulation candidates for a given initial query.
As we discussed, a source document has various contexts: method signatures, field signatures, and so on.
We make use of such contexts and develop multiple reformulation candidates.
Now, since we have multiple options, we have to choose the best reformulation.
In order to do that, we apply machine learning. In particular, we determine the quality of each candidate using 20 quality metrics that mostly come from the IR domain.
Then we use a regression-tree based classifier to suggest the best reformulated query.
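A hedged sketch of this selection step, assuming scikit-learn is available: the notes mention a regression-tree based classifier with ensemble learning, so a RandomForestClassifier stands in here; the two features shown are stand-ins for the paper's 20 IR quality metrics, and the training data is entirely hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier

def query_features(query_terms, corpus_term_freq):
    """Two stand-in quality metrics: query length and average term rarity."""
    rarity = sum(1.0 / (1 + corpus_term_freq.get(t, 0)) for t in query_terms)
    return [len(query_terms), rarity / max(len(query_terms), 1)]

# Hypothetical training data: features of past candidates, labelled 1 if
# that candidate turned out to be the best reformulation, else 0.
X_train = [[8, 0.12], [10, 0.35], [7, 0.08], [12, 0.40]]
y_train = [0, 1, 0, 1]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Score three candidates (method sig., field sig., both) and pick the best.
freq = {"launch": 40, "debug": 25, "resolve": 5, "label": 60, "java": 200}
candidates = [
    ["launch", "debug", "resolve", "required", "classpath"],
    ["label", "classpath", "system", "resolution", "launch"],
    ["java", "type", "launch", "classpath", "label"],
]
probs = [clf.predict_proba([query_features(c, freq)])[0][1] for c in candidates]
best = candidates[max(range(len(candidates)), key=probs.__getitem__)]
print(best)
```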
Now let's see the outcome.
Here, we have created three reformulation candidates using CodeRank and source document contexts.
Our ML classifier returns the best option, and it returns the result at the 2nd position.
If we look closely, our technique identifies two unique terms which made the real difference in performance.
For the experiments, we select 8 subject systems from Apache and Eclipse.
We collect 1,675 change requests/bug reports from BugZilla and JIRA.
We use the report title as our query and prepare the gold set by consulting the commit history of those projects from GitHub.
This is the widely accepted approach for experiments in this area.
For the experiments, we collect our queries and the baseline queries, and feed them to a code search engine.
Then we collect their results and ranks, and compare.
For evaluation and validation, we used four performance metrics.
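As an illustration of how such rank-based metrics can be computed (a sketch only; the notes do not list all four metrics, so Query Effectiveness and Top-K accuracy, both named in the slides, are shown as representative examples):

```python
def query_effectiveness(ranked_results, gold_set):
    """QE: rank of the first correct (gold) result; lower is better."""
    for rank, doc in enumerate(ranked_results, start=1):
        if doc in gold_set:
            return rank
    return None  # No correct result retrieved.

def top_k_accuracy(all_results, all_gold, k):
    """Fraction of queries whose first correct result appears within top k."""
    hits = sum(
        1 for results, gold in zip(all_results, all_gold)
        if (qe := query_effectiveness(results, gold)) is not None and qe <= k
    )
    return hits / len(all_results)

# Toy example: two queries, their ranked result files, and their gold sets.
results = [["A.java", "B.java", "C.java"], ["D.java", "E.java"]]
gold = [{"C.java"}, {"D.java"}]
print(query_effectiveness(results[0], gold[0]))  # 3
print(top_k_accuracy(results, gold, k=2))        # 0.5
```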
Now, in our experiments, we answer these five research questions.
In the first research question, we compare our queries with the baseline queries.
As we see, the method signature based reformulation performs better than the other two options.
However, the machine learning selects the best among the three, and provides the best performance.
For example, our reformulation improves 71% of the queries, preserves 26%, and degrades only 3% of the queries.
So, obviously, we are improving far more queries than we are degrading.
In the second research question, we compare CodeRank with the traditional term weights: Term Frequency and TF-IDF.
We see that TF performs better than TF-IDF, which is interesting.
When compared with our CodeRank, TF performs better initially, but CodeRank outperforms it later, especially for 10-15 reformulation terms.
That is, a few highly frequent terms are really important, but CodeRank is more reliable than Term Frequency for term importance.
In the third research question, we show how document structures/contexts make a difference.
These are the numbers of improved queries for the various reformulation candidates.
We see that 19% of the total improvement is unique to each single context.
That is, if we consider only method signatures for query reformulation, we miss the improvements made by field signature based reformulations.
Again, if we consider the whole text rather than the signatures, we also miss some query improvements.
This is not only true for CodeRank; it also holds if we employ term frequency in those contexts.
Thus, document contexts matter for query reformulation.
Now, when we consider the query improvements by ACER and Term Frequency in terms of a Venn diagram, we find a 66% overlap, but ACER provides a unique set of improvements that is three times that of TF.
ACER exploits document structures and TF does not, and we see the difference here.
In the fourth research question, we calibrate the reformulation length.
We found that the best performance is achieved when the reformulation length is between 10 and 15 terms.
This is where CodeRank saturates.
In the fifth research question, we compare our query improvement and worsening ratios with the existing methods.
We see that our median improvement is much higher than the others'.
More importantly, we degrade far fewer queries than the others do.
These measures are significantly higher.
Thus, according to our investigation, ACER is the winner.
But we must also admit that the ML-based approach is less scalable, and we are now working on the tool.
Thus, these are the take-home messages.
Query reformulation is a challenging task for developers; Google does not work on a local source code repository.
Traditional term weights are clearly not sufficient or appropriate for source code.
We provide CodeRank, a novel term weight for source code.
We provide ACER, an improved reformulation technique.
Our technique improves about 71% of the queries and degrades only a handful of them.
Comparison with the state-of-the-art shows the promise of our method.
Thanks for your time and attention.
I am happy to take a few questions.
When we consider various Top-K accuracies, we get similar findings.
Our method located concepts correctly for 80% of the change requests, whereas the best existing method did so for 60% of them.
This shows the potential of our technique.