1. IMPROVED QUERY REFORMULATION FOR
CONCEPT LOCATION USING CODERANK AND
DOCUMENT STRUCTURES
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
International Conference on Automated Software Engineering
(ASE 2017), Urbana-Champaign, IL, USA
2. AN EXAMPLE CHANGE REQUEST
Field        Content
Issue ID     31110
Product      eclipse.jdt.debug
Title        Debugger Source Lookup does not work with variables
Description  In the Debugger Source Lookup dialog I can also select variables for source lookup (Advanced... > Add Variables). I selected the variable which points to the archive containing the source file for the type, but the debugger still claims that it cannot find the source.
3. SEARCH KEYWORD SELECTION
(The same change request as above, with the candidate search keywords highlighted.)
4. CHANGE REQUEST TO CODE MAPPING
(The same change request as above, illustrating how its concepts map to the relevant source code.)
13. CODERANK CALCULATION: STEP III
13
)(
)10(
|)(|
)(
)1()(
iVInj j
j
i
VOut
VS
VS
Most important face in this crowd:
1. resolve
2. required
3. launch
4. classpath
5. runtime
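The formula above is the standard PageRank (TextRank) score update that CodeRank applies to the term graph. Below is a minimal sketch of that recursive scoring under assumed settings: the damping factor 0.85 and the toy graph are illustrative choices, not the paper's configuration.

```python
def coderank(graph, damping=0.85, iterations=30):
    """Iteratively apply the PageRank update:
    S(Vi) = (1 - d) + d * sum over Vj in In(Vi) of S(Vj) / |Out(Vj)|."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new_scores = {}
        for v in graph:
            incoming = (u for u in graph if v in graph[u])
            rank = sum(scores[u] / len(graph[u]) for u in incoming)
            new_scores[v] = (1.0 - damping) + damping * rank
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)

# Toy term graph (adjacency: term -> set of outgoing neighbours).
graph = {
    "resolve":   {"required", "classpath"},
    "required":  {"resolve", "launch"},
    "launch":    {"classpath", "runtime"},
    "classpath": {"resolve", "launch", "runtime"},
    "runtime":   {"classpath"},
}
print(coderank(graph))  # Ranked list of candidate reformulation terms.
```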
16. ACER: SELECTION OF THE BEST QUERY
REFORMULATION
(Schematic of ACER: three reformulation candidates, built from method signatures, field signatures, and method + field signatures, pass through data re-sampling and machine learning (ensemble learning); the best reformulation is selected and emitted as the reformulated query.)
17. ACER: QUERY REFORMULATIONS
Technique           Query                                                                               QE
Baseline            debugger source lookup                                                              79
Baseline            debugger source lookup work variables                                               77
Refoqus (2013)      debugger source lookup work variables + launch jdt configuration classpath project  12
CodeRank (method)   debugger source lookup work variables + launch debug resolve required classpath      2
CodeRank (field)    debugger source lookup work variables + label classpath system resolution launch     6
CodeRank (both)     debugger source lookup work variables + java type launch classpath label            16
ACER                debugger source lookup work variables + launch debug resolve required classpath      2

(QE = Query Effectiveness: the rank of the first correct result; lower is better.)
20. RESEARCH QUESTIONS (5)
RQ1: Does ACER improve baseline queries
significantly?
RQ2: Does CodeRank perform better than the
traditional term weights (e.g., TF-IDF)?
RQ3: Does document structure make a
difference in query reformulation?
RQ4: How do stemming, query length, and relevance feedback size affect our performance?
RQ5: Does ACER outperform the state-of-the-art
in query reformulation for concept location?
21. ANSWERING RQ1: QUERY EFFECTIVENESS OVER
BASELINE
Query Pairs                      Improved (MRD)   Worsened (MRD)   P-value    Preserved
CodeRank (method) vs. Baseline   58.93% (-61)     37.99% (+131)    0.007*     3.08%
CodeRank (field) vs. Baseline    52.51% (-51)     44.57% (+151)    0.063      2.91%
CodeRank (both) vs. Baseline     58.62% (-51)     38.19% (+136)    0.018*     3.20%
ACER vs. Baseline                71.05% (-81)     2.51% (+104)     <0.001*    26.44%

* = significant difference between improvement and worsening; MRD = Mean Rank Difference
26. RQ5: COMPARISON WITH EXISTING METHODS
Our performance is significantly higher for each metric than the state-of-the-art, owing to:
1. CodeRank
2. Document contexts
3. Data re-sampling
27. TAKE-HOME MESSAGES
Reformulating a search query is highly challenging for developers and costs significant effort.
Traditional term weights are not sufficient.
We provide CodeRank, which exploits source term semantics and source document contexts.
We provide ACER, which selects the best from a set of reformulation candidates prepared by CodeRank.
Experiments with 1,675 change requests from 8 OSS systems of Apache & Eclipse.
71% of queries improved, only 3% worsened by ACER.
Comparison with five methods including the state-of-the-art validates our approach.
28. THANK YOU !!! QUESTIONS?
More details on CodeRank & ACER:
http://www.usask.ca/~masud.rahman/acer/
Contact: masud.rahman@usask.ca
Masud Rahman
29. RQ5: COMPARISON WITH EXISTING METHODS
Our Top-K accuracy is clearly higher, for various K values, than the state-of-the-art.
Editor's notes
Good morning, everyone.
Introduce yourself.
Today, I am going to talk about a query reformulation technique for concept location where
we used an advanced term weighting method and performed machine learning.
Now, this is a real software change request.
Here, these two sections are important: they contain the information about the requested change.
Now, when a request like this is submitted, a developer tries to find the important terms.
Then they use those terms to find the source code to change, probably using a search engine like Lucene.
That is, they try to map the concepts discussed in the change request to the appropriate source code sections, like this.
This is how the term "concept location" comes about, if you want me to define it.
But this concept location is NOT an easy task.
For example, these two very reasonable queries from the change request do not perform well.
The second one returns the correct result at the 77th position, which is of course not acceptable.
So, what is needed here is a reformulation of the query for the better.
Now, there is traditional tool support for doing that.
What most of these tools do is throw the initial query to the search engine, collect the results, and then collect the most important terms from those results for the reformulation of the initial, poor query.
These are the reformulated queries from three such existing methods.
They do improve the ranking somewhat, and return results a bit closer to the top.
But as you can see, they are clearly not enough.
Developers want the results at the top positions, so these queries are still costly for practical use.
Now, we investigated this part of the reformulation process, and found that most of the existing techniques use this equation for determining the importance of a term.
That is, they rely on TF-IDF to find the words for query reformulation.
In other words, they rely on the frequency of a term as a proxy for its importance.
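For reference, the TF-IDF weight being referred to is commonly written as follows (a standard textbook formulation; the exact variant used by each existing technique may differ):

$$ w(t, d) = tf(t, d) \times \log \frac{N}{df(t)} $$

where tf(t, d) is the frequency of term t in document d, N is the total number of documents in the corpus, and df(t) is the number of documents containing t.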
Now, this is a metric that has been in play since the last century; it was proposed in the 1970s.
It is a good metric, but it was actually proposed for regular texts such as news articles.
On the other hand, we are dealing with source code here.
Regular texts and source code have different semantics and different structures; they are not the same.
So, metrics for regular texts are not appropriate for source code. This is our hypothesis.
So, we made two contributions here.
We propose CodeRank– a novel and appropriate term weighting method for source code.
We propose ACER -- a novel query reformulation technique that uses this term weight.
First comes CodeRank.
Now, what did we do?
We extract important artifacts from source code, such as method signatures, formal parameters, and field signatures.
We mostly used AST parsing and regular expressions for this.
The idea is that signatures capture richer intent than other texts.
For example, a method signature states the intent, whereas the method body implements the intent with lots of noise.
Now, once such items are extracted, we split them.
As we see, these single terms share semantics to convey a broader meaning; that is, they complement each other in this context.
We capture such semantic dependencies in the source code, and develop a term graph like this.
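As a minimal sketch of this step (not the authors' implementation; the identifier-splitting rule and the co-occurrence criterion are assumptions for illustration), one might split signature tokens on camelCase boundaries and link terms that co-occur in the same signature:

```python
import re
from collections import defaultdict

def split_identifier(identifier):
    """Split a code identifier into lowercase terms (camelCase, snake_case)."""
    terms = []
    for part in re.split(r"[_\W]+", identifier):
        # Split camelCase/PascalCase boundaries, e.g. "resolveRuntimeClasspath".
        terms += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [t.lower() for t in terms if t]

def build_term_graph(signatures):
    """Connect terms that co-occur in the same signature (undirected edges)."""
    graph = defaultdict(set)
    for sig in signatures:
        terms = split_identifier(sig)
        for i, t in enumerate(terms):
            for u in terms[i + 1:]:
                if t != u:
                    graph[t].add(u)
                    graph[u].add(t)
    return graph

# Example: two hypothetical method signatures from the buggy module.
sigs = ["resolveRuntimeClasspath", "launchRequiredClasspathEntries"]
print(dict(build_term_graph(sigs)))
```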
Now, once the graph is developed, we use a popular graph-based algorithm called PageRank for determining node importance.
OK, let's go visual.
In a crowd, the most important person is the one whom everybody is looking at.
It can also be seen as votes: the person who is voted for the most is the leader.
We follow the same concept in the context of our term graph; that is, the term that is connected the most with other terms is an important term.
Since this scoring is a recursive process, we finally get a ranked list of important terms which can be used as reformulation terms.
Now comes ACER, the second contribution.
This is the schematic diagram of our approach.
So far we have talked about these parts of the approach.
Now we will zoom in on this part.
Once the CodeRank is calculated, we collect multiple reformulation candidates for a given initial query.
As we discussed, a source document has various contexts: method signatures, field signatures, and so on.
We make use of such contexts and develop multiple reformulation candidates.
Now, since we have multiple options, we have to choose the best reformulation.
In order to do that, we apply machine learning. In particular, we determine the quality of each candidate using 20 quality metrics that mostly come from the IR domain.
Then we use a regression-tree based classifier to suggest the best reformulated query.
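A hedged sketch of this selection step, assuming scikit-learn is available: the notes mention a regression-tree based classifier with ensemble learning, so a RandomForestClassifier stands in here; the two features shown are stand-ins for the paper's 20 IR quality metrics, and the training data is entirely hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier

def query_features(query_terms, corpus_term_freq):
    """Two stand-in quality metrics: query length and average term rarity."""
    rarity = sum(1.0 / (1 + corpus_term_freq.get(t, 0)) for t in query_terms)
    return [len(query_terms), rarity / max(len(query_terms), 1)]

# Hypothetical training data: features of past candidates, labelled 1 if
# that candidate turned out to be the best reformulation, else 0.
X_train = [[8, 0.12], [10, 0.35], [7, 0.08], [12, 0.40]]
y_train = [0, 1, 0, 1]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Score three candidates (method sig., field sig., both) and pick the best.
freq = {"launch": 40, "debug": 25, "resolve": 5, "label": 60, "java": 200}
candidates = [
    ["launch", "debug", "resolve", "required", "classpath"],
    ["label", "classpath", "system", "resolution", "launch"],
    ["java", "type", "launch", "classpath", "label"],
]
probs = [clf.predict_proba([query_features(c, freq)])[0][1] for c in candidates]
best = candidates[max(range(len(candidates)), key=probs.__getitem__)]
print(best)
```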
Now let's see the outcome.
Here, we have created three reformulation candidates using CodeRank and source document contexts.
Our ML classifier returns the best option, and it returns the result at the 2nd position.
If we look closely, our technique identifies two unique terms which made the real difference in performance.
For the experiments, we select 8 subject systems from Apache and Eclipse.
We collect 1,675 change requests/bug reports from BugZilla and JIRA.
We use the report title as our query and prepare the gold set by consulting the commit history of those projects from GitHub.
This is the widely accepted approach for experiments in this area.
For the experiments, we collect our queries and the baseline queries, and feed them to a code search engine.
Then we collect their results and ranks, and compare.
For evaluation and validation, we used four performance metrics.
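As an illustration of how such rank-based metrics can be computed (a sketch only; the notes do not list all four metrics, so Query Effectiveness and Top-K accuracy, both named in the slides, are shown as representative examples):

```python
def query_effectiveness(ranked_results, gold_set):
    """QE: rank of the first correct (gold) result; lower is better."""
    for rank, doc in enumerate(ranked_results, start=1):
        if doc in gold_set:
            return rank
    return None  # No correct result retrieved.

def top_k_accuracy(all_results, all_gold, k):
    """Fraction of queries whose first correct result appears within top k."""
    hits = sum(
        1 for results, gold in zip(all_results, all_gold)
        if (qe := query_effectiveness(results, gold)) is not None and qe <= k
    )
    return hits / len(all_results)

# Toy example: two queries, their ranked result files, and their gold sets.
results = [["A.java", "B.java", "C.java"], ["D.java", "E.java"]]
gold = [{"C.java"}, {"D.java"}]
print(query_effectiveness(results[0], gold[0]))  # 3
print(top_k_accuracy(results, gold, k=2))        # 0.5
```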
Now, in our experiments, we answer these five research questions.
In the first research question, we compare our queries with the baseline queries.
As we see, the method signature based reformulation performs better than the other two options.
However, the machine learning selects the best among the three, and provides the best performance.
For example, our reformulation improves 71% of the queries, preserves 26%, and degrades only 3% of the queries.
So, obviously, we are improving far more queries than we are degrading.
In the second research question, we compare CodeRank with the traditional term weights: Term Frequency and TF-IDF.
We see that TF performs better than TF-IDF, which is interesting.
When compared with our CodeRank, TF performs better initially, but CodeRank outperforms it later, especially for 10-15 reformulation terms.
That is, a few highly frequent terms are really important, but CodeRank is more reliable than Term Frequency for term importance.
In the third research question, we show how document structures/contexts make a difference.
These are the numbers of improved queries for the various reformulation candidates.
We see that 19% of the total improvement is unique to each single context.
That is, if we consider only method signatures for query reformulation, we miss the improvements made by field signature based reformulations.
Again, if we consider the whole text rather than the signatures, we also miss some query improvements.
This is not only true for CodeRank; it also holds if we employ term frequency in those contexts.
Thus, document contexts matter for query reformulation.
Now, when we consider the query improvements by ACER and Term Frequency in terms of a Venn diagram, we find a 66% overlap, but ACER provides a unique set of improvements that is three times that of TF.
ACER exploits document structures and TF does not, and we see the difference here.
In the fourth research question, we calibrate the reformulation length.
We found that the best performance is achieved when the reformulation length is between 10 and 15 terms.
This is where CodeRank saturates.
In the fifth research question, we compare our query improvement and worsening ratios with the existing methods.
We see that our median improvement is much higher than the others'.
More importantly, we degrade far fewer queries than the others do.
These measures are significantly higher.
Thus, according to our investigation, ACER is the winner.
But we must also admit that the ML-based approach is less scalable, and we are now working on the tool.
Thus, these are the take-home messages.
Query reformulation is a challenging task for developers; Google does not work on a local source code repository.
Traditional term weights are clearly not sufficient or appropriate for source code.
We provide CodeRank, a novel term weight for source code.
We provide ACER, an improved reformulation technique.
Our technique improves about 71% of the queries and degrades only a handful of them.
Comparison with the state-of-the-art shows the promise of our method.
Thanks for your time and attention.
I am happy to take a few questions.
When we consider various Top-K accuracies, we get similar findings.
Our method located concepts correctly for 80% of the change requests, whereas the best existing method did so for 60% of them.
This shows the potential of our technique.