QUICKAR: AUTOMATIC QUERY REFORMULATION FOR CONCEPT LOCATION USING CROWDSOURCED KNOWLEDGE
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), Singapore
CONCEPT LOCATION: MAPPING CONCEPTS TO SOURCE CODE
Software change request → Software source code
CONCEPT LOCATION: REFORMULATION OF CODE SEARCH QUERIES
Software user → Change request → Issue database → Software developer → Initial query → Code search engine (over the code repository) → Search results
Stop? Yes: done. No: Query reformulation (our research problem), then search again.
• 12.2% of the queries by developers were useful (Kevic and Fritz, ICSE 2014)
• Vocabulary matched 10%-15% of the time (Furnas et al., Commun. ACM, 1987)
QUERY REFORMULATION LITERATURE
• Relevance Feedback
    • Gay et al., ICSM 2009
    • Haiduc et al., ICSE 2013
• Query Quality Analysis
    • Haiduc et al., ASE 2012
    • Haiduc et al., ICSE 2012
    • Haiduc and Marcus, ICPC 2011
• Query Context Analysis
    • Howard et al., MSR 2013
    • Yang and Tan, MSR 2012
    • Kevic and Fritz, MSR 2014
Our work: QUICKAR
SEMANTIC SIMILARITY: USING ADJACENCY TERM LIST
ID        Stack Overflow Question Title
6470651   Creating a memory leak with Java
4948521   Easiest way to cause memory leak in Java?
1071631   Tracking down a memory leak/garbage-collection issue in Java
Table: Duplicate questions from Stack Overflow

Term          Adjacency List
Create (T1)   {Memory, leak, Java}
Cause (T2)    {Easiest, way, memory, leak, Java}
Track (T3)    {Down, memory, leak, garbage, collection, issue, Java}

T1 ∩ T2 ≠ ∅, T2 ∩ T3 ≠ ∅, T1 ∩ T3 ≠ ∅
T1 ≡ T2 ≡ T3
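
The overlap test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the lowercasing, and the choice to treat every other token of a title as adjacent to the term are assumptions made so that the three example titles reduce to the lists above. The dictionary keys are the surface forms found in the titles; the slide shows them normalized as Create, Cause and Track.

```python
import re

# Assumed stop-word list, picked only so the example titles reduce to the
# adjacency lists on the slide; it is not the authors' list.
STOP_WORDS = {"a", "in", "to", "with", "for", "the", "of"}

def tokenize(title):
    """Lowercase a title, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]

def adjacency_list(title, term):
    """Treat every other token of the title as adjacent to `term` (assumption)."""
    return {t for t in tokenize(title) if t != term}

# Term of interest -> the duplicate question title it appears in.
titles = {
    "creating": "Creating a memory leak with Java",
    "cause": "Easiest way to cause memory leak in Java?",
    "tracking": "Tracking down a memory leak/ garbage-collection issue in Java",
}
lists = {term: adjacency_list(title, term) for term, title in titles.items()}

def shares_context(a, b):
    """Two terms are taken as related when their adjacency lists intersect."""
    return bool(lists[a] & lists[b])

common = lists["creating"] & lists["cause"] & lists["tracking"]
print(sorted(common))
```

All three terms share the context {memory, leak, java}, which is the basis for treating them as semantically similar.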
QUICKAR: PROPOSED TECHNIQUE FOR QUERY REFORMULATION
QUICKAR
• Construction of adjacency list database
• Reformulation of initial search query
CONSTRUCTION OF ADJACENCY LIST DATABASE
Stack Overflow questions [input] → Question title → Natural language preprocessing → Natural language tokens → Term adjacency analysis → Adjacency list database [output]

Question title:        Easiest way to cause memory leak in Java
After preprocessing:   Easiest way cause memory leak Java

Term   Adjacency list
T1     T2, T3, T6
T2     T1, T4
T3     T1, T5, T6
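
A rough sketch of the construction pipeline follows, assuming a simple stop-word filter for the preprocessing step and whole-title co-occurrence for the term adjacency analysis. Both choices, and the names `preprocess` and `build_adjacency_db`, are illustrative assumptions rather than details confirmed by the slides.

```python
import re
from collections import defaultdict

# Assumed stop-word list (hypothetical, for illustration only).
STOP_WORDS = {"a", "in", "to", "with", "for", "the", "of"}

def preprocess(title):
    """Natural language preprocessing: lowercase, tokenize, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]

def build_adjacency_db(titles):
    """Term adjacency analysis: map each term to the terms it shares a title with."""
    db = defaultdict(set)
    for title in titles:
        tokens = preprocess(title)
        for term in tokens:
            # Merge this title's context into the term's adjacency list.
            db[term].update(t for t in tokens if t != term)
    return dict(db)

db = build_adjacency_db([
    "Easiest way to cause memory leak in Java?",
    "Creating a memory leak with Java",
])
print(preprocess("Easiest way to cause memory leak in Java"))
print(sorted(db["leak"]))
```

Because lists are merged across titles, a term such as `leak` accumulates context from every question it appears in, which is what lets terms from different questions be compared later.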
QUICKAR: REFORMULATION OF INITIAL QUERY
Initial query [input] → Preprocessing & reduction → Search keywords
Search keywords + Adjacency list database → Semantic similarity → Reformulation candidates
Search keywords + Project source code → Co-occurrence analysis → Reformulation candidates
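
One way to picture the semantic-similarity step is to rank database terms by how much their adjacency lists overlap with those of the search keywords. The sketch below uses Jaccard overlap and a top-k cut-off; both are assumptions for illustration, since the slides do not state the scoring function. The first three entries of `toy_db` mirror the adjacency lists from the Stack Overflow example, and `parse` is an invented entry.

```python
def jaccard(a, b):
    """Overlap of two adjacency lists, in the range [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reformulation_candidates(keywords, adjacency_db, top_k=3):
    """Rank database terms by their best adjacency-list overlap with any keyword."""
    scores = {}
    for kw in keywords:
        kw_list = adjacency_db.get(kw)
        if not kw_list:
            continue  # keyword not in the database: nothing to compare
        for term, term_list in adjacency_db.items():
            if term in keywords:
                continue  # never suggest a term the query already has
            score = jaccard(kw_list, term_list)
            if score > 0:
                scores[term] = max(scores.get(term, 0.0), score)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]

toy_db = {
    "create": {"memory", "leak", "java"},
    "cause": {"easiest", "way", "memory", "leak", "java"},
    "track": {"down", "memory", "leak", "garbage", "collection", "issue", "java"},
    "parse": {"xml", "file", "python"},  # hypothetical, unrelated entry
}
query = ["create"]
candidates = reformulation_candidates(query, toy_db)
reformulated = query + candidates
print(reformulated)
```

Terms with no shared context, such as `parse` here, never become candidates, so the expansion stays close to the vocabulary surrounding the original keywords.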
WORKING EXAMPLE OF QUICKAR
Raw:       correct / remove warnings on ECF projects
Reduced:   warnings ECF projects
Baseline:  correct remove warnings ECF projects
Reduced version + source compiler errors web workspace
Reduced version + Implementation core Util project size
Reformulated query: warnings ECF projects source compiler errors web workspace
EXPERIMENTAL DESIGN
• 510 baseline queries
• 2 subject systems
• Baseline technique (Rocchio's Method, ICSE 2013)
• QUICKAR
• Code search engine
EXPERIMENTAL RESULTS
Technique                 Improved   Worsened   Preserved
Baseline (preprocessed)   17.84%     9.90%      72.27%
QUICKAR_P                 49.15%     48.41%     2.44%
QUICKAR_SO                47.83%     49.91%     2.27%
QUICKAR_red               55.55%     24.46%     19.99%
QUICKAR_ALL               66.54%     23.65%     9.81%
• QUICKAR_P and QUICKAR_SO perform almost equally.
• QUICKAR_red clearly outperforms the other two.
• Results from QUICKAR_P, QUICKAR_SO and QUICKAR_red overlap, but in isolation each succeeds for a unique 21%-42% of the queries.
• Combining all three maximizes the results.
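
The three result categories can be computed as sketched below, under the assumption (not stated on the slide) that a query's effectiveness is the rank of the first correct result returned by the code search engine, with a lower rank being better. The rank pairs are invented for illustration and are not experimental data.

```python
from collections import Counter

def classify(baseline_rank, reformulated_rank):
    """Compare the rank of the first correct result before and after reformulation."""
    if reformulated_rank < baseline_rank:
        return "improved"
    if reformulated_rank > baseline_rank:
        return "worsened"
    return "preserved"

def summarize(rank_pairs):
    """Return the percentage of queries that fall into each category."""
    if not rank_pairs:
        raise ValueError("at least one query is required")
    counts = Counter(classify(b, r) for b, r in rank_pairs)
    total = len(rank_pairs)
    return {k: 100.0 * counts[k] / total
            for k in ("improved", "worsened", "preserved")}

# Hypothetical (baseline rank, reformulated rank) pairs.
pairs = [(12, 3), (5, 5), (7, 20), (40, 8)]
summary = summarize(pairs)
print(summary)
```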
QUERY IMPROVEMENT SPECTRUM
[Figure: Venn diagram of the queries improved by QUICKAR_P, QUICKAR_SO and QUICKAR_red; region counts: 143, 39, 39, 37, 27, 19, 62]
COMPARISON WITH BASELINE TECHNIQUE
Technique                           System   Improved   Worsened   Preserved   MWU
Rocchio's Method (Haiduc et al.)    ECF      39.64%     59.46%     <1.00%      ***
                                    PDE      40.63%     59.38%     0.00%       ***
QUICKAR_P                           ECF      53.15%     43.69%     3.15%       **
                                    PDE      45.14%     53.13%     1.74%
QUICKAR_ALL                         ECF      71.62%     18.47%     9.90%       ---
                                    PDE      61.46%     28.82%     9.72%       ---
*** highly significant, ** significant difference with QUICKAR_ALL
TAKE-HOME MESSAGE
• Only 12.2% of developers' search queries are relevant for code change tasks, i.e., vocabulary mismatch is a major concern.
• Automatic query reformulation is essential.
• Relevance feedback from project source might not always be sufficient.
• Queries can be reformulated effectively using Stack Overflow information.
• QUICKAR combines vocabulary from project source and Stack Overflow questions.
• Empirical evaluation and validation demonstrate the potential of our technique.
THANK YOU! QUESTIONS?
Masud Rahman (masud.rahman@usask.ca)
QUICKAR: http://www.usask.ca/~masud.rahman/quickar

08448380779 Call Girls In IIT Women Seeking Men
 
9990611130 Find & Book Russian Call Girls In Ghazipur
9990611130 Find & Book Russian Call Girls In Ghazipur9990611130 Find & Book Russian Call Girls In Ghazipur
9990611130 Find & Book Russian Call Girls In Ghazipur
 
Slovenia Vs Serbia UEFA Euro 2024 Fixture Guide Every Fixture Detailed.docx
Slovenia Vs Serbia UEFA Euro 2024 Fixture Guide Every Fixture Detailed.docxSlovenia Vs Serbia UEFA Euro 2024 Fixture Guide Every Fixture Detailed.docx
Slovenia Vs Serbia UEFA Euro 2024 Fixture Guide Every Fixture Detailed.docx
 
08448380779 Call Girls In Karol Bagh Women Seeking Men
08448380779 Call Girls In Karol Bagh Women Seeking Men08448380779 Call Girls In Karol Bagh Women Seeking Men
08448380779 Call Girls In Karol Bagh Women Seeking Men
 
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docxUEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
 
CALL ON ➥8923113531 🔝Call Girls Telibagh Lucknow best Night Fun service 🧣
CALL ON ➥8923113531 🔝Call Girls Telibagh Lucknow best Night Fun service  🧣CALL ON ➥8923113531 🔝Call Girls Telibagh Lucknow best Night Fun service  🧣
CALL ON ➥8923113531 🔝Call Girls Telibagh Lucknow best Night Fun service 🧣
 
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls AgencyHire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
 
Albania Vs Spain Albania is Loaded with Defensive Talent on their Roster.docx
Albania Vs Spain Albania is Loaded with Defensive Talent on their Roster.docxAlbania Vs Spain Albania is Loaded with Defensive Talent on their Roster.docx
Albania Vs Spain Albania is Loaded with Defensive Talent on their Roster.docx
 
08448380779 Call Girls In Lajpat Nagar Women Seeking Men
08448380779 Call Girls In Lajpat Nagar Women Seeking Men08448380779 Call Girls In Lajpat Nagar Women Seeking Men
08448380779 Call Girls In Lajpat Nagar Women Seeking Men
 
Spain Vs Italy 20 players confirmed for Spain's Euro 2024 squad, and three po...
Spain Vs Italy 20 players confirmed for Spain's Euro 2024 squad, and three po...Spain Vs Italy 20 players confirmed for Spain's Euro 2024 squad, and three po...
Spain Vs Italy 20 players confirmed for Spain's Euro 2024 squad, and three po...
 
🔝|97111༒99012🔝 Call Girls In {Delhi} Cr Park ₹5.5k Cash Payment With Room De...
🔝|97111༒99012🔝 Call Girls In  {Delhi} Cr Park ₹5.5k Cash Payment With Room De...🔝|97111༒99012🔝 Call Girls In  {Delhi} Cr Park ₹5.5k Cash Payment With Room De...
🔝|97111༒99012🔝 Call Girls In {Delhi} Cr Park ₹5.5k Cash Payment With Room De...
 
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
 

QUICKAR-ASE2016-Singapore

  • 1. QUICKAR: AUTOMATIC QUERY REFORMULATION FOR CONCEPT LOCATION USING CROWDSOURCED KNOWLEDGE Mohammad Masudur Rahman, Chanchal K. Roy Department of Computer Science University of Saskatchewan, Canada 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), Singapore
  • 2. CONCEPT LOCATION: MAPPING CONCEPTS TO SOURCE CODE
    [Diagram labels: Software change request, Software source code]
  • 3. CONCEPT LOCATION: REFORMULATION OF CODE SEARCH QUERIES
    [Diagram labels: Software user, Change request, Issue database, Software developer, Initial query, Code search engine, Code repository, Search results, Stop (Yes/No), Query reformulation, Our research problem]
    12.2% of the queries by developers were useful (Kevic and Fritz, ICSE 2014)
    10%-15% of times vocabulary matched (Furnas et al., Commun. ACM, 1987)
  • 4. QUERY REFORMULATION LITERATURE
    Relevance Feedback: Gay et al., ICSM 2009; Haiduc et al., ICSE 2013
    Query Quality Analysis: Haiduc et al., ASE 2012; Haiduc et al., ICSE 2012; Haiduc and Marcus, ICPC 2011
    Query Context Analysis: Howard et al., MSR 2013; Yang and Tan, MSR 2012; Kevic and Fritz, MSR 2014
    Our work: QUICKAR
  • 5. SEMANTIC SIMILARITY: USING ADJACENCY TERM LIST
    Table: Duplicate questions from Stack Overflow
    ID | Stack Overflow Question Title
    6470651 | Creating a memory leak with Java
    4948521 | Easiest way to cause memory leak in Java?
    1071631 | Tracking down a memory leak/ garbage-collection issue in Java
    Term | Adjacency List
    Create (T1) | {Memory, leak, Java}
    Cause (T2) | {Easiest, way, memory, leak, Java}
    Track (T3) | {Down, memory, leak, garbage, collection, issues, Java}
    T1 ∩ T2 ≠ ∅, T2 ∩ T3 ≠ ∅, T1 ∩ T3 ≠ ∅
    T1 ≡ T2 ≡ T3
  • 6. QUICKAR: PROPOSED TECHNIQUE FOR QUERY REFORMULATION
    QUICKAR: (1) Construction of adjacency list database; (2) Reformulation of initial search query
  • 7. CONSTRUCTION OF ADJACENCY LIST DATABASE
    [Diagram labels: Stack Overflow questions [input], Question title, Natural language preprocessing, Natural language tokens, Term adjacency analysis, Adjacency list database [output]]
    Example: "Easiest way to cause memory leak in Java" becomes "Easiest way cause memory leak Java"
    Term | Adjacent list
    T1 | T2, T3, T6
    T2 | T1, T4
    T3 | T1, T5, T6
  • 8. QUICKAR: REFORMULATION OF INITIAL QUERY
    [Diagram labels: Initial query [input], Search keywords, Adjacency list database, Project source code, Preprocessing & reduction, Semantic similarity, Co-occurrence analysis, Reformulation candidates (shown twice)]
  • 9. WORKING EXAMPLE OF QUICKAR
    Raw: correct / remove warnings on ECF projects
    Reduced: warnings ECF projects
    Baseline: correct remove warnings ECF projects
    Reduced version + source compiler errors web workspace
    Reduced version + Implementation core Util project size
    Reformulated query: warnings ECF projects source compiler errors web workspace
  • 10. EXPERIMENTAL DESIGN
    [Diagram labels: 510 baseline queries, 2 subject systems, Baseline technique (Rocchio's Method, ICSE 2013), QUICKAR, Code search engine]
  • 11. EXPERIMENTAL RESULTS
    Technique | Improved | Worsened | Preserved
    Baseline (preprocessed) | 17.84% | 9.90% | 72.27%
    QUICKAR_P | 49.15% | 48.41% | 2.44%
    QUICKAR_SO | 47.83% | 49.91% | 2.27%
    QUICKAR_red | 55.55% | 24.46% | 19.99%
    QUICKAR_ALL | 66.54% | 23.65% | 9.81%
    - QUICKAR_P and QUICKAR_SO perform almost equally.
    - QUICKAR_red is found to dominate the others.
    - Results from QUICKAR_P, QUICKAR_SO and QUICKAR_red overlap, but each succeeds for 21%-42% unique queries in isolation.
    - The combination of all three maximizes the results.
  • 12. QUERY IMPROVEMENT SPECTRUM
    [Venn diagram of improved queries across QUICKAR_P, QUICKAR_SO and QUICKAR_red; region counts as extracted: 143, 39, 39, 37, 27, 19, 62]
  • 13. COMPARISON WITH BASELINE TECHNIQUE
    Technique | System | Improved | Worsened | Preserved | MWU
    Rocchio's Method, Haiduc et al. | ECF | 39.64% | 59.46% | <1.00% | ***
    Rocchio's Method, Haiduc et al. | PDE | 40.63% | 59.38% | 0.00% | ***
    QUICKAR_P | ECF | 53.15% | 43.69% | 3.15% | **
    QUICKAR_P | PDE | 45.14% | 53.13% | 1.74% |
    QUICKAR_ALL | ECF | 71.62% | 18.47% | 9.90% | ---
    QUICKAR_ALL | PDE | 61.46% | 28.82% | 9.72% | ---
    *** highly significant, ** significant difference with QUICKAR_ALL
  • 14. TAKE-HOME MESSAGE
    - Only 12.2% of search queries from developers are relevant for code change tasks, i.e., vocabulary mismatch is a great concern.
    - Automatic query reformulation is essential.
    - Relevance feedback from project source might not always be sufficient.
    - Queries can be reformulated effectively using Stack Overflow information.
    - QUICKAR combines vocabulary from project source and Stack Overflow questions.
    - Empirical evaluation and validation demonstrate potential for our technique.
  • 15. THANK YOU! QUESTIONS? Masud Rahman (masud.rahman@usask.ca) QUICKAR: http://www.usask.ca/~masud.rahman/quickar
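
The adjacency-list idea from slides 5, 7 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the stop-word list, the reading of the window as "up to `window` tokens on each side", the Jaccard overlap used as the similarity measure, and all function names (`preprocess`, `build_adjacency_db`, `similarity`, `rank_candidates`) are hypothetical choices made for this example. No stemming is applied, so the terms stay as "creating" and "tracking" rather than "Create" and "Track".

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; the talk does not name one.
STOP_WORDS = {"a", "an", "the", "to", "in", "with", "of", "on", "for"}


def preprocess(title):
    """Lowercase a title, split on non-letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_adjacency_db(titles, window=2):
    """Map each term to the set of terms seen within `window` positions of it."""
    db = defaultdict(set)
    for title in titles:
        tokens = preprocess(title)
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            db[term].update(t for t in tokens[lo:hi] if t != term)
    return db


def similarity(db, a, b):
    """Jaccard overlap of two adjacency lists (an assumed measure)."""
    adj_a, adj_b = db.get(a, set()), db.get(b, set())
    union = adj_a | adj_b
    return len(adj_a & adj_b) / len(union) if union else 0.0


def rank_candidates(db, query_terms, candidates, top_k=3):
    """Order candidate terms by their best similarity to any query term."""
    scored = [
        (max((similarity(db, q, c) for q in query_terms), default=0.0), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in scored[:top_k]]


# The three duplicate question titles shown on slide 5.
titles = [
    "Creating a memory leak with Java",
    "Easiest way to cause memory leak in Java?",
    "Tracking down a memory leak/ garbage-collection issue in Java",
]
db = build_adjacency_db(titles)
print(sorted(db["creating"]))
print(similarity(db, "creating", "cause"))
print(rank_candidates(db, ["creating"], ["tracking", "cause", "unknown"], top_k=2))
```

With a large window (for example `window=10`) every other token of a title lands in a term's list, which gives whole-title lists like those on slide 5; the smaller default only keeps near neighbours.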

Editor's notes

  1. Hello everyone. My name is Mohammad Masudur Rahman. I am a PhD student at the University of Saskatchewan, Canada, and I work with Dr. Chanchal Roy. Today, I am going to talk about query reformulation for concept location using crowdsourced knowledge.
  2. Concept location is a systematic process for mapping between items: it maps concepts from natural language text to a software engineering artifact. For example, if the natural language item is a software change request, then the software engineering artifact is the source code. During software maintenance, such as feature implementation or bug fixing, developers perform this mapping frequently. We provide automatic support for this mapping task.
  3. Once a software user submits a change request (a feature request or a bug report) to the issue database, a developer is assigned to that change task. Now, what does the developer do? She selects some initial keywords to find the relevant source code that has to be changed to resolve the submitted issue. If the search does not work with that query, she reformulates the query. Existing studies have shown that developers face difficulties in selecting appropriate search terms. In fact, only 12.2% of the search terms were relevant according to one existing study. This is mostly because of the vocabulary mismatch problem: the concept expressed in the change request is also expressed in the source code, but using different vocabulary. Therefore, reformulation is always challenging, and we tackle that research problem in this ongoing work.
  4. Several studies use relevance feedback for query reformulation. One limitation of this work is that collecting relevance feedback from developers is time-consuming and not always possible. Haiduc et al. focus on query-quality-based reformulation, where they coin a term called query difficulty; that is, they determine the quality of a query using linguistic and statistical analysis. Another group of studies considers the context of a query term in the source code. However, this is also limited, since it depends on the documentation quality of the code: if the code does not contain enough comments, these techniques will not work. Our work is basically a combination of relevance feedback and query context. More importantly, we add another data source, Stack Overflow, to overcome the limitations of both groups of techniques.
  5. In Stack Overflow, there are many questions that are duplicates or deal with similar types of issues. Developers often volunteer to mark such duplicate questions. For example, these are duplicate or very closely related questions, and they all focus on the memory leak problem. Now, if we consider three terms, Create, Cause and Track, we get these adjacent word lists. As the proverb says, "A person is known by the company he or she keeps." We apply that idea here: we determine the semantic similarity or relatedness between any two terms by comparing their adjacent word lists. For example, Create, Cause and Track all share some adjacent words collected from those questions, which means these words are semantically related to one another. Since we are interested in query reformulation, we can use such semantically similar or related terms to reformulate a query. This can help us beat the vocabulary mismatch problem.
  6. So, we propose our query reformulation technique, QUICKAR, which uses Stack Overflow questions for query reformulation. It has two major steps: (1) construction of the adjacency list database, and (2) reformulation of the initial query.
  7. Construction of the adjacency list database. We select 500K questions from Stack Overflow and collect their titles. We perform natural language preprocessing and treat each title as an ordered list of tokens. Then we use a window size of two and capture, for each term, its adjacent word list from each question. Finally, we get a database like this, where each term has an adjacent word list from Stack Overflow. We use this database to determine the semantic similarity between any two terms.
  8. In the query reformulation task, we collect the top 5 results returned by the initial query. Then, we collect the source code tokens from those top results and consider them as the candidates for reformulation. Existing studies apply different strategies, such as TF-IDF or the Dice coefficient, to extract appropriate candidates from such tokens. However, they are subject to the vocabulary mismatch issue. What did we do? We determine the semantic similarity between the initial query and those candidates using the adjacency lists developed from Stack Overflow questions, and we select the most semantically similar or related code tokens as the appropriate candidates. We also collect the words from Stack Overflow questions that most frequently co-occur with keywords from the initial query. Once reformulation candidates are selected from both source code and Stack Overflow, we combine them selectively and reformulate the initial query.
  9. Now let's take a look at an example. Suppose this is the raw query, and this is the preprocessed version. This query returns the result at the 65th position, which is not very good. We first perform some reduction on the query and use the reduced query to collect reformulation candidates from both source code and Stack Overflow. For the combination, we applied an ad hoc strategy: we count the nominal terms among the candidates. In this case, the candidates from Stack Overflow are selected for query expansion. So, we append those candidates to the reduced version of the query, and this expanded query returns the first result at the top position.
  10. In order to test our hypothesis, we conducted preliminary experiments with 510 queries from two subject systems. In the experiment, we compared our reformulated query with the baseline query in terms of their returned results. If our query returns the result at a better position than the baseline, then the reformulation is useful. We also compare with a baseline technique called Rocchio's expansion.
  11. These are the findings from our preliminary experiments. We first determine the effect of preprocessing on the baseline queries, which is not much. Then we see the performance of the reformulation candidates from source code tokens and Stack Overflow tokens. Both of them improve close to 50% of the queries, but they also worsen an equal share. We also notice an interesting effect of the query reduction phase, which improves a significant number of queries. However, when we combine query reduction with query expansion using candidate tokens from source code and Stack Overflow, the performance improves considerably. At this point, the performance improvement might not be significant, but this is what interested us in this approach.
  12. We see that a major share of the improved cases overlap among the three aspects of our proposed reformulation strategy. However, there is a significant number of cases where the improvement is not possible unless all three aspects are considered. For example, about 30% of the improvement comes from only one of the three variants of QUICKAR. So, we combine all three aspects to maximize performance.
  13. We compare with the baseline technique and find that our performance is significantly higher. Even when we compare an equivalent variant of our technique with the baseline, the performance is still higher. This is because the existing techniques cannot survive the vocabulary mismatch problem, whereas our technique provides a way out by computing semantic similarity using adjacency lists.
  14. So, these are the take-home messages. Developers often face difficulties in choosing appropriate query terms during concept location, so we provide automatic support for their query reformulation. In our work, we apply semantic similarity rather than lexical similarity to suggest query reformulations, and we use Stack Overflow questions for that purpose. Our experimental findings also show that the reformulated queries actually improve on the baseline queries and also perform better than a baseline technique. This is ongoing work, and we are still working with a larger dataset.
  15. Thanks for your time. Now, I am ready to take your questions.
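
The evaluation rule described in notes 10 and 11 (a reformulation helps if it returns the result at a better position than the baseline query) can be sketched as a small Python helper. This is an illustrative reading, not the authors' evaluation script: it assumes each query is summarized by the rank of its first correct result, with 1 as the top position, and the names `classify` and `summarize` as well as the sample rank pairs are made up for this example.

```python
from collections import Counter


def classify(baseline_rank, reformulated_rank):
    """Compare result ranks for one query (1 = top; lower is better)."""
    if reformulated_rank < baseline_rank:
        return "improved"
    if reformulated_rank > baseline_rank:
        return "worsened"
    return "preserved"


def summarize(rank_pairs):
    """Return the percentage of queries falling into each outcome."""
    outcomes = ("improved", "worsened", "preserved")
    total = len(rank_pairs)
    if total == 0:
        return {name: 0.0 for name in outcomes}
    counts = Counter(classify(b, r) for b, r in rank_pairs)
    return {name: 100.0 * counts[name] / total for name in outcomes}


# Made-up (baseline, reformulated) rank pairs, for illustration only.
# The first pair mirrors the working example: 65th position before, top after.
pairs = [(65, 1), (3, 10), (7, 7), (20, 4)]
print(summarize(pairs))
```

Reporting all three percentages together, as the results table does, makes it visible when a technique improves many queries but worsens almost as many.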