Use of figurative language, such as metaphors and idioms, is common in everyday communication, and it can also be found in Software Engineering (SE) channels, such as comments on GitHub. Automatically interpreting figurative language is a challenging task, even with modern Large Language Models (LLMs), as it often involves subtle nuances. This is particularly true in the SE domain, where figurative language is frequently used to convey technical concepts, often bearing developer affect (e.g., 'spaghetti code'). Surprisingly, there is a lack of studies on how figurative language in SE communications impacts the performance of automatic tools that focus on understanding developer communications, e.g., bug prioritization and incivility detection. Furthermore, it is an open question to what extent state-of-the-art LLMs can interpret figurative expressions in domain-specific communication such as software engineering. To address this gap, we study the prevalence and impact of figurative language in SE communication channels. This study contributes to understanding the role of figurative language in SE, the potential of LLMs to interpret it, and its impact on automated SE communication analysis. Our results demonstrate the effectiveness of fine-tuning LLMs on figurative language in SE and its potential impact on automated tasks that involve affect. We found that, among three state-of-the-art LLMs, the best fine-tuned versions achieve an average improvement of 6.66% on a GitHub emotion classification dataset, 7.07% on a GitHub incivility classification dataset, and 3.71% on a Bugzilla bug report prioritization dataset.
Shedding Light on Software Engineering-specific Metaphors and Idioms
1. Shedding Light on Software Engineering-Specific Metaphors and Idioms
Mia Mohammad Imran, Preetha Chatterjee, Kostadin Damevski
Drexel University, Virginia Commonwealth University
3. “’Cause You Know Sometimes Words Have Two Meanings”
Debt
Bug
Skeleton
Ticket
Are there new words emerging that are now
used differently?
Cloud Fog Edge
What about “Hallucinations”?
4. Figurative Language in SE
This has crept into unrelated bits of generator code
thanks to frankencoding!
However, there is a lot of unnecessary copy-paste
spaghetti code, uninformative variable names, etc.
which I didn't write myself
Oh wow, that’s even weirder than I thought lol. Quite
the heisenbug
5. Study Design and Goals
● Purpose: To investigate the prevalence, interpretation, and
impact of figurative language in Software Engineering
communications
● We designed 3 RQs
6. Research Questions
● How well can LLMs interpret figurative language in a Software Engineering context?
● Can Software Engineering-specific affective analysis
performance be improved by better insight into figurative
language?
● How does understanding figurative language impact
Software Engineering tasks like bug prioritization?
8. Data Collection
● Sampled 2,000 sentences (1,000 with potential metaphors and 1,000 with potential idioms) from 9 popular GitHub repositories
9. Data Annotation
● Verification of Figurative Expressions
● Rephrase sentences:
○ Equivalent Meaning Sentences (EMS): Sentences
reworded to remove figurative language while retaining
meaning
○ Different Meaning Sentences (DMS): Sentences
modified to change the meaning using similar words
10. Data Annotation: Example
Sentence: Otherwise this could give us a nasty bug
Equivalent Meaning Sentences (EMS): Otherwise this could
result in a dangerous error in code
Different Meaning Sentences (DMS): Otherwise, this neglected
garden could infest us with an unpleasant insect
11. Data Annotation
Annotators identified 1661 sentences with Figurative Language
● 752 sentences with Metaphors
● 909 sentences with Idioms
A total of 1741 unique Figurative Expressions marked
● 445 Software Engineering specific
● 1296 General
13. Research Question 1
● RQ: How well can LLMs interpret figurative language in a Software Engineering context?
● Evaluated LLMs: BERT, RoBERTa, ALBERT, CodeBERT
○ BERT, RoBERTa, and ALBERT are general-domain models
○ CodeBERT is an SE-specific LLM
14. RQ1: LLMs' Interpretation Capabilities
● Task: Assess models' abilities to distinguish between EMS and DMS
● Calculated and compared cosine similarity between (original sentence, EMS) and (original sentence, DMS) pairs
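The check described above can be sketched as a small self-contained example. The vectors below are toy stand-ins for a model's actual sentence embeddings; the decision rule is the one on this slide: a model interprets an item correctly when the original sentence is more similar to its EMS than to its DMS.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def interprets_correctly(orig_emb, ems_emb, dms_emb):
    """A model 'passes' an item when the original sentence's embedding
    is closer to the equivalent-meaning rewrite (EMS) than to the
    different-meaning rewrite (DMS)."""
    return cosine_similarity(orig_emb, ems_emb) > cosine_similarity(orig_emb, dms_emb)

# Toy embeddings (a real run would use the LLM's sentence vectors)
orig = np.array([0.9, 0.1, 0.3])
ems = np.array([0.8, 0.2, 0.35])   # close to the original
dms = np.array([0.1, 0.9, 0.0])    # far from the original
print(interprets_correctly(orig, ems, dms))  # True
```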
15. RQ1: LLMs' Interpretation Capabilities
● BERT performed best
● CodeBERT suffered most, especially with SE-specific
figurative expressions
Model SE-specific General Overall
BERT 84.51% 87.40% 86.57%
RoBERTa 83.70% 85.21% 84.95%
ALBERT 81.79% 85.80% 85.00%
CodeBERT 77.99% 79.63% 79.11%
16. Research Question 2
● RQ: Can SE-specific affective analysis performance be
improved by better insight into figurative language?
● Objective: Evaluate if improved figurative language
interpretation enhances LLMs' performance in SE affective
analysis
17. RQ2: Affective Analysis Enhancement
Analyzed Tasks:
● Emotion Detection
● Incivility Detection
Evaluated LLMs: BERT, RoBERTa, ALBERT, CodeBERT
● BERT, RoBERTa, and ALBERT are general-domain models
● CodeBERT is an SE-specific LLM
18. RQ2: Affective Analysis Enhancement
● Applied contrastive learning
● LLMs presented with triplets (Original Sentence, EMS, DMS)
● Minimize the distance between anchor (original) and positive
(EMS) samples, maximize distance from negative (DMS)
● Process repeated until the LLMs learn a satisfactory
representation
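A minimal sketch of the triplet objective described above, assuming a standard margin-based triplet loss over embedding vectors (the margin value and toy embeddings are illustrative, not taken from the study):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Contrastive (triplet) objective: pull the anchor (original
    sentence) toward the EMS (positive) and push it away from the
    DMS (negative). Loss is zero once the positive is closer than
    the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to EMS
    d_neg = np.linalg.norm(anchor - negative)  # distance to DMS
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # EMS: nearly identical meaning
negative = np.array([0.0, 1.0])   # DMS: different meaning
print(triplet_loss(anchor, positive, negative))  # 0.0 (already well separated)
```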
19. RQ2: Affective Analysis Enhancement
● Post Contrastive Learning: Task-specific fine-tuning applied
for emotion and incivility classification
○ This means two rounds of fine-tuning for each task
● Performance Metric: F1-score used to evaluate performance
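For reference, a minimal sketch of the per-class F1-score computation underlying the metric above (the slides report average F1-scores; the labels below are illustrative):

```python
def f1_score(y_true, y_pred, positive):
    """Per-class F1: harmonic mean of precision and recall
    for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy incivility predictions
y_true = ["civil", "civil", "uncivil", "uncivil", "civil"]
y_pred = ["civil", "uncivil", "uncivil", "uncivil", "civil"]
print(f1_score(y_true, y_pred, positive="uncivil"))  # 0.8
```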
20. RQ2: Affective Analysis Enhancement - Emotion Classification
● 6 classes: Anger, Love, Fear, Joy, Sadness and Surprise
● Dataset [1]

Model        Average F1-score  Improvement
BERT         0.588
BERT-FL      0.627             +6.60%
RoBERTa      0.593
RoBERTa-FL   0.632             +6.66%
ALBERT       0.531
ALBERT-FL    0.550             +3.63%
CodeBERT     0.561
CodeBERT-FL  0.583             +3.90%
[1] M. M. Imran, Y. Jain, P. Chatterjee, and K. Damevski, “Data augmentation for improving emotion recognition in software engineering communication,” in Proc. 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2022.
21. RQ2: Affective Analysis Enhancement - Incivility Classification
● 2 classes: Civil and Uncivil
● Dataset [1]

Model        Average F1-score  Improvement
BERT         0.734
BERT-FL      0.783             +6.67%
RoBERTa      0.734
RoBERTa-FL   0.769             +4.76%
ALBERT       0.685
ALBERT-FL    0.713             +4.08%
CodeBERT     0.692
CodeBERT-FL  0.741             +7.07%
[1] I. Ferreira, B. Adams, and J. Cheng, “How heated is it? Understanding GitHub locked issues,” in Proc. 19th IEEE/ACM International Conference on Mining Software Repositories (MSR), 2022.
22. Research Question 3
● RQ: How does understanding figurative language impact SE
tasks like bug prioritization?
● Objective: Evaluate whether better figurative language interpretation boosts SE automation tasks
23. RQ3: Enhancing SE Automation with Figurative Language
Analyzed Tasks:
● Bug Priority Detection
Evaluated LLMs: BERT, RoBERTa, ALBERT, CodeBERT
● BERT, RoBERTa, and ALBERT are general-domain models
● CodeBERT is an SE-specific LLM
24. RQ3: Enhancing SE Automation with Figurative Language
● 5 classes: P1, P2, P3, P4, P5
● Dataset [1]

Model        Average F1-score  Improvement
BERT         0.716
BERT-FL      0.730             +1.96%
RoBERTa      0.707
RoBERTa-FL   0.724             +2.40%
ALBERT       0.683
ALBERT-FL    0.709             +3.71%
CodeBERT     0.714
CodeBERT-FL  0.726             +1.61%
[1] W.-Y. Wang, C.-H. Wu, and J. He, “CLeBPI: Contrastive learning for bug priority inference,” Information and Software Technology, vol. 164, 2023.
26. Implications of Research Findings
● Educational Benefits:
○ Glossaries of project-specific figurative language can help
onboard new developers
○ Minimizing obscure jargon enhances understanding and
collaboration
● Cultural Considerations:
○ Consider cultural differences influencing figurative language
interpretation
27. Future Research Directions
● Integrate figurative language into SE tools/models
● Investigate role of figurative language in specific scenarios (toxicity, bug reports, documentation, etc.)
● Explore figurative language for data augmentation
● Broaden to other types (similes, hyperbole, personification)
● Extend to other SE platforms (Stack Overflow, Gitter, JIRA)
28. Summary of Contributions
● Annotated Data: 1661 annotated GitHub sentences with
metaphors and idioms
● Open Resources: Annotation guidelines, dataset, and code publicly accessible
● Pioneering Research: Among the first to explore the impact of figurative language in SE
● LLM Enhancement: LLMs fine-tuned for better figurative language understanding in the SE context
Questions/Thoughts/Collaboration Ideas to: Mia Mohammad Imran, imranm3@vcu.edu