TY - GEN
T1 - Parameterizing and assembling IR-based solutions for SE tasks using genetic algorithms
AU - Panichella, Annibale
AU - Dit, Bogdan
AU - Oliveto, Rocco
AU - Di Penta, Massimiliano
AU - Poshyvanyk, Denys
AU - de Lucia, Andrea
N1 - Publisher Copyright:
© 2016 IEEE
PY - 2016/5/20
Y1 - 2016/5/20
N2 - Information Retrieval (IR) approaches are nowadays used to support various software engineering tasks, such as feature location, traceability link recovery, clone detection, or refactoring. However, previous studies showed that inadequate instantiation of an IR technique and underlying process could significantly affect the performance of such approaches in terms of precision and recall. This paper proposes the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process for software engineering tasks. The approach (named GA-IR) determines the (near) optimal solution to be used for each stage of the IR process, i.e., term extraction, stop word removal, stemming, indexing and an IR algebraic method calibration. We applied GA-IR on two different software engineering tasks, namely traceability link recovery and identification of duplicate bug reports. The results of the study indicate that GA-IR outperforms approaches previously published in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised and combinatorial approach.
AB - Information Retrieval (IR) approaches are nowadays used to support various software engineering tasks, such as feature location, traceability link recovery, clone detection, or refactoring. However, previous studies showed that inadequate instantiation of an IR technique and underlying process could significantly affect the performance of such approaches in terms of precision and recall. This paper proposes the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process for software engineering tasks. The approach (named GA-IR) determines the (near) optimal solution to be used for each stage of the IR process, i.e., term extraction, stop word removal, stemming, indexing and an IR algebraic method calibration. We applied GA-IR on two different software engineering tasks, namely traceability link recovery and identification of duplicate bug reports. The results of the study indicate that GA-IR outperforms approaches previously published in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised and combinatorial approach.
KW - Information retrieval
KW - Parametrization
KW - Search-based software engineering
KW - Text-based software engineering
UR - http://www.scopus.com/inward/record.url?scp=85095246860&partnerID=8YFLogxK
U2 - 10.1109/SANER.2016.97
DO - 10.1109/SANER.2016.97
M3 - Conference contribution
AN - SCOPUS:85095246860
T3 - 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
SP - 314
EP - 325
BT - 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
Y2 - 14 March 2016 through 18 March 2016
ER -