TY - GEN
T1 - Configuring topic models for software engineering tasks in TraceLab
AU - Dit, Bogdan
AU - Panichella, Annibale
AU - Moritz, Evan
AU - Oliveto, Rocco
AU - Di Penta, Massimiliano
AU - Poshyvanyk, Denys
AU - De Lucia, Andrea
PY - 2013
Y1 - 2013
AB - A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in reality their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could potentially lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce superior results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.
KW - Configurable
KW - Experiments
KW - Genetic algorithm
KW - LDA
KW - Traceability
KW - TraceLab
UR - http://www.scopus.com/inward/record.url?scp=84888593644&partnerID=8YFLogxK
U2 - 10.1109/TEFSE.2013.6620164
DO - 10.1109/TEFSE.2013.6620164
M3 - Conference contribution
AN - SCOPUS:84888593644
SN - 9781479904952
T3 - 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013 - Proceedings
SP - 105
EP - 109
BT - 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013 - Proceedings
PB - IEEE Computer Society
T2 - 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013
Y2 - 19 May 2013 through 19 May 2013
ER -