Configuring topic models for software engineering tasks in TraceLab

Bogdan Dit, Annibale Panichella, Evan Moritz, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk, Andrea De Lucia

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

10 Scopus citations

Abstract

A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in reality their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could potentially lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce superior results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.
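
The search idea described in the abstract can be illustrated with a minimal sketch: a genetic algorithm evolves candidate LDA configurations and keeps the ones that score best. This is not the authors' implementation or a TraceLab component. The names LdaConfig, ga_search and toy_fitness, the parameter ranges, and the GA settings are all assumptions made for illustration. In particular, toy_fitness is a stand-in: a real fitness function would train LDA with the candidate configuration and score the quality of the resulting topics.

```python
import random
from typing import Callable, NamedTuple


class LdaConfig(NamedTuple):
    """Hypothetical container for the LDA parameters being tuned."""
    num_topics: int  # number of topics
    alpha: float     # document-topic hyper-parameter
    beta: float      # topic-word hyper-parameter


def random_config(rng: random.Random) -> LdaConfig:
    # Assumed search ranges; adjust them to the dataset at hand.
    return LdaConfig(rng.randint(2, 100), rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0))


def crossover(a: LdaConfig, b: LdaConfig, rng: random.Random) -> LdaConfig:
    # Uniform crossover: each parameter is copied from one parent at random.
    return LdaConfig(*(rng.choice(pair) for pair in zip(a, b)))


def mutate(c: LdaConfig, rng: random.Random, rate: float = 0.2) -> LdaConfig:
    # Perturb each parameter with probability `rate`, clamped to the search range.
    k, alpha, beta = c
    if rng.random() < rate:
        k = min(100, max(2, k + rng.randint(-5, 5)))
    if rng.random() < rate:
        alpha = min(1.0, max(0.01, alpha + rng.gauss(0, 0.1)))
    if rng.random() < rate:
        beta = min(1.0, max(0.01, beta + rng.gauss(0, 0.1)))
    return LdaConfig(k, alpha, beta)


def ga_search(fitness: Callable[[LdaConfig], float],
              pop_size: int = 20, generations: int = 30, seed: int = 0) -> LdaConfig:
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]  # elitism: best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            # Parents are drawn from the better half of the population.
            p1, p2 = rng.sample(ranked[: pop_size // 2], 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(c: LdaConfig) -> float:
    # Placeholder only: peaks at an arbitrary configuration so the sketch runs
    # without a corpus. Replace with "train LDA, then score the topics".
    return -((c.num_topics - 40) / 100) ** 2 - (c.alpha - 0.3) ** 2 - (c.beta - 0.1) ** 2


best = ga_search(toy_fitness)
print(best)
```

Because the fitness function is passed in as a parameter, the same loop could in principle be pointed at a different technique or task by swapping the configuration type and the scoring function, which mirrors the extension guidelines the abstract mentions.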

Original language: English
Title of host publication: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013 - Proceedings
Publisher: IEEE Computer Society
Pages: 105-109
Number of pages: 5
ISBN (Print): 9781479904952
State: Published - 2013
Event: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013 - San Francisco, CA, United States
Duration: 19 May 2013 - 19 May 2013

Publication series

Name: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013 - Proceedings

Conference

Conference: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE 2013
Country/Territory: United States
City: San Francisco, CA
Period: 19/05/13 - 19/05/13

Keywords

  • Configurable
  • Experiments
  • Genetic algorithm
  • LDA
  • Traceability
  • TraceLab
