TopicXP: Exploring topics in source code using Latent Dirichlet Allocation

Trevor Savage, Bogdan Dit, Malcom Gethers, Denys Poshyvanyk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

54 Scopus citations

Abstract

Acquiring a general understanding of large software systems and the components from which they are built can be time-consuming, but such an understanding is an important prerequisite to adding features or fixing bugs. In this paper we propose a tool, TopicXP, that supports developers during such software maintenance tasks by extracting and analyzing unstructured information in source code identifier names and comments using Latent Dirichlet Allocation. TopicXP enables developers to gain an overview of a software system under analysis by extracting and visualizing natural language topics, which generally correspond to concepts or features implemented in software classes. TopicXP is implemented as an open-source Eclipse plug-in that provides interactive visualization of topics along with the structural dependencies between the underlying classes implementing these topics. The paper also presents the results of a preliminary user study aimed at evaluating TopicXP.
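
To make the input to such an analysis concrete, the following is a minimal sketch of how identifier names and comments might be turned into a per-class bag of words before a topic model such as Latent Dirichlet Allocation is fitted. It is not TopicXP's actual preprocessing, which the abstract does not describe. The function names, the stop list, and the minimum term length are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list (assumption): a few English words and Java keywords.
STOP_WORDS = {"the", "and", "for", "class", "int", "void", "get", "set"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    terms = []
    for part in re.split(r"[^A-Za-z]+", identifier):
        # Matches acronyms ("HTTP"), capitalized words, and lowercase runs.
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [t.lower() for t in terms]


def bag_of_words(source_text):
    """Count terms from every identifier-like token and comment word."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source_text)
    terms = [t for tok in tokens for t in split_identifier(tok)]
    # Drop stop words and very short terms (length threshold is an assumption).
    return Counter(t for t in terms if t not in STOP_WORDS and len(t) > 2)


example = """
// Parses the HTTP response header
class HttpResponseParser { int max_line_count; void parseHeader() {} }
"""
counts = bag_of_words(example)
```

Each class would yield one such term-count vector, and the collection of vectors forms the document-term matrix that a topic model takes as input. No stemming is applied here, so "parses" and "parse" are counted as separate terms.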

Original language: English
Title of host publication: Proceedings - 2010 IEEE International Conference on Software Maintenance, ICSM 2010
State: Published - 2010
Event: 2010 IEEE International Conference on Software Maintenance, ICSM 2010 - Timisoara, Romania
Duration: 12 Sep 2010 → 18 Sep 2010

Publication series

Name: IEEE International Conference on Software Maintenance, ICSM

Conference

Conference: 2010 IEEE International Conference on Software Maintenance, ICSM 2010
Country/Territory: Romania
City: Timisoara
Period: 12/09/10 → 18/09/10
