Data quality relevance in linguistic analysis: The impact of transcription error on multiple methods of linguistic analysis

Research output: Contribution to conference › Paper › peer-review

3 Scopus citations

Abstract

An enormous amount of recorded speech is generated daily, and quickly transcribing and analyzing the text of this speech could have tremendous value to organizations and researchers. However, the speech transcription process has historically been laborious, expensive, and slow. Automatic speech recognition (ASR) tools have matured a great deal in the last decade and may be a suitable method for generating large-scale, high-quality transcriptions. These tools are fast and economical, but they generally produce errors at a much greater rate than human transcribers. It is unknown whether these errors matter when conducting psycholinguistic research. In this study, we will investigate the accuracy of earnings conference call transcripts produced by multiple tools and the impact of that transcription accuracy on the results of subsequent text mining analysis. While prior studies have focused on a single form of text mining, we will conduct three types of text analysis: bag-of-words-based classification, lexicon-based classification, and sentiment analysis. The results will show whether different types of text mining require different levels of transcription quality, and whether automated transcription services are feasible across a range of text mining applications.
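The abstract does not say how transcription accuracy is measured. As an illustration only, the sketch below assumes word error rate (WER), a common ASR metric, and pairs it with a simple comparison of bag-of-words counts to show how a transcription error can change the input to downstream text mining. The function names and the sample sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: lowercased, whitespace-split tokens; the paper may
    normalize text differently or use another metric entirely.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def bag_of_words_changes(reference: str, hypothesis: str):
    """Return (lost, gained) term counts between the two transcripts."""
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    # Counter subtraction keeps only positive differences.
    return ref_counts - hyp_counts, hyp_counts - ref_counts


# Hypothetical human transcript versus hypothetical ASR output.
human = "revenue grew ten percent this quarter"
asr = "revenue grew tan percent quarter"

wer = word_error_rate(human, asr)
lost, gained = bag_of_words_changes(human, asr)
```

In this made-up pair, one substitution and one deletion against a six-word reference give a WER of one third, and the bag-of-words representation loses "ten" and "this" while gaining the spurious term "tan". A lexicon-based or sentiment method would be affected only if the changed words appear in its lexicon, which is the kind of method-specific sensitivity the study sets out to examine.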

Original language: English
State: Published - 2019
Event: 25th Americas Conference on Information Systems, AMCIS 2019 - Cancun, Mexico
Duration: 15 Aug 2019 to 17 Aug 2019

Conference

Conference: 25th Americas Conference on Information Systems, AMCIS 2019
Country/Territory: Mexico
City: Cancun
Period: 15/08/19 to 17/08/19

Keywords

  • Automatic speech recognition
  • Bag of words
  • Sentiment analysis
  • Text mining
