Data Quality Relevance in Linguistic Analysis: The Impact of Transcription Errors on Multiple Methods of Linguistic Analysis

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

An enormous amount of recorded speech is generated daily, and quickly transcribing and analyzing the text of this speech could have tremendous value to organizations and researchers. However, the speech transcription process has historically been laborious, expensive, and slow. Automatic speech recognition (ASR) tools have matured a great deal in the last decade and may be a suitable method for generating large-scale, high-quality transcriptions. These tools are fast and economical, but they generally produce errors at a much greater rate than human transcribers. It is unknown whether these errors matter for psycholinguistic research. In this study, we will investigate the accuracy of earnings conference call transcripts produced by multiple tools and the impact of that transcription accuracy on the results of subsequent text mining analysis. While prior studies have focused on a single form of text mining, we will conduct three types of text analysis: bag-of-words-based classification, lexicon-based classification, and sentiment analysis. The results will show whether different types of text mining require different levels of transcription quality, and whether automated transcription services are feasible across a range of text mining applications.
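The abstract does not say how transcription accuracy is measured. As an illustration only, a common measure for comparing an ASR transcript against a human reference is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a minimal version under that assumption; the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lowercase, whitespace tokenization. A real pipeline
    would likely also normalize punctuation and numerals before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
human = "revenue grew ten percent"
asr = "revenue grew tin percent"
print(word_error_rate(human, asr))
```

A single misrecognized word such as this one leaves most bag-of-words features intact but could change a lexicon lookup, which is the kind of method-dependent sensitivity the study sets out to examine.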

Original language: American English
Title of host publication: 25th Americas Conference on Information Systems, AMCIS 2019
ISBN (Electronic): 978-0-9966831-8-0
State: Published - 2019
Event: 25th Americas Conference on Information Systems - Cancún, Mexico
Duration: 1 Jan 2019 → …
Conference number: 25
https://aisel.aisnet.org/amcis2019/

Conference

Conference: 25th Americas Conference on Information Systems
Abbreviated title: AMCIS 2019
Country/Territory: Mexico
City: Cancún
Period: 1/01/19 → …
