Mining domain knowledge: Using functional dependencies to profile data

Derek Legenzoff, Teagen Nabity

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Poor data quality is one of the primary issues facing big data projects. Cleaning data and improving its quality can be expensive and time-intensive: in data warehouse projects, data cleaning is estimated to account for 30% to 80% of the project's development time and budget. Data quality mining, one method for identifying errors, has become increasingly popular over the past 20 years. Our research-in-progress aims to identify multi-field errors by mining functional dependencies. Existing research on data quality mining and functional dependencies has focused on improving algorithms so that they identify a higher percentage of complex errors. The proposed process strives to introduce an efficient method for expediting error identification and increasing a user's domain knowledge, in order to reduce the costs associated with cleaning; the process will also include an assessment of when further cleaning is unlikely to be cost-effective.

Original language: English
Title of host publication: 2016 International Conference on Information Systems, ICIS 2016
Publisher: Association for Information Systems
ISBN (Electronic): 9780996683135
State: Published - 2016
Event: 2016 International Conference on Information Systems, ICIS 2016 - Dublin, Ireland
Duration: 11 Dec 2016 to 14 Dec 2016

Publication series

Name: 2016 International Conference on Information Systems, ICIS 2016

Conference

Conference: 2016 International Conference on Information Systems, ICIS 2016
Country/Territory: Ireland
City: Dublin
Period: 11/12/16 to 14/12/16

Keywords

  • Data cleaning
  • Data cleaning process
  • Data mining
  • Data quality
  • Domain knowledge
  • Error identification
  • Functional dependency
