CAREER: Document Image Degradation Analysis

  • Barney Smith, Elisa H. (PI)


Project Details

Description

Abstract

NSF Award 0238285

Elisa H. Barney Smith, Boise State University

Optical Character Recognition (OCR) converts document images into text, which allows documents on the WWW, in company archives, or gathered for government intelligence to be searched for content. Documents that have been heavily degraded by printing, scanning, photocopying and faxing may still be easily readable to humans, but OCR accuracy on them is very low, so human intervention is required.

Understanding the image degradations introduced by printing and scanning can lead to improvements in OCR. Instead of training the computer to recognize characters by providing it with a large variety of example characters under different degradation conditions, we will develop a more effective method based on understanding the degradation and estimating its characteristics for each document.
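As a rough illustration of what estimating the degradation characteristics of a document can mean, the sketch below assumes a simplified scanner model that this abstract does not specify: the ideal character bitmap is blurred by a Gaussian point-spread function of width `sigma` and then binarized with a global `threshold`. The function names (`degrade`, `estimate`) and the grid-search estimator are hypothetical choices for illustration; the estimator simulates each candidate parameter pair and keeps the one whose output best reproduces an observed character.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian weights over +/- 3 sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * row[j]
        out.append(acc)
    return out


def degrade(image, sigma, threshold):
    """Hypothetical scanner model: separable Gaussian blur, then a global threshold.

    image is a list of rows with 1 = ink and 0 = paper; the result uses the same encoding.
    """
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]                 # blur horizontally
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]      # blur vertically
    blurred = [list(row) for row in zip(*cols)]                    # transpose back
    return [[1 if v >= threshold else 0 for v in row] for row in blurred]


def estimate(observed, ideal, sigmas, thresholds):
    """Grid search for the (sigma, threshold) pair whose simulated bitmap
    differs from the observed bitmap in the fewest pixels."""
    best = None
    for s in sigmas:
        for t in thresholds:
            sim = degrade(ideal, s, t)
            err = sum(a != b for ra, rb in zip(sim, observed) for a, b in zip(ra, rb))
            if best is None or err < best[0]:
                best = (err, s, t)
    return best[1], best[2]


if __name__ == "__main__":
    # A 9x9 bar-shaped "character" as the ideal input (1 = ink).
    ideal = [[1 if 3 <= c <= 5 and 1 <= r <= 7 else 0 for c in range(9)] for r in range(9)]
    observed = degrade(ideal, sigma=1.0, threshold=0.5)
    print(estimate(observed, ideal, [0.5, 1.0, 1.5], [0.3, 0.5, 0.7]))
```

Because blur width and threshold can partly compensate for each other, several pairs may reproduce the same bitmap exactly; this search returns the first best fit it encounters, which is not necessarily the setting that generated the observation.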

To improve OCR performance, the PI proposes to model the nonlinear systems of printing, scanning, photocopying and faxing. A calibrated model can predict how a document will look after passing through these processes, and can be used to develop products that degrade text images less. Once the models are statistically validated, researchers can use them with confidence. Estimating the model parameters from a short character string allows continuous calibration, which accounts for spatially variant systems whose behavior changes from one part of the page to another. These models and parameters will be used to partition the training set according to the modeled degradations and to match the appropriate partition to the test data at hand, improving OCR accuracy. This technical work will be combined with an educational component: an introduction-to-engineering course for education majors, so that they can understand and appreciate how math and science are used in engineering and teach this to all K-12 students.
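The partition-and-match step can be sketched as follows, assuming each training sample is tagged with the blur width `sigma` and binarization `threshold` of a hypothetical degradation model used to generate it. Binning samples on a regular parameter grid and choosing the partition whose mean parameters lie nearest to the test document's estimates are illustrative choices, not the project's stated method; `partition_training_set` and `match_partition` are hypothetical names.

```python
import math
from collections import defaultdict


def partition_training_set(samples, sigma_step=0.5, threshold_step=0.2):
    """Group (label, sigma, threshold) training samples into bins on a
    regular grid over the degradation-parameter space."""
    bins = defaultdict(list)
    for label, sigma, threshold in samples:
        key = (round(sigma / sigma_step), round(threshold / threshold_step))
        bins[key].append((label, sigma, threshold))
    return dict(bins)


def centroid(members):
    """Mean (sigma, threshold) of the samples in one partition."""
    n = len(members)
    return (sum(m[1] for m in members) / n, sum(m[2] for m in members) / n)


def match_partition(partitions, est_sigma, est_threshold):
    """Return the key of the partition whose mean parameters are closest
    (unweighted Euclidean distance) to the test document's estimates."""
    def dist(key):
        cs, ct = centroid(partitions[key])
        return math.hypot(cs - est_sigma, ct - est_threshold)
    return min(partitions, key=dist)


if __name__ == "__main__":
    samples = [("a", 0.5, 0.3), ("b", 0.6, 0.3), ("a", 1.5, 0.7), ("b", 1.4, 0.7)]
    parts = partition_training_set(samples)
    key = match_partition(parts, est_sigma=1.45, est_threshold=0.68)
    # The classifier would then be trained on parts[key] only.
    print(key, parts[key])
```

The unweighted distance treats one unit of blur width and one unit of threshold as equally important; in practice the two axes would likely need rescaling before they are compared.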

Status: Finished
Effective start/end date: 1/07/03 to 31/12/10

Funding

  • National Science Foundation: $429,920.00
