Project Details
Abstract
Award #0238285
CAREER: Document Image Degradation Analysis
Elisa H. Barney Smith
Boise State University
Optical Character Recognition (OCR) converts documents from image form to text form, allowing documents on the WWW, in company archives, or in government intelligence collections to be searched by content. Documents that are heavily degraded by printing, scanning, photocopying and faxing may still be easily readable to humans, but recognition accuracy on them is very low, requiring human intervention.
Understanding the image degradations introduced by printing and scanning can lead to improvements in OCR. Instead of training the computer to recognize characters by providing it with a large variety of example characters under different degradation conditions, we will develop a more effective method based on understanding the degradation and being able to estimate its characteristics for each document.
To improve the performance of OCR, the PI proposes to model the nonlinear systems of printing, scanning, photocopying and faxing. A calibrated model can predict how a document will look after being subjected to these processes and can be used to develop products that degrade text images less. Once the models are statistically validated, researchers will have the confidence to use them. Estimating the model parameters from a short character string allows continuous calibration to account for spatially variant systems. These models and parameters will be used to partition the training set based on the modeled degradations and to match the appropriate partition to the test data at hand, improving OCR accuracy.

This technical side will be combined with an educational component: an introduction-to-engineering course for education majors, so they may understand, appreciate, and educate all K-12 students about how math and science are used in engineering.
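The kind of degradation model described above can be illustrated with a blur-plus-threshold sketch common in document image analysis: a bilevel character image is convolved with a point spread function (PSF), then re-binarized at a threshold. This is only a minimal illustration, not the project's actual model; the `degrade` function, the box PSF, and the toy character image are all assumptions made for the example.

```python
def degrade(image, psf, threshold):
    """Convolve a 2D bilevel image with a square, odd-sized PSF, then binarize.

    Illustrative blur-plus-threshold degradation: pixels whose blurred
    intensity meets the threshold come out as ink (1), the rest as paper (0).
    """
    h, w = len(image), len(image[0])
    k = len(psf) // 2  # PSF radius
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * psf[dy + k][dx + k]
            row.append(1 if acc >= threshold else 0)
        out.append(row)
    return out

# Toy "character" (a 5x5 plus sign) and a normalized 3x3 box PSF.
char = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
psf = [[1 / 9.0] * 3 for _ in range(3)]

# A low threshold thickens strokes; a higher threshold erodes them.
thick = degrade(char, psf, 0.2)
thin = degrade(char, psf, 0.5)
```

Sweeping the PSF width and threshold traces out a space of degradations; estimating which point in that space produced a given document is the calibration step the abstract describes.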
| Status | Finished |
| --- | --- |
| Effective start/end date | 1/07/03 → 31/12/10 |
Funding
- National Science Foundation: $429,920.00