Optimizing OCR Accuracy for Bi-tonal, Noisy Scans of Degraded Arabic Documents

February 2005

Paul Herceg, The MITRE Corporation
Ben Huyck, The MITRE Corporation
Chris Johnson, The MITRE Corporation
Amlan Kundu, The MITRE Corporation
Linda Van Guilder, The MITRE Corporation

ABSTRACT

Acquiring foreign-language text from degraded hardcopy documents is of interest for military and border control applications. Bi-tonal image scans are desirable because their file size is small. However, the nature of the hardcopy degradations and the capabilities of the scanner or image enhancement software directly affect the quality of the captured image and the extent of language acquisition. We applied a collection of manual treatments to hardcopy Arabic documents to develop a corpus of bi-tonal images. We then used this corpus in an exploratory study to derive conclusions about how bi-tonal images could be enhanced. This paper discusses the manually degraded Arabic document corpus, the image enhancement study, and the significant optical character recognition (OCR) improvements obtained with simple scanner driver adjustments.

Publication

Copyright © 2005 Society of Photo-Optical Instrumentation Engineers. This paper was published in the Proceedings of the International Society for Optical Engineering (SPIE) Vol. 5817, Visual Information Processing XIV, May 2005, pp. 179-187, and is made available as an electronic reprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

Additional Search Keywords

Paul Herceg, Amlan Kundu, Chris Johnson, Ben Huyck, Linda Van Guilder, SPIE, optimizing, OCR, accuracy, bi-tonal, noisy, scans, degraded, Arabic documents, Arabic, image enhancement, foreign language, degraded documents

 

Page last updated: July 5, 2005
