Scan
Ricoh Aficio MP1350
Scan Data Capture Accuracy
One of the fastest-growing needs for high-speed scanning is the conversion of legacy hard-copy documents into an electronic format for better information sharing, reduced storage space, and easier search and data retrieval.
A scan converts a page into an image, which on its own cannot be searched or edited. Most companies use optical character recognition (OCR) software to convert the images into editable text, which can then be searched, changed, or incorporated into a new document as required.
The OCR engine recognizes individual images on the page, converting them into letters, numbers, and other symbols. It then runs a complex analysis of the text in conjunction with spell checkers, technical dictionaries, and other data sources before offering up its best conversion into electronic format.
This stage can be very
To look into this important workflow issue, BERTL ran a series of standard test patterns with multiple font types, sizes, and colors, capturing the data at various resolutions using both text and text/photo settings. Text is the default setting for most OCR work due to its
However, as more documents incorporate images and color elements, text/photo, which operates in
After scanning each page of its test originals, BERTL analysts then ran the scanned files through ABBYY FineReader 8.0, in default configuration. The impact of the accuracy of the scanning process at the various resolutions and settings is reflected in the number of manual confirmations that the OCR application demands before the document is deemed clean and ready to use.
The higher the human intervention rate, the higher the cost of converting the document. As expected, the greatest difficulty in OCR recognition was found on the smallest, 4-point text sections of the test documents.
Above is a portion of BERTL’s OCR test chart scanned at 200 dpi (top), 300 dpi (middle), and 600 dpi (bottom) in text format and saved as a PDF file. The image has been zoomed to 400 percent in Adobe Acrobat and screen-captured for display.
The top line is 4 point, the middle line is 6 point, and the bottom line 8 point.
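To illustrate why the 4-point lines are the hardest to recognize, the short sketch below estimates how many pixels tall each text size becomes at each scan resolution. It relies only on the standard definition of a typographic point as 1/72 inch. The function names are hypothetical, and the figures describe the nominal height of the font's em box, not the height of any particular letter (lowercase letters are smaller still), so this is a rough guide and not part of BERTL's test procedure.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def glyph_height_px(point_size: float, dpi: int) -> float:
    """Approximate pixel height of a font's em box when scanned at `dpi`.

    Hypothetical helper for illustration only: actual letter shapes occupy
    only part of the em box, so real glyphs are shorter than this value.
    """
    return point_size * dpi / POINTS_PER_INCH


def height_table(point_sizes=(4, 6, 8), resolutions=(200, 300, 600)):
    """Return {(point_size, dpi): approximate height in pixels}."""
    return {
        (pt, dpi): glyph_height_px(pt, dpi)
        for pt in point_sizes
        for dpi in resolutions
    }


if __name__ == "__main__":
    # Print one line per combination of text size and scan resolution.
    for (pt, dpi), px in height_table().items():
        print(f"{pt} pt at {dpi} dpi: about {px:.1f} px tall")
```

Under these assumptions, 4-point text scanned at 200 dpi has only about 11 pixels of vertical space to work with, against roughly 33 pixels at 600 dpi, which is consistent with small text being the most error-prone at low resolution.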
The choice of OCR application will also have a dramatic effect on the level of human intervention that is required after the initial scanning has taken place. For that reason, we have standardized on ABBYY, a
Our tests are run using the latest release of ABBYY’s FineReader 8.0 software in its default mode. Fine-tuning ABBYY's rich feature set could remove a further portion of the manual intervention. However, to keep benchmark comparisons consistent, default settings were used.
Copyright © 2006 MCA Internet, LLC dba BERTL. | 13 November 2006 |
All Rights Reserved. The license under which this document is made available and applicable law prohibit any reproduction or further transmission of any portion of this document. This document may only be viewed electronically through the www.BERTL.com Web site and may not be stored in electronic or hard copy format. Any reproduction of trademarks is strictly prohibited. BERTL accepts no responsibility for any inaccuracies or omissions contained in this document.