Ricoh MP1350 manual Scan Data Capture Accuracy

Page 41

Scan

Ricoh Aficio MP1350

Scan Data Capture Accuracy

One of the fastest-growing needs for high-speed scanning is the conversion of legacy hard-copy documents into an electronic format for better information sharing, reduced storage space, and easier search and data retrieval.

A scan converts a page into an image, which cannot be searched or edited as text. Most companies use optical character recognition (OCR) software to convert the images into editable text, which can then be searched, changed, or incorporated into a new document as required.

The OCR engine recognizes individual character shapes in the page image, converting them into letters, numbers, and other symbols. The OCR engine then runs complex analysis on the text in conjunction with spell checkers, technical dictionaries, and other data sources before offering up its best conversion into electronic format.

This stage can be very time-consuming, especially if the quality of the scanned data is poor, because poor scans lead to character recognition errors.

To look into this important workflow issue, BERTL ran a series of standard test patterns with multiple font types, sizes, and colors, capturing the data at various resolutions using both text and text/photo settings. Text is the default setting for most OCR work due to its 2-bit format, which tends to produce the best text reproduction.

However, as more documents incorporate images and color elements, text/photo, which operates in 8-bit and reproduces gray shades for better reproduction of images and colored text elements, is also being used.

After scanning each page of its test originals, BERTL analysts ran the scanned files through ABBYY FineReader 8.0 in its default configuration. Scan accuracy at each resolution and setting is reflected in the number of manual confirmations that the OCR application demands before the document is deemed clean and ready to use.

The higher the human intervention rate, the higher the cost of carrying out the conversion. As expected, the greatest difficulty in OCR recognition was found on the smallest, 4-point text sections of the test documents.
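
The link between manual confirmations and cost can be illustrated with a minimal sketch. The function name, the time per confirmation, and the hourly rate are hypothetical illustration values, not BERTL measurements.

```python
def intervention_cost(confirmations, seconds_per_confirmation, hourly_rate):
    """Estimate the labor cost of manually confirming OCR results.

    All three inputs are assumptions supplied by the reader:
    confirmations            -- number of manual confirmations the OCR tool asked for
    seconds_per_confirmation -- average operator time per confirmation
    hourly_rate              -- operator labor cost per hour
    """
    hours = confirmations * seconds_per_confirmation / 3600
    return hours * hourly_rate


# Hypothetical comparison of two scan settings for the same document.
for label, confirmations in (("lower resolution", 120), ("higher resolution", 30)):
    cost = intervention_cost(confirmations, seconds_per_confirmation=5, hourly_rate=30.0)
    print(f"{label}: {confirmations} confirmations, estimated cost {cost:.2f}")
```

Under these assumed figures, the estimated cost scales in direct proportion to the number of confirmations, which is why the confirmation count is a reasonable proxy for conversion cost.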

Above is a portion of BERTL’s OCR test chart scanned at 200 dpi (top), 300 dpi (middle), and 600 dpi (bottom) in text format and saved as a PDF file. The image has been zoomed to 400 percent in Adobe Acrobat and screen-captured for display.

In each sample, the top line is 4-point text, the middle line is 6-point, and the bottom line is 8-point.
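
Simple arithmetic suggests why the 4-point lines are the hardest to recognize: a typographic point is 1/72 inch, so the nominal height of the type in pixels is the point size times the scan resolution divided by 72. The sketch below is an illustration, not part of BERTL's test procedure. The function name is hypothetical, and actual letter shapes occupy only part of the nominal height.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def nominal_height_px(point_size, dpi):
    """Nominal type height in pixels for a given point size and scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


# Point sizes and resolutions taken from the test chart description.
for dpi in (200, 300, 600):
    for point_size in (4, 6, 8):
        height = nominal_height_px(point_size, dpi)
        print(f"{point_size}-point at {dpi} dpi: {height:.1f} px")
```

By this estimate, 4-point text at 200 dpi has a nominal height of about 11 pixels, against about 33 pixels at 600 dpi, so the OCR engine has far less detail to work with at the lower resolution.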

The choice of OCR application will also have a dramatic effect on the level of human intervention that is required after the initial scanning has taken place. For that reason, we have standardized on ABBYY, a well-respected leading OCR software developer.

Our tests are run using the latest release of ABBYY’s FineReader 8.0 software in its default mode. Fine-tuning FineReader’s rich feature set could remove a further portion of the manual intervention. However, to keep benchmark comparisons consistent, default settings were selected.

Copyright © 2006 MCA Internet, LLC dba BERTL.

13 November 2006

All Rights Reserved. The license under which this document is made available and applicable law prohibit any reproduction or further transmission of any portion of this document. This document may only be viewed electronically through the www.BERTL.com Web site and may not be stored in electronic or hard copy format. Any reproduction of trademarks is strictly prohibited. BERTL accepts no responsibility for any inaccuracies or omissions contained in this document.
