Chapter 6
Text does not get recognized properly
Try these solutions if any part of the original document is not converted to text properly during OCR:
◆Look at the original page image and ensure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is generally ignored during OCR. See “Working with zones” on page 57 for how to create and modify zones.
◆Make sure text zones are identified correctly. Reidentify zone types and contents, if necessary, and perform OCR on the document again. See “Zone types and properties” on page 55.
◆Make sure an unsuitable template has not been loaded by mistake. A template's zones may not match the layout of the page, and if zone borders cut through text, recognition is impaired.
◆Adjust the brightness and contrast sliders in the Scanner panel of the Options dialog box. You may need to experiment with different combinations of settings to get the desired results.
◆Check the resolution of the original image. Hover the cursor over a page thumbnail to display a popup showing the resolution. If it is significantly above or below 300 dpi, recognition is likely to suffer.
◆Make sure the correct document languages are selected in the OCR panel of the Options dialog box. Only languages included in the document should be selected.
◆Turn IntelliTrain on and make some proofing corrections. This is most likely to help with stylized fonts or uniformly degraded documents. If IntelliTrain was already running, try turning it off instead; on some types of degraded documents it may not be able to help.
◆Do some manual training, or edit existing training to remove unsuccessful training.
◆If you use True Page as the Text Editor view or for export, recognized text is put into text boxes or frames. Some text may be hidden if a text box is too small. To view the text, place the cursor in the text box and use the arrow keys on your keyboard to scroll to the top, bottom, left, or right of the box.
◆Check the glass, mirrors, and lenses on your scanner for dust, smudges, or scratches. Clean them if necessary.
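
If you have many image files to check before loading them, the resolution check described above can also be done outside the program. The following is a minimal sketch, not part of the product: it reads the resolution stored in a PNG file's pHYs chunk using only the Python standard library. The function names png_dpi and resolution_ok are hypothetical, and the 25% tolerance around 300 dpi is an assumption chosen for illustration, since this guide only says "significantly above or below".

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable.

    `data` is the raw content of the file as bytes.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:
                # Unit 0 means only the pixel aspect ratio is known.
                return None
            # Unit 1 is pixels per metre; one inch is 0.0254 metres.
            return (round(x_ppu * 0.0254), round(y_ppu * 0.0254))
        if chunk_type in (b"IDAT", b"IEND"):
            # pHYs must appear before the image data, so stop looking.
            return None
        pos += 8 + length + 4
    return None


def resolution_ok(dpi, target=300, tolerance=0.25):
    """Return True if dpi is within the assumed tolerance of the target.

    The 25% default is an illustrative assumption, not a product setting.
    """
    return abs(dpi - target) <= target * tolerance
```

For example, reading a file with open(path, "rb").read() and passing the bytes to png_dpi gives the stored resolution, which can then be passed to resolution_ok. A file with no resolution information returns None, in which case the thumbnail popup in the program remains the reference.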