Chapter 4
Training
Training is the process of changing the OCR solutions assigned to character shapes in the image. It is useful for uniformly degraded documents or when an unusual typeface is used throughout a document. Training will be less useful for texts with random distortions. Here is an example, based on the letter “g”, which can be printed in different ways:
The first two examples do not need training, because both shapes are normal for the letter “g” and the program can handle them. The third example could benefit from training because the shape of “g” is unusual, and all instances of “g” in the text are likely to look like this. The fourth example is not good for training, because the first “g” is poorly printed, and this shape is unlikely to appear again in the document.
You can use training to improve recognition of special symbols such as @,
®and © or to recognize supported accented letters more reliably. The purpose of training is not to teach the program to read characters from
OmniPage Pro 12 offers two types of training: manual training and automatic training (IntelliTrain). Data coming from both types of training are combined and available for saving to a training file.
When you leave a page on which training data was generated, you will be asked how to apply it to other existing pages in the document.
Manual training
To do manual training, place the insertion point in front of the character you want to train, or select a group of characters (up to one word) and choose Train Character... from the Tools menu or the shortcut menu. You will see an enlarged view of the character(s) to be trained, along with the current OCR solution. Change this to the desired solution and click OK. The program takes this training and examines the rest of the page. If it