[Image-SIG] Image-SIG Digest, Vol 63, Issue 1

Wed Jul 2 18:44:15 CEST 2008

On 7/2/08, image-sig-request at python.org wrote:

>I'd like to use PIL to prep an image file to improve OCR quality.
>
>Specifically, I need to filter out all but black pixels from the image (i.e., 
>convert all non-black pixels to white while retaining the black pixels).
>
>Can someone please direct me to the appropriate PIL function/method to 
>accomplish this along with a brief description of the correct arguments to use?
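A literal answer to the question is a hard threshold: keep pixels at or below some darkness cutoff as black and turn everything else white. The sketch below assumes the image is already grayscale and held as a list of rows of 0..255 ints. The function name and the `cutoff` parameter are hypothetical. In PIL the same mapping can be written as `img.convert("L").point(lambda p: 0 if p <= cutoff else 255)`.

```python
def keep_only_black(pixels, cutoff=0):
    """Hypothetical helper: pixels <= cutoff become black (0), all others white (255).

    `pixels` is a grayscale image as a list of rows of ints in 0..255.
    With cutoff=0 only pure black survives. Scanned text is rarely pure
    black, so a higher cutoff is likely to be needed in practice.
    """
    return [[0 if p <= cutoff else 255 for p in row] for row in pixels]
```

A single fixed cutoff treats every gray pixel the same way. That limitation is why the reply below recommends enhancing the grayscale image before thresholding.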

I don't have the exact arguments to use, but in my opinion getting the best results takes a somewhat more involved process: enhance the grayscale image first, and only then reduce it to a bi-level image.

The best results I have seen come from applying a moderately strong 'S' curve with sharp shoulders, then two passes of unsharp masking (the first with a large aperture, the second with a smaller aperture and lower intensity), and finally mapping the result to the bit depth required by the OCR software (usually a threshold into a bitmap).
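The pipeline described above can be sketched in pure Python as follows. The image is assumed to be a grayscale list of rows of 0..255 ints. Several details are assumptions, not the poster's settings: a normalized logistic function serves as the 'S' curve, a box blur stands in for the Gaussian blur usually used in unsharp masking, and the radii, amounts and threshold level are illustrative. In current Pillow releases the sharpening steps would more likely use `ImageFilter.UnsharpMask(radius, percent, threshold)` and the curve and threshold `Image.point`.

```python
import math


def s_curve_lut(strength=10.0):
    """256-entry lookup table for a logistic 'S' curve, normalized so 0->0 and 255->255.

    Larger `strength` gives a steeper middle and sharper shoulders (assumed parameter).
    """
    def f(x):
        return 1.0 / (1.0 + math.exp(-strength * (x - 0.5)))
    lo, hi = f(0.0), f(1.0)
    return [round(255 * (f(i / 255) - lo) / (hi - lo)) for i in range(256)]


def apply_lut(img, lut):
    return [[lut[p] for p in row] for row in img]


def box_blur(img, radius):
    """Mean of a (2*radius+1)^2 window, clamping coordinates at the edges."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / (2 * radius + 1) ** 2)
        out.append(row)
    return out


def unsharp_mask(img, radius, amount):
    """Add `amount` times the difference between the image and its blur, clamped to 0..255."""
    blurred = box_blur(img, radius)
    return [
        [min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


def threshold(img, level=128):
    return [[0 if p < level else 255 for p in row] for row in img]


def prepare_for_ocr(img, curve_strength=10.0, level=128):
    img = apply_lut(img, s_curve_lut(curve_strength))  # contrast 'S' curve
    img = unsharp_mask(img, radius=4, amount=1.0)       # large aperture pass
    img = unsharp_mask(img, radius=1, amount=0.5)       # smaller, gentler pass
    return threshold(img, level)                        # bi-level output for OCR
```

For example, a 7x7 light-gray image (230) crossed by a dark vertical stroke (40) in column 3 comes out as pure white with a pure black stroke. The 'S' curve spreads the two levels apart before sharpening, so the final threshold has a wider margin to work with.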

Another trick, if you have the time, is to scan at a higher resolution in integer multiples (i.e., 2x, 3x, 4x, so interpolation doesn't interfere), process the image as described, and then reduce it to the optimum resolution for OCR. I have to admit this is from a while ago; I'm not sure what the current state of affairs is with OCR software (it's been 10 years, if a day, since I used any).
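Scanning at an integer multiple pays off because the reduction can then average each k-by-k block into exactly one output pixel, with no resampling between misaligned pixel grids. A minimal sketch, assuming a grayscale list-of-rows image whose dimensions are multiples of the factor (the function name is hypothetical); recent Pillow versions offer `Image.reduce(factor)` for the same block-averaging reduction.

```python
def downsample(img, factor):
    """Average each factor x factor block into one pixel (integer-factor reduction)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```

If the processed image is already bi-level, the averaged result will contain grays again, so a final threshold is needed before handing it to the OCR engine.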

Scott