[Image-SIG] Filtering out all but black pixels for OCR

jcupitt at gmail.com jcupitt at gmail.com
Wed Jul 2 13:10:12 CEST 2008


2008/6/29 Mike Meisner <mikem at blazenetme.net>:
> I'd like to use PIL to prep an image file to improve OCR quality.
>
> Specifically, I need to filter out all but black pixels from the image
> (i.e., convert all non-black pixels to white while retaining the black
> pixels).

I realise you asked for PIL, but vips (another image processing
library with a Python interface) can do this rather easily:

-------
import sys
from vipsCC import *

try:
        # load the image, then compare every pixel against black
        a = VImage.VImage ("some/image/file.format")
        b = a.notequal ([0,0,0])
        b.write ("some/other/image/file.whatever")
except VError.VError, e:
        e.perror (sys.argv[0])
-------

vips uses 0/255 to represent false/true, so the result of the
notequal() method is an image which has 255 for all pixels which are
not equal to [0,0,0], and zero for all pixels which are equal to
[0,0,0]. This assumes a 3-band image, of course.
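
To make the 0/255 convention concrete, here is a rough sketch in plain
Python with no vips involved. The not_black_mask helper and the sample
pixel list are made up for illustration; they only mimic the result
described above on a list of (r, g, b) tuples and are not part of the
vips API.

```python
# Hypothetical helper, not part of vips: mimics the 0/255 truth mask
# on a plain list of (r, g, b) tuples.
def not_black_mask(pixels):
    """Return 255 for every pixel that is not (0, 0, 0), else 0."""
    return [0 if tuple(p) == (0, 0, 0) else 255 for p in pixels]


# Made-up sample data: one black pixel and three non-black ones.
pixels = [(0, 0, 0), (12, 200, 7), (255, 255, 255), (0, 0, 1)]
mask = not_black_mask(pixels)
print(mask)  # prints [0, 255, 255, 255]
```

Black pixels stay at zero and everything else, including the nearly
black (0, 0, 1), goes to 255, which is the black-on-white result you
want for OCR.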

See:

  http://www.vips.ecs.soton.ac.uk/

The vips Python interface does not currently work on Windows, which
might be a problem, I guess.

John

