Poor man's OCR: need performance improvement tips

Sat Sep 24 14:59:37 EDT 2005

On 24 Sep 2005, at 19:14, qvx wrote:

> Hi all,

<snip>

>
>
> 4. Process each line: compare pixels of each letter of alphabet with
> corresponding pixels in line of input picture. This consists of loops
> comparing pixel by pixel. This is my performance bottleneck.
>
> I'm using PIL for initial image processing. But then I use plain Python
> loops for pixel matrix comparison. One nice optimization was to call
> PIL.Image.getdata() at the beginning and then use data[y*w+x] instead of
> PIL.Image.getpixel(xy). I would like to compare each character raster
> with corresponding image pixels in a "single operation" and avoid
> (Python) loops.
>
> Oh, one more thing. Letter raster matrices have varying width and
> constant height (because a proportional-width font is used). The
> compare function should signal a mismatch if even a single pixel differs.
>
> Any library that can do that?
>
>
> Here is how I expected to solve this problem in C++. Each line of text
> (and letter) is less than 16 pixels high. It can be artificially padded
> to 16 pixels. I planned to linearize each letter's matrix by columns.
> Imagine a letter with pixel indices numbered like this:
>
>  00 10 20
>  01 11 21
>  02 12 22
>  03 13 23
>  .. .. ..
>  0f 1f 2f
>
> I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
> is one bit wide, each column would be 2 octets long. I would do the
> same to the line of text of the input picture. Then I would have to
> compare two buffers of length equal to the length of the character.
> After a successful match, I would advance the "input" stream by that
> number of bytes.
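Python has no memcmp as such, but comparing two bytes objects with == is done in C, not in a Python loop, so the column-major plan above maps onto Python fairly directly. Below is a minimal sketch in modern Python 3 (the Python of 2005 had no bytes type, but plain str slices behave the same way there). It assumes each glyph is given as a list of equal-length rows of 0/1 values, that 0 is the background colour, and a fixed 16-pixel height; the names pack_columns, matches_at, HEIGHT and BYTES_PER_COL are hypothetical.

```python
HEIGHT = 16          # assumed fixed line height; shorter glyphs are padded
BYTES_PER_COL = 2    # 16 one-bit pixels per column


def pack_columns(rows):
    """Pack a glyph (a list of equal-length rows of 0/1 values) column by
    column: top pixel in the most significant bit, 2 bytes per column."""
    if len(rows) > HEIGHT:
        raise ValueError("glyph is taller than %d pixels" % HEIGHT)
    width = len(rows[0]) if rows else 0
    # Pad the bottom with background (assumed 0) rows up to HEIGHT.
    padded = [list(r) for r in rows] + [[0] * width for _ in range(HEIGHT - len(rows))]
    out = bytearray()
    for x in range(width):
        col = 0
        for y in range(HEIGHT):
            col = (col << 1) | (1 if padded[y][x] else 0)
        out += col.to_bytes(BYTES_PER_COL, "big")
    return bytes(out)


def matches_at(line, glyph, col):
    """Compare a packed glyph with the packed line starting at column col.
    This is a single bytes comparison; one differing pixel makes it False."""
    start = col * BYTES_PER_COL
    return line[start:start + len(glyph)] == glyph


# A made-up 3x4 "A"-like glyph and a line containing it at column 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [1, 1, 1],
     [1, 0, 1]]
glyph = pack_columns(A)
line = pack_columns([[1, 0, 0, 1, 0],
                     [1, 0, 1, 0, 1],
                     [1, 0, 1, 1, 1],
                     [1, 0, 1, 0, 1]])
print(matches_at(line, glyph, 2))  # True
print(matches_at(line, glyph, 0))  # False
```

Packing each glyph is done once up front; the per-position check is then one slice and one comparison, which also gives the "fail on a single different pixel" behaviour for free.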

Presumably you don't care about alignment, kerning and other such
things for now.

If you haven't tried Psyco yet, try it.
If you read the image in rotated 90 degrees, the data is already
linearised the way you want it. You could then just pack it into an
integer and compare that, or even look it up in a dictionary.
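A rough sketch of the dictionary idea, assuming the line has already been packed 2 bytes per column (16-pixel columns, 1 bit per pixel, background bits 0) and that each glyph's packed bytes are known. bytes objects are hashable, so slices can be dict keys directly. Turning them into integers with int.from_bytes works for a single width, but b"\x00\x00\xff\x00" and b"\xff\x00" give the same integer, so glyphs of different widths could collide. Because widths vary, the decoder tries each known width at the current column, widest first; that greedy order, the handling of unmatched columns (skip blanks, emit "?"), the glyph rasters, and the names build_lookup and decode_line are all hypothetical choices for illustration.

```python
BYTES_PER_COL = 2  # assumed: 16-pixel columns packed 1 bit per pixel


def build_lookup(glyphs):
    """glyphs maps each letter to its packed column bytes. Returns the
    reverse mapping plus the distinct glyph widths (in columns), widest first."""
    lookup = {}
    for letter, packed in glyphs.items():
        if packed in lookup:
            raise ValueError("%r and %r have identical rasters" % (lookup[packed], letter))
        lookup[packed] = letter
    widths = sorted({len(p) // BYTES_PER_COL for p in lookup}, reverse=True)
    return lookup, widths


def decode_line(data, lookup, widths):
    """Greedy left-to-right decode of a packed text line."""
    blank = bytes(BYTES_PER_COL)  # an all-background column (assumed 0 bits)
    ncols = len(data) // BYTES_PER_COL
    out = []
    n = 0  # current column
    while n < ncols:
        start = n * BYTES_PER_COL
        for w in widths:
            if n + w > ncols:
                continue  # glyph would run past the end of the line
            letter = lookup.get(data[start:start + w * BYTES_PER_COL])
            if letter is not None:
                out.append(letter)
                n += w  # advance past the matched glyph
                break
        else:
            # Nothing matched: skip blank gap columns, flag anything else.
            if data[start:start + BYTES_PER_COL] != blank:
                out.append("?")
            n += 1
    return "".join(out)


# Made-up packed glyphs, for illustration only.
glyphs = {
    "i": b"\xff\x00",
    "l": b"\xff\xff",
    "m": b"\x01\x01\x02\x02\x01\x01",
}
lookup, widths = build_lookup(glyphs)
line = (b"\xff\x00" + b"\x00\x00" + b"\x01\x01\x02\x02\x01\x01"
        + b"\x00\x00" + b"\xff\xff" + b"\x12\x34")
print(decode_line(line, lookup, widths))  # iml?
```

Trying the widest glyph first is only a heuristic: if a narrow letter's raster is a prefix of a wider one, the order decides which wins, so ambiguous glyph sets would need something smarter than this.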

e.g.
char = lookup[data[n:n+2]]

where n is the left edge (or the bottom; we rotated it 90 degrees,
remember?) and 2 is me assuming PIL will not encode each pixel as an
entire byte in a 1bpp image. I would have thought that would be pretty
quick, as long as you could get the alignment reliable enough.
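On that "2": as far as I know, getdata() on a mode "1" image returns one value per pixel (0 or 255), so a slice like data[n:n+2] taken from getdata() would be two pixels, not one column. The bit-packed layout comes from tostring() (tobytes() in current Pillow), which, as I understand it, packs 8 pixels per byte, leftmost pixel in the most significant bit, and pads each row to whole bytes. The pure-Python sketch below only emulates that layout (it does not call PIL) to show the offset arithmetic after rotating: each original column becomes one row of (height + 7) // 8 bytes, so column n starts at byte n * 2 for a 16-pixel line. The function names are hypothetical; check the layout against your own PIL version.

```python
def pack_rows_1bpp(pixels, width):
    """Emulate (as I understand it) PIL's raw mode "1" layout: 8 pixels per
    byte, leftmost pixel in the most significant bit, each row padded to
    whole bytes. pixels is a flat row-major sequence of 0/255 values, i.e.
    what getdata() returns for a mode "1" image."""
    stride = (width + 7) // 8  # bytes per row, including padding
    out = bytearray()
    for start in range(0, len(pixels), width):
        row = bytearray(stride)
        for i, p in enumerate(pixels[start:start + width]):
            if p:  # non-zero (white) pixel sets its bit
                row[i // 8] |= 0x80 >> (i % 8)
        out += row
    return bytes(out)


def column_slice(raw, n, w, height):
    """Bytes for w columns starting at column n of the original text line,
    where raw is the packed line after rotating it so that each original
    column of `height` pixels became one row."""
    stride = (height + 7) // 8
    return raw[n * stride:(n + w) * stride]


# Two rotated "rows" (i.e. original columns) of a 12-pixel-high line.
pixels = [255] * 12 + [0] * 11 + [255]
raw = pack_rows_1bpp(pixels, 12)
print(raw)                          # b'\xff\xf0\x00\x10'
print(column_slice(raw, 1, 1, 12))  # b'\x00\x10'
```

Note that in mode "1" a set bit means a white pixel, so with dark text on a light background the letters show up as zero bits; either invert the image first or build the lookup table from images processed the same way.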

I hope this makes some actual sense, I have 0 OCR experience tbh.