Poor man's OCR: need performance improvement tips

Sat Sep 24 14:14:37 EDT 2005

Hi all,

I have a performance problem in my app. It is a poor man's version of
OCR app. I started this app as a prototype before implementing it in
C++. But now, after I have a working copy in Python, I don't feel like
doing the job again in C++. A speed improvement of at least 5 times
would be necessary (I have to process several hundreds of thousands of
images).

The application works like this:

1. Load a raster image of alphabet. This image is accompanied with xml
description of that image (positions of each letter etc.). No problems
here. Using this information load raster data (data[][]) for each
letter.

2. Load image which must be recognized and transform it into 1 bit
image (to avoid palette issues). No problems here.

3. Detect lines of text in input picture. No problems here.

4. Process each line: compare pixels of each letter of alphabet with
corresponding pixels in line of input picture. This consists of loops
comparing pixel by pixel. This is my performance bottleneck.

I'm using PIL for initial image processing. But then I use plain Python
loops for pixel matrix comparision. One nice optimization was to call
PIL.Image.getdata() at the begining and then use data[y*w+x] instead of
PIL.Image.getpixel(xy). I would like to compare each character raster
with corresponding image pixels in a "single operation" and avoid
(Python) loops.

Oh, one more thing. Letter raster matrices have varying width and
constant height (due to proportional width font which is used). This
compare function should signal if there is a single different pixel.

Any library that can do that?

Here is how I expected to solve this problem in C++. Each line of text
(and letter) has height below 16 pixels. It can be artificially made
into 16 pixels. I planned to linearize each letter's matrix by columns.
Imagine leter with pixel indices numbered like this:

 00 10 20
 01 11 21
 02 12 22
 03 13 23
 .. .. ..
 0f 1f 2f

I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
is one bit wide, each column would be 2 octets long. I would do the
same to the line of text of input picture. Then i would have to compare
two buffers of length equal to the length of character. After
successfull match, I would advance "input" stream by that number of
bytes.

Thanx
qvx