Poor man's OCR: need performance improvement tips

tvrtko.sokolovski at gmail.com tvrtko.sokolovski at gmail.com
Sun Sep 25 04:10:25 EDT 2005


Imagine a large matrix with dimensions [W,H], and a lot of smaller
matrices with dimensions [p,q1], [p,q1], [p,q2], [p,q1], ... I have to
slide a small window [p,q] horizontally over the larger matrix. After
each slide I have to compare the smaller matrices with the data from the
larger matrix (as defined by the sliding window).
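
For reference, the straightforward (slow) version of that comparison could look roughly like the sketch below. It is only an illustration under assumptions: the matrices are plain lists of rows holding 0/1 values, every small matrix has the same number of rows as the large one, and the names window_matches, scan and templates are made up for this example.

```python
def window_matches(big, small, x):
    """Return True if `small` equals the part of `big` that starts at column x."""
    q = len(small[0])                      # width of the small matrix
    if x + q > len(big[0]):                # window would run past the right edge
        return False
    return all(big_row[x:x + q] == small_row
               for big_row, small_row in zip(big, small))


def scan(big, templates):
    """Slide over every column and yield (x, name) for each exact match."""
    width = len(big[0])
    for x in range(width):
        for name, small in templates.items():
            if window_matches(big, small, x):
                yield x, name


big = [[0, 1, 1, 0],
       [1, 0, 1, 1]]
templates = {"a": [[1, 1], [0, 1]], "b": [[0], [1]]}
matches = list(scan(big, templates))
```

Every window position is compared against every small matrix, element by element, which is why this gets slow as the alphabet and the image grow.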

I'm currently trying to use other kinds of optimizations (linearize
data by columns), but the program no longer works, and it is so hard to
debug. But it works very fast :)

Here is an example of linearization by columns that I'm currently
using:

  # setup: convert to 1 bit image
  img = Image.open(file_name)
  img2 = img.point([0]*255 + [1], "1")

  # find ocr lines, and for each do ...

  # extract OCR line
  region = img2.crop((0, ocrline.base_y-13, width, ocrline.base_y+3))  # h=16
  region.load()

  # clean up upper two lines which should be empty but
  # sometimes contain pixels from other ocr line directly above
  draw = ImageDraw.Draw(region)
  draw.line((0,0,width,0), fill=1)
  draw.line((0,1,width,1), fill=1)

  # transpose data so I get columns as rows
  region = region.transpose(Image.FLIP_LEFT_RIGHT)
  region = region.transpose(Image.ROTATE_90)
  ocrline.data = region.tostring() # packs 8 pixels into 1 octet
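
To make the idea of "columns as rows" concrete without PIL, here is a small pure-Python sketch of the same kind of packing. It is an assumption-laden illustration, not what PIL does internally: the bitmap is a list of rows of 0/1 values, the bit order (top pixel in the most significant bit) is picked for the example, and pack_columns is a hypothetical name.

```python
def pack_columns(rows):
    """Pack a 0/1 bitmap (list of equal-length rows) column by column into bytes."""
    height = len(rows)
    width = len(rows[0])
    bytes_per_col = (height + 7) // 8      # whole bytes needed for one column
    out = bytearray()
    for x in range(width):
        value = 0
        for y in range(height):
            # top pixel ends up in the most significant bit (illustrative choice)
            value = (value << 1) | rows[y][x]
        # pad with zero bits so the column fills whole bytes
        value <<= bytes_per_col * 8 - height
        out += value.to_bytes(bytes_per_col, "big")
    return bytes(out)


packed = pack_columns([[1, 0],
                       [0, 1],
                       [1, 1]])
line16 = pack_columns([[0] * 5 for _ in range(16)])
```

Note that in this sketch a 16-pixel-high column takes two bytes, so pixel column x starts at byte offset x * 2 in the packed string, not at offset x.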

I do the same for known letters/codes (alphabet). Then I compare like
this:

  def recognize(self, ocrline, x):
      for code_len in self.code_lengths: # descending order
          code = ocrline.data[x:x+code_len]
          ltr = self.letter_codes.get(code, None)
          if ltr is not None:
              return ltr, code_len # So I can advance x
      return None # no known code starts at x

This currently "works" two orders of magnitude faster.



