[PYTHON IMAGE-SIG] OCR information

Andrew Kuchling amk@magnet.com
Fri, 21 Mar 1997 18:44:04 -0500 (EST)


David Ascher wrote:
> I don't want to discourage such a worthy endeavor, but I think writing a
> competent OCR package from scratch is hardly worth the effort.  If you can
> steal an established algorithm without too much work (e.g. from NIST),
> then by all means do it.  

	Well, this is also for my own amusement and instruction, and
I'll try to get a few tutorial articles out of it.  I found a copy of
the NIST OCR system at ftp://ftp.cygnus.com/pub/, which seems to aim
at handwriting (and not typeset character) recognition, but it's
fearsome stuff, with code to do dictionary searches, neural
networks...eek.  Until I understand the algorithms involved better,
I'm unlikely to be able to use that code.

	There's another package, xocr, that does something much
simpler.  According to INFO.ENGLISH, there are various heuristics to
guess where the next letter is, and then, to quote:

	WSA-algorithm: (Degree-cut-analysis)
	Every character is zoomed to a fixed size. (here: 16x16 pixel)
	Parallel lines are layerd over the picture. (here: 16 lines)
	Now, all Pixels which are set in the picture and placed on a
	line are counted.  (here: 0..24 Points) After that the lines
	will be turned by a fixed degree-value and again calculated
	like above.  All lines will be turned step by step until 180
	degrees are reached.  We have 128 values calculated.  These
	values are coresponding with a 128 dimensional space.

	Now , all trained characters are points in this space. The
	lowest distance between the character we want to know and all
	of the trained characters will be calculated.  If this
	distance is very small the character will be accepted as
	well-recognized, otherwise the user is consulted if it was
	right detected !

This looks fairly simple, and not out of the reach of PIL and the
Numeric extension, but how well does it work in practice?  Again, I
have no way to tell...  So, any suggestions for good pattern
recognition books?
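
	A rough sketch of the quoted procedure, for illustration only.
It is not xocr's code.  It assumes NumPy (the successor to the Numeric
extension) and a glyph already scaled to a 16x16 array of 0/1 values,
for example with PIL.  The quote does not give the rotation step;
128 values from 16 lines implies 8 orientations, so 22.5 degrees is
assumed here.  How pixels are assigned to lines (bands spanning the
diagonal of the image) is also a guess, as are the names
line_count_features, classify and max_distance.

```python
import numpy as np

SIZE = 16      # glyphs are assumed to be pre-scaled to SIZE x SIZE
N_ANGLES = 8   # assumption: 8 orientations x 16 lines = 128 values


def line_count_features(glyph):
    """Count the set pixels falling on each of 16 parallel lines,
    for 8 line orientations between 0 and 180 degrees."""
    glyph = np.asarray(glyph, dtype=bool)
    if glyph.shape != (SIZE, SIZE):
        raise ValueError("glyph must be %dx%d" % (SIZE, SIZE))

    # Coordinates of the set pixels, measured from the image centre.
    ys, xs = np.nonzero(glyph)
    cx = xs - (SIZE - 1) / 2.0
    cy = ys - (SIZE - 1) / 2.0

    # Half the image diagonal, so rotated lines still cover the corners.
    half_span = SIZE / np.sqrt(2.0)

    features = []
    for k in range(N_ANGLES):
        theta = np.pi * k / N_ANGLES
        # Signed distance of each pixel from the central line.
        offset = -cx * np.sin(theta) + cy * np.cos(theta)
        # Map that distance onto one of the 16 parallel lines (bands).
        idx = np.floor((offset + half_span) / (2 * half_span) * SIZE)
        idx = np.clip(idx.astype(int), 0, SIZE - 1)
        features.append(np.bincount(idx, minlength=SIZE))
    return np.concatenate(features).astype(float)


def classify(glyph, trained, max_distance):
    """Nearest trained character by Euclidean distance.

    trained maps a label to its feature vector.  Returns
    (label, distance, accepted); accepted is False when the best
    match is farther than max_distance, i.e. the user should be asked.
    """
    vec = line_count_features(glyph)
    best_label, best_dist = None, float("inf")
    for label, ref in trained.items():
        dist = float(np.linalg.norm(vec - ref))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist, best_dist <= max_distance
```

	Training then amounts to storing line_count_features(glyph) for
each known character.  The sketch says nothing about how well this
works in practice; a sensible max_distance would have to be found by
experiment.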


	Andrew Kuchling
	amk@magnet.com
	http://people.magnet.com/%7Eamk/
Save the Gutenberg Project! http://www.promo.net/pg/nl/pgny_nov96.html

_______________
IMAGE-SIG - SIG on Image Processing with Python

send messages to: image-sig@python.org
administrivia to: image-sig-request@python.org
_______________