[Spambayes] Windows compatibility - OCR [was: Unwanted stock solicitations]
Vibe Grevsen
grevsen at gmail.com
Sat Nov 4 00:17:59 CET 2006
Hi friends,
The OCR code has now been tweaked and tested on both WinXP and Win9x.
It should work on Unix as well.
Here is a summary:
1. Put ocrad 0.16 in the path
2. Change the following in ImageStripper.py
ocr = os.popen("ocrad -s %s -c %s -x %s < %s 2>ocrerr.txt" %
               (scale, charset, orf, pnmfile))
into this
ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)
# os.popen3() returns [stdin, stdout, stderr]
ocr = os.popen3( ocr_cmd )[1]
3. Change this
if os.path.exists(program) and is_executable(program):
into this
if os.path.exists(program + ".exe") or ( os.path.exists(program) and is_executable(program) ):
Because "or" and "and" short-circuit, and os.path.exists() simply returns False for a missing file, this check does not produce a fatal error even if neither file is found.
4. Change this
for line in open(orf):
    if line.startswith("lines"):
        nlines = int(line.split()[1])
        if nlines:
            ctokens.add("image-text-lines:%d" %
                        int(log2(nlines)))
into this
nlines = ctext.count('\n')
if nlines:
    ctokens.add("image-text-lines:%d" % nlines)
5. Finally, I suggest you change the default scale from 1 to 2, as in this line
scale = options["Tokenizer", "ocrad_scale"] or 2
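
For anyone who wants to try the ideas in steps 2 to 5 outside SpamBayes, here is a rough, self-contained sketch. It is not the ImageStripper.py code: it is written for Python 3, where os.popen3() no longer exists, so subprocess is swapped in to read ocrad's output. The function names, the is_executable() stand-in, the latin-1 decoding and the "bucket" switch are all assumptions made for illustration.

```python
import math
import os
import subprocess


def build_ocr_command(scale, charset, pnmfile):
    """Step 2: argument list for ocrad; a list avoids quoting the file name."""
    return ["ocrad", "-s", str(scale), "-c", charset, pnmfile]


def capture_stdout(cmd):
    """Run cmd and return its standard output as text; stderr is discarded."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, check=False)
    # Assumption: latin-1 is used because it can decode any byte sequence.
    return result.stdout.decode("latin-1")


def is_executable(path):
    # Assumed stand-in for the helper of the same name in ImageStripper.py.
    return os.access(path, os.X_OK)


def program_available(program):
    """Step 3: accept 'program.exe' (Windows) or an executable 'program'."""
    return os.path.exists(program + ".exe") or (
        os.path.exists(program) and is_executable(program))


def line_count_token(ctext, bucket=False):
    """Step 4: token built from the number of newlines in the OCR text.

    bucket=True imitates the older int(log2(nlines)) form for comparison.
    """
    nlines = ctext.count("\n")
    if not nlines:
        return None
    value = int(math.log2(nlines)) if bucket else nlines
    return "image-text-lines:%d" % value


def effective_scale(options):
    """Step 5: any falsy setting (0, None, '') falls back to 2."""
    return options["Tokenizer", "ocrad_scale"] or 2
```

With ocrad on the path, capture_stdout(build_ocr_command(2, "ascii", "image.pnm")) should return whatever text ocrad recognises in image.pnm. Note that the "or 2" idiom also replaces an explicit setting of 0, and that the raw line count produces many more distinct tokens than the log2 buckets did.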
Compile and enjoy.
Happy coding :)
Vibe