[Spambayes] Windows compatibility - OCR [was: Unwanted stock solicitations]

Vibe Grevsen grevsen at gmail.com
Sat Nov 4 00:17:59 CET 2006


Hi friends,

The OCR code has now been tweaked and tested to work on both WinXP and Win9x.
It should work on Unix as well.

Here is a summary:

1. Put ocrad 0.16 in the path

2. Change the following in ImageStripper.py

                ocr = os.popen("ocrad -s %s -c %s -x %s < %s 2>ocrerr.txt" %
                               (scale, charset, orf, pnmfile))

into this

                ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)

                # os.popen3() returns [stdin, stdout, stderr]
                ocr = os.popen3( ocr_cmd )[1]
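
For anyone on a newer Python, where os.popen3() no longer exists, the same idea can be sketched with the subprocess module. This is only a sketch and not the SpamBayes code: build_ocr_command and run_command are made-up names, and it assumes ocrad is on the path and takes the -s and -c options shown above.

```python
import subprocess

def build_ocr_command(scale, charset, pnmfile, program="ocrad"):
    # Hypothetical helper: same options as the command line above,
    # passed as a list so file names with spaces need no quoting.
    return [program, "-s", str(scale), "-c", str(charset), pnmfile]

def run_command(cmd):
    # Capture stdout (the recognised text) and stderr separately,
    # so nothing is redirected through the shell or into ocrerr.txt.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    # latin-1 is an assumption here; it decodes any byte without errors.
    return out.decode("latin-1"), err.decode("latin-1")
```

Something like ctext, errors = run_command(build_ocr_command(2, "ascii", pnmfile)) would then stand in for reading from the pipe. Passing a list avoids the shell, which is what made the "<" and "2>" redirections troublesome on Win9x in the first place.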


3. Change this

        if os.path.exists(program) and is_executable(program):

into this

        if os.path.exists(program + ".exe") or ( os.path.exists(program) and is_executable(program) ):

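os.path.exists() just returns False for a missing file, and because of the way the "or" is evaluated the second test only runs when no .exe is found, so this does not produce fatal errors even if the file is not found.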
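
The lookup in step 3 can be tried out on its own with something like the sketch below. find_program is a made-up name, and os.access() is used here as a stand-in for the is_executable() helper in ImageStripper.py, so take it as an illustration of the test and not as the actual SpamBayes function.

```python
import os

def find_program(name, path_dirs):
    # Hypothetical helper: look for "name.exe" or an executable "name"
    # in each directory of path_dirs, in order.
    for d in path_dirs:
        candidate = os.path.join(d, name)
        # On Windows the file on disk is called name.exe.
        if os.path.exists(candidate + ".exe"):
            return candidate + ".exe"
        # Elsewhere the plain name must exist and be executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Nothing found: return an empty string instead of raising.
    return ""
```

Calling find_program("ocrad", os.environ.get("PATH", "").split(os.pathsep)) would search the usual path. A missing file only makes the tests come out False, so no exception handling is needed.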

4. Change this

                for line in open(orf):
                    if line.startswith("lines"):
                        nlines = int(line.split()[1])
                        if nlines:
                            ctokens.add("image-text-lines:%d" %
                                        int(log2(nlines)))


into this

                nlines = ctext.count('\n')
                if nlines:
                    ctokens.add("image-text-lines:%d" %
                                nlines )
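
To see what the new line count produces, here is the same logic as a small stand-alone function. line_count_tokens is a made-up name for illustration; in ImageStripper.py the code sits inline and adds to ctokens directly.

```python
def line_count_tokens(ctext):
    # ctext is the text read back from ocrad's standard output.
    tokens = set()
    # Every recognised line ends in a newline, so counting newlines
    # replaces parsing the "lines" entry of the old -x export file.
    nlines = ctext.count("\n")
    if nlines:
        tokens.add("image-text-lines:%d" % nlines)
    return tokens
```

Note that text without a trailing newline counts as zero lines, and that unlike the old int(log2(nlines)) version this gives one token per exact line count.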

5. Finally, I suggest you change the default scale from 1 to 2, as in this line

        scale = options["Tokenizer", "ocrad_scale"] or 2



Compile and enjoy.


Happy coding :)

Vibe