[spambayes-dev] Latest CVS update, Ocrad for Windows

Mark Hammond mhammond at skippinet.com.au
Wed Aug 23 23:52:45 CEST 2006


> Quick clue: I'm not using Outlook or Windows. ;-)

Yep, I know that :)  My mail was sent fairly late, so I didn't explain very
well.

> I don't know what to do given that Outlook shreds email so completely.
> Maybe this stuff can only be tested on Unix-y machines.  Maybe the image
> analysis code won't even work because there's no such thing as an
> attachment with MIME content-type image/*... in Outlook.

I can manage all of that.  What I need to know is what format your Ham and
Spam directories are in.  Currently mine are plain text.  A quick look at
the code showed that these files were *not* expected to be a dump of a MIME
message, but instead a simple "word stream", which didn't seem to fit with
the binary data inside attachments.  I guessed they had already been
processed to some degree, but gave up before digging deeper.
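
A quick way to tell the two formats apart is to parse one file from a Ham or
Spam directory and see whether it has headers and any image/* parts.  The
following is only a sketch using the Python 3 standard library; the function
name describe_message is hypothetical and is not part of SpamBayes.

```python
import email


def describe_message(raw_bytes):
    """Return (has_headers, image_types) for the raw contents of one file.

    Hypothetical helper, not SpamBayes code.  A full message dump should
    have headers; a bare "word stream" normally parses with none.
    """
    msg = email.message_from_bytes(raw_bytes)
    has_headers = len(msg.keys()) > 0
    # walk() visits the message itself and every nested MIME part.
    image_types = [
        part.get_content_type()
        for part in msg.walk()
        if part.get_content_maintype() == "image"
    ]
    return has_headers, image_types
```

If has_headers is false for every file, the directory most likely holds
pre-processed text rather than raw messages.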

> As for actual setup, it's done in what I think is the "usual" way.  I
> start with two or more Unix mbox format files (at least one full of ham,
> one full of spam).  I then run utilities/splitndirs.py to allocate them
> to the desired number of Data/{Ham,Spam}/SetN directories.  I then make
> a series of runs like so:

hrm - so maybe they *are* just the complete dump of the message, including
the encoded image data, MIME boundaries, etc.  I'll play a little more and
look inside splitndirs.
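
For illustration, the setup described in the quote (mbox files spread over
SetN directories, one raw message per file) could look roughly like the
sketch below.  It is not the actual splitndirs.py: the function name
split_mbox, the round-robin allocation, and the numeric file names are all
assumptions made for the example.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir, n_sets):
    """Write each message of an mbox to out_dir/Set<k>/<n> as raw bytes.

    Hypothetical sketch; allocation is round-robin, which may differ from
    what splitndirs.py really does.  Returns the message count per set.
    """
    out_dir = Path(out_dir)
    counts = [0] * n_sets
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for i, key in enumerate(box.iterkeys()):
            set_index = i % n_sets
            set_dir = out_dir / f"Set{set_index + 1}"
            set_dir.mkdir(parents=True, exist_ok=True)
            # get_bytes() returns the complete message: headers, MIME
            # boundaries and encoded attachments included.
            (set_dir / str(i + 1)).write_bytes(box.get_bytes(key))
            counts[set_index] += 1
    finally:
        box.close()
    return counts
```

Files written this way are complete message dumps, so any image
attachments stay in the file in their encoded form.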

Thanks,

Mark


