[spambayes-dev] Latest CVS update, Ocrad for Windows
Mark Hammond
mhammond at skippinet.com.au
Wed Aug 23 23:52:45 CEST 2006
> Quick clue: I'm not using Outlook or Windows. ;-)
Yep, I know that :) My mail was sent fairly late, so I didn't explain very
well.
> I don't know what to do given that Outlook shreds email so completely.
> Maybe this stuff can only be tested on Unix-y machines. Maybe the image
> analysis code won't even work because there's no such thing as an
> attachment with MIME content-type image/*... in Outlook.
I can manage all of that. What I need to know is what format your Ham
and Spam directories are in. Currently mine are in plain text. A quick look
at the code showed that these were *not* expected to be a dump of a MIME
message, but instead a simple "word stream", which didn't seem to fit with
the binary data inside attachments. I was guessing they had already been
processed to some degree, but gave up before digging deeper.
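One rough way to tell the two cases apart is to parse a stored file as an
email message and see whether any image/* parts turn up. The sketch below
uses only the Python standard library; the function name image_parts is
made up for illustration and is not part of spambayes.

```python
import email
from email import policy


def image_parts(raw_bytes):
    """Return the content types of any image/* parts in a raw message.

    Hypothetical helper: an empty list means either a message without
    image attachments or a file that is not a full MIME dump at all.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    return [part.get_content_type()
            for part in msg.walk()
            if part.get_content_maintype() == "image"]
```

A file that holds only a pre-processed word stream would parse as a
single text part and give an empty list, while a complete dump of a
message with an attached image would list its image/* type.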
> As for actual setup, it's done in what I think is the "usual" way. I start
> with two or more Unix mbox format files (at least one full of ham, one full
> of spam). I then run utilities/splitndirs.py to allocate them to the
> desired number of Data/{Ham,Spam}/SetN directories. I then make a series of
> runs like so:
Hrm, so maybe they *are* just the complete dump of the message, including
the encoded image data, MIME boundaries, etc. I'll play a little more and
look inside splitndirs.
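For reference, here is a minimal sketch of what splitting an mbox into
SetN directories could look like if each message is written out whole.
It is only a stand-in to illustrate the idea: I have not checked how
splitndirs.py actually allocates or names files, and split_mbox, the
round-robin allocation, and the numeric file names are all assumptions.

```python
import mailbox
import os


def split_mbox(mbox_path, out_root, n_sets):
    """Write each message in an mbox to out_root/Set1..SetN, round-robin.

    Hypothetical stand-in, not the real splitndirs.py logic.
    Returns the number of messages written to each set.
    """
    counts = [0] * n_sets
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for i, msg in enumerate(box):
            set_index = i % n_sets
            set_dir = os.path.join(out_root, "Set%d" % (set_index + 1))
            os.makedirs(set_dir, exist_ok=True)
            # Full dump: headers, MIME boundaries and encoded attachments.
            with open(os.path.join(set_dir, str(i + 1)), "wb") as f:
                f.write(msg.as_bytes())
            counts[set_index] += 1
    finally:
        box.close()
    return counts
```

Called as split_mbox("ham.mbox", "Data/Ham", 5), this would leave each
SetN directory holding complete messages, encoded image data included.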
Thanks,
Mark