[Spambayes] Re: Mailbox class in the spambayes project & python2.2.1

Alexander Leidinger Alexander@Leidinger.net
Thu, 26 Sep 2002 20:28:23 +0200


On Thu, 26 Sep 2002 13:54:22 -0400 Greg Ward <gward@python.net> wrote:

> The mbox format sucks.  All tools that parse mbox suck; Python's
> mailbox.py, however, sucks slightly more than most.  formail is a tool
> bundled with procmail; it does a pretty good job of splitting up an
> mbox file.  The best use for it is to convert that mbox to a Maildir. 
> I'll attach my scripts for converting an mbox to a Maildir.

Shouldn't I first fix the "From " lines in the message bodies (those
which need to have a '>' in front of them)?
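
A minimal sketch of what such a fix could look like, in present-day
Python 3 rather than 2.2.1. It assumes that a genuine separator follows
a blank line (or starts the file) and has the usual
"From sender Day Mon DD HH:MM:SS YYYY" shape. The SEPARATOR pattern and
the escape_from_lines name are illustrative assumptions, not taken from
mailbox.py or formail, and the pattern would need tweaking for unusual
separators.

```python
import re

# Assumed shape of a genuine mbox separator line:
#   "From sender Day Mon DD HH:MM:SS YYYY"
# This pattern is a guess; mboxes with odd separators need a looser one.
SEPARATOR = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}\s*$"
)


def escape_from_lines(lines):
    """Yield lines, prefixing '>' to body lines that start with 'From '."""
    previous_blank = True  # the first line of an mbox may be a separator
    for line in lines:
        is_separator = previous_blank and SEPARATOR.match(line)
        if line.startswith("From ") and not is_separator:
            line = ">" + line
        previous_blank = line.strip() == ""
        yield line
```

A body line that follows a blank line and looks exactly like a
separator cannot be told apart by this heuristic, so the result would
still need a manual check.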

> (Oh, Alexander, I think you'll have problems because of the weird 
> "From " lines that you showed in your mail.  You'll need to tweak the
> regex used to parse "From " lines in addtomaildir if you want to use
> it on that mbox file.  Good luck!)

I already have all of the mboxes converted into the standard test setup
layout with splitndirs.py (957750 messages in 1000 directories after a
3h run). At the moment I'm having fun as a spam-hunter...

What I want to do now: fix the original mboxes (~200 unparseable
messages total), split them up into the test layout again, automagically
find those messages which I already identified as spam (either with
hammie.py or by some grep magic) and move them into a spam directory.
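
The last step could be sketched roughly as follows, in present-day
Python 3. It assumes that the already-identified spam is known by its
Message-ID header, that every file under the test layout holds exactly
one message, and that the spam directory lies outside the tree being
scanned. The move_known_spam name and the matching on Message-ID are
illustrative assumptions, not part of hammie.py or splitndirs.py.

```python
import os
import shutil
from email.parser import BytesParser


def move_known_spam(source_root, spam_dir, spam_ids):
    """Move every message under source_root whose Message-ID is in spam_ids.

    spam_ids is a set of Message-ID strings (an assumed way of recognising
    messages that were already classified as spam).
    """
    os.makedirs(spam_dir, exist_ok=True)
    parser = BytesParser()
    moved = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                # Only the headers are needed to read the Message-ID.
                message = parser.parse(handle, headersonly=True)
            message_id = str(message.get("Message-ID", "")).strip()
            if message_id in spam_ids:
                # Prefix the source directory name to avoid file name clashes.
                target = os.path.join(
                    spam_dir, os.path.basename(dirpath) + "-" + name
                )
                shutil.move(path, target)
                moved.append(target)
    return moved
```

Messages without a Message-ID are left where they are, so those would
have to be matched some other way.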

Bye,
Alexander.

-- 
                      Loose bits sink chips.

http://www.Leidinger.net                       Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7