[Spambayes] Re: Mailbox class in the spambayes project & python2.2.1
Alexander Leidinger
Alexander@Leidinger.net
Thu, 26 Sep 2002 20:28:23 +0200
On Thu, 26 Sep 2002 13:54:22 -0400 Greg Ward <gward@python.net> wrote:
> The mbox format sucks. All tools that parse mbox suck; Python's
> mailbox.py, however, sucks slightly more than most. formail is a tool
> bundled with procmail; it does a pretty good job of splitting up an
> mbox file. The best use for it is to convert that mbox to a Maildir.
> I'll attach my scripts for converting an mbox to a Maildir.
Shouldn't I first fix the "From " lines in the message bodies (those
which need to have a '>' in front of them)?
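
A minimal sketch of that fix in Python, for illustration only. It assumes
that a genuine separator follows a blank line (or starts the file) and
looks like "From sender Thu Sep 26 20:28:23 2002"; any other line that
starts with "From " is treated as body text and gets a '>' prefix. The
SEPARATOR pattern and the quote_from_lines name are hypothetical, not
taken from mailbox.py or addtomaildir, and the weird "From " lines
mentioned in this thread would probably need a looser pattern.

```python
import re

# Assumed shape of a genuine mbox separator line, e.g.
# "From sender@example.com Thu Sep 26 20:28:23 2002".
# Real mboxes vary; adjust the pattern to the files at hand.
SEPARATOR = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}\s*$"
)


def quote_from_lines(lines):
    """Prefix '>' to lines starting with 'From ' that are not separators."""
    fixed = []
    previous_blank = True  # the start of the file counts as a boundary
    for line in lines:
        is_separator = previous_blank and SEPARATOR.match(line)
        if line.startswith("From ") and not is_separator:
            line = ">" + line
        fixed.append(line)
        previous_blank = line.strip() == ""
    return fixed
```

It could be applied to a whole file with something like
quote_from_lines(open(path).readlines()), writing the result to a new
file so the original mbox stays untouched.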
> (Oh, Alexander, I think you'll have problems because of the weird
> "From " lines that you showed in your mail. You'll need to tweak the
> regex used to parse "From " lines in addtomaildir if you want to use
> it on that mbox file. Good luck!)
I already have all of the mboxes converted into the standard test setup
layout with splitndirs.py (957750 messages in 1000 directories after a
3h run). At the moment I'm having fun as a spam-hunter...
What I want to do now: fix the original mboxes (~200 unparseable
messages total), split them up into the test layout again, automagically
find those messages which I already identified as spam (either with
hammie.py or by some grep magic) and move them into a spam directory.
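
The last step could be sketched roughly as follows. This assumes the
already-identified spam is available as a set of Message-ID values,
which is only one possible way to recognize a message again after
re-splitting; the move_known_spam function and its parameters are
hypothetical and not part of hammie.py or splitndirs.py.

```python
import email
import shutil
from pathlib import Path


def move_known_spam(message_dir, spam_dir, spam_ids):
    """Move message files whose Message-ID is in spam_ids into spam_dir.

    Assumes one message per file. Returns the names of the moved files.
    """
    message_dir, spam_dir = Path(message_dir), Path(spam_dir)
    spam_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(message_dir.rglob("*")):
        if not path.is_file() or spam_dir in path.parents:
            continue
        with open(path, "rb") as fp:
            msg = email.message_from_binary_file(fp)
        message_id = str(msg.get("Message-ID", "")).strip()
        if message_id in spam_ids:
            # Note: files with the same name in different directories
            # would collide here; a real run needs unique target names.
            shutil.move(str(path), str(spam_dir / path.name))
            moved.append(path.name)
    return moved
```

Messages without a Message-ID header are left in place, so they would
still have to be checked by hand.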
Bye,
Alexander.
--
Loose bits sink chips.
http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7