[Spambayes] Re: Mailbox class in the spambayes project & python2.2.1

Alexander Leidinger Alexander@Leidinger.net
Thu, 26 Sep 2002 19:45:04 +0200


On Thu, 26 Sep 2002 12:55:39 -0400 Tim Peters <tim.one@comcast.net>
wrote:

> [Alexander Leidinger]
> > ...
> > Now I want to clean up some unparseable messages (mostly some
> > missing '>' in front of some "From " lines).
> 
> See the 'cleanarch' script, already in the project.

No, cleanarch damaged the files on the first run. After cleaning them up
(removing the '>' in front of every line that begins with ">From " and
has a date at the end), I now have to find the lines that really do need
the '>'. I have already found some in mboxes with up to 90 messages (all
of them had at least one unparseable message) and fixed them.
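
The cleanup step described above (dropping the '>' from every line that
begins with ">From " and ends in a date) could be sketched roughly as
follows. This is only an illustration, not the script that was actually
used: the pattern ESCAPED_FROM and the function unescape_from_lines are
hypothetical names, and the pattern simply assumes a time and a
four-digit year at the end of the line.

```python
import re

# Hypothetical pattern: ">From <sender> ... H:M:S YYYY" at the end of the line.
# Time fields may be space-padded ("0: 6:26"), as in the archives shown here.
ESCAPED_FROM = re.compile(
    r"^>From \S+ .*\d{1,2}: ?\d{1,2}: ?\d{1,2} \d{4}\s*$"
)

def unescape_from_lines(lines):
    """Drop the leading '>' from lines that look like escaped envelope lines."""
    return [line[1:] if ESCAPED_FROM.match(line) else line for line in lines]
```

Such a pass also strips the '>' from body lines that were escaped
correctly, which is why those lines then have to be found and re-escaped
by hand.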

Here is what happens when I use cleanarch (the second file is the one
filtered with cleanarch):

(43) netchild@ttyp0 % ../../mboxcount.py 20001119.freebsd-stable.txt 20001119.freebsd-stable.txt_ 
20001119.freebsd-stable.txt             529 (+ unparseable: 1)
20001119.freebsd-stable.txt_            368 (+ unparseable: 1)
Total                                   897 (+ unparseable: 2)

And the diff:
--- 20001119.freebsd-stable.txt Thu Sep 26 19:34:36 2002
+++ 20001119.freebsd-stable.txt_        Thu Sep 26 19:35:06 2002
@@ -1,4 +1,4 @@
-From owner-freebsd-stable  Sun Nov 12  0: 6:26 2000
+>From owner-freebsd-stable  Sun Nov 12  0: 6:26 2000
 Delivered-To: freebsd-stable@freebsd.org
 Received: from sol.cc.u-szeged.hu (sol.cc.u-szeged.hu [160.114.8.24])
        by hub.freebsd.org (Postfix) with ESMTP id 720D537B4D7
@@ -124,7 +124,7 @@
 with "unsubscribe freebsd-stable" in the body of the message
 
 
-From owner-freebsd-stable  Sun Nov 12  4: 7:19 2000
+>From owner-freebsd-stable  Sun Nov 12  4: 7:19 2000
 Delivered-To: freebsd-stable@freebsd.org
 Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173])
        by hub.freebsd.org (Postfix) with ESMTP id 929C837B479
@@ -319,7 +319,7 @@
 with "unsubscribe freebsd-stable" in the body of the message
 
 
-From owner-freebsd-stable  Sun Nov 12  8: 7:58 2000
+>From owner-freebsd-stable  Sun Nov 12  8: 7:58 2000
 Delivered-To: freebsd-stable@freebsd.org
 Received: from sfinx.lasting.ro (sfinx.lasting.ro [193.230.239.254])
        by hub.freebsd.org (Postfix) with ESMTP id 129E637B479
@@ -403,7 +403,7 @@

All of those falsely "corrected" lines are the beginnings of new
messages.
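
One possible explanation, stated here as an assumption and not checked
against cleanarch itself, is the space-padded time in these envelope
lines ("0: 6:26" where "00:06:26" would be usual). The sketch below
compares two hypothetical patterns, STRICT and LENIENT; neither is taken
from cleanarch. A pattern that insists on two-digit minutes and seconds
does not recognise these lines as message separators, while one that
tolerates the padding does.

```python
import re

# Hypothetical strict pattern: expects "H:MM:SS" or "HH:MM:SS".
STRICT = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}:\d{2}\s+\d{4}\s*$"
)
# Hypothetical lenient pattern: also accepts space-padded time fields.
LENIENT = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+[ \d]?\d:[ \d]\d:[ \d]\d\s+\d{4}\s*$"
)

def is_envelope_line(line, pattern=LENIENT):
    """Return True if the line looks like an mbox message separator."""
    return pattern.match(line) is not None

odd = "From owner-freebsd-stable  Sun Nov 12  0: 6:26 2000"
print(is_envelope_line(odd, STRICT), is_envelope_line(odd, LENIENT))
```

If the tool treats every "From " line that fails its pattern as body
text, a strict pattern of this kind would produce exactly the escaping
seen in the diff.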

Bye,
Alexander.

-- 
Failure is not an option. It comes bundled with your Microsoft product.

http://www.Leidinger.net                       Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7