[Spambayes] Eliminating duplicates from mbox file
Skip Montanaro
skip at pobox.com
Sat Mar 8 07:32:52 EST 2003
Tim> Stick some prints in the code. In the _handle_text() method, see
Tim> whether this block is getting executed (it should be):
Tim>     if self._mangle_from_:
Tim>         payload = fcre.sub('>From ', payload)
Okay, I'll give that a try. The reason I stuck in the replace() call was
that the reported number of messages (len(d), where d is the dict using md5
checksums as keys) differed from what "egrep '^From ' out" told me after the
output file had been generated: there were four more "^From " lines than
messages in the dict. Once I added the replace() call, the two counts
agreed. Given that, I think there's a bug, even without inserting prints to
confirm it. (I had planned to submit a bug report today.)
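For anyone following along, here is a minimal sketch of the check described above. It is not my actual script: the function names, the sample messages and the placeholder envelope line are all made up for illustration. It keys messages by md5 checksum, writes them out mbox-style, and counts "^From " lines with and without escaping body lines that start with "From ".

```python
import hashlib
import re

# Matches any line that begins with "From ", like the egrep pattern above.
FROM_LINE = re.compile(r'^From ', re.MULTILINE)

# Placeholder envelope line; a real mbox carries the sender and date here.
ENVELOPE = 'From nobody Thu Jan  1 00:00:00 1970'


def dedupe(messages):
    """Keep one copy of each message, keyed by the md5 checksum of its text."""
    d = {}
    for text in messages:
        key = hashlib.md5(text.encode('utf-8')).hexdigest()
        d.setdefault(key, text)
    return d


def write_mbox(d, escape=True):
    """Join the messages into mbox text, optionally escaping "From " lines."""
    parts = []
    for text in d.values():
        if escape:
            # Stand-in for the replace() call: "From " becomes ">From ".
            text = FROM_LINE.sub('>From ', text)
        parts.append(ENVELOPE + '\n' + text + '\n')
    return '\n'.join(parts)


def count_from_lines(mbox_text):
    """Equivalent of counting the lines that egrep '^From ' would print."""
    return len(FROM_LINE.findall(mbox_text))


messages = [
    'Subject: a\n\nhello\nFrom here on it breaks\n',
    'Subject: a\n\nhello\nFrom here on it breaks\n',  # exact duplicate
    'Subject: b\n\nbye\n',
]
d = dedupe(messages)
escaped_count = count_from_lines(write_mbox(d, escape=True))
unescaped_count = count_from_lines(write_mbox(d, escape=False))
```

With escaping, the "^From " count matches len(d). Without it, every body line that starts with "From " is counted as an extra message, which is the kind of mismatch I was seeing.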
Skip