[Spambayes] splitndirs bug [need help]

Stephen Anderson stephena@hiwaay.net
Fri, 4 Oct 2002 01:31:42 -0500 (CDT)


Hi,

I'm trying to use splitndirs to split collected spam mboxes into maildir 
format for testing.  I've got exactly 3 hours of Python experience and I'm 
running into a wall.

Splitndirs is incorrectly capturing and munging the "From " line of the 
"next" message at the end of the preceding message.  This oddity seems to 
be happening because of a malfunction of a .read(length) call.

In mailbox.py on line 53, part of the _Subfile.read(length) function, a 
call exists to "self.fp.read(length)".  Now length is defined from 
self.stop - self.pos.  Before the call self.pos is 7058L.  Also self.stop 
is 10987L.  Consequently, length is 2788L.

Now if I understand this right, we should read 2788 bytes.  But, after the 
call, self.pos is 11101L.  This represents an overread of 114 bytes.  This 
also happens to be the length of the "From " line that we were supposed to 
stop in front of.  And, said "From " line is part of the read data.
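
A minimal sketch of the bounded read described above, for checking the 
arithmetic outside of mailbox.py.  The helper name read_span and the tiny 
two-message mbox are hypothetical, and opening the file in binary mode 
("rb") is an assumption made so that byte counts and tell() offsets are in 
the same units.  With the figures above, the overread would be 
11101 - 10987 = 114; in this sketch it should come out as 0.

```python
import os
import tempfile


def read_span(path, pos, stop):
    """Read the bytes in [pos, stop) and report where the file pointer ends up."""
    # Binary mode is an assumption: no newline translation is applied,
    # so read(n) returns at most n bytes and tell() counts raw bytes.
    with open(path, "rb") as fp:
        fp.seek(pos)
        data = fp.read(stop - pos)
        return data, fp.tell()


# Build a small two-message mbox with CRLF line endings (made-up content).
first = (b"From a@example.com Fri Oct  4 01:00:00 2002\r\n"
         b"Subject: one\r\n\r\nbody one\r\n\r\n")
second = (b"From b@example.com Fri Oct  4 01:05:00 2002\r\n"
          b"Subject: two\r\n\r\nbody two\r\n")

fd, path = tempfile.mkstemp(suffix=".mbox")
with os.fdopen(fd, "wb") as out:
    out.write(first + second)

# Read only the first message: pos = 0, stop = len(first).
data, end = read_span(path, 0, len(first))
overread = end - len(first)
os.remove(path)

print("bytes read:", len(data), "end offset:", end, "overread:", overread)
```

If the same check is run against the real mbox with the pos and stop values 
from the debugger, a non-zero overread would point at the read itself rather 
than at the code that computed stop.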

I traced it down to this point, but I can't seem to find the definition of 
the self.fp.read(length) function.  I suspect it's OS specific, but I 
can't find it in the Python libraries.

FYI, I am running WinXP Pro.  Can somebody please help me out; I've hit my 
Python newbie limit.  Thanks!


                                       Stephen Anderson
                                     <stephena@HiWAAY.net>