[Spambayes] 1.0rc1, modifies Received headers

Tim Peters tim.one at comcast.net
Sun May 30 02:58:09 EDT 2004


[Jem]
> I just started using spambayes -- through the commandline sb_filter.py
> and sb_mboxtrain.py. So far, so good! I haven't used it long enough to
> see how quickly it learns or how accurate it becomes.
>
> But I noticed that it modifies Received headers, thusly:

[long header line broken up]

> No information is lost; it's just wrapping after the long host name but
> this seems to me a rather strange thing to modify. Is there any way to
> get spambayes to leave this alone? It has the potential of confusing
> other things that try to parse Received headers.

It shouldn't, unless toy software is doing the parsing.  RFC 2822 defines the
format of header lines:

    http://www.faqs.org/rfcs/rfc2822.html

and the way the long Received line was broken up conforms to the standard.
The reason Python's email package breaks it is that RFC 2822 says (in part):

   There are two limits that this standard places on the number of
   characters in a line.  Each line of characters MUST be no more than
   998 characters, and SHOULD be no more than 78 characters, excluding
   the CRLF.

They're talking about physical lines there, not logical lines, and (like
most others) a Received header is a single logical line that can span any
number of physical lines.
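
To make the physical/logical distinction concrete, here is a minimal sketch of
how a parser can recover the logical line. RFC 2822 unfolding amounts to
dropping each line break that is immediately followed by whitespace. The
`unfold` helper and the host names, address and id below are made up for
illustration; they are not taken from the original message or from spambayes.

```python
import re

def unfold(header_value: str) -> str:
    """Undo RFC 2822 folding: drop each line break that precedes whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_value)

# A hypothetical Received value spread over three physical lines.
folded = (
    "from mail.example.org (mail.example.org [192.0.2.1])\r\n"
    "\tby mx.example.net with ESMTP id ABC123;\r\n"
    "\tSun, 30 May 2004 02:58:09 -0400"
)

# One logical line: the whitespace stays, only the line breaks are removed.
unfolded = unfold(folded)
```

Software that unfolds before parsing sees the same field whether or not the
header was wrapped.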

Whoever produced the long Received line originally wasn't following the
standard's recommendation, and Python's email package repairs that as a
matter of course.

> I would think, best left alone unless there is a particular reason to
> change anything in the original message (as a general policy).

I don't think we're *trying* to change anything.  But the parsing tools we
use do rewrite things, according to the relevant standards' recommendations,
in semantically neutral ways.
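
As a rough illustration of "semantically neutral", the sketch below folds an
over-long header value with `email.header.Header` from the standard library
and checks that the logical content is unchanged. This is not necessarily the
code path sb_filter.py takes, and the header value is invented for the
example.

```python
from email.header import Header

# Hypothetical Received value, longer than the recommended 78 characters.
original = (
    "from a-rather-long-host-name.mail.example.org "
    "(a-rather-long-host-name.mail.example.org [192.0.2.1]) "
    "by mx.example.net with ESMTP id ABC123; "
    "Sun, 30 May 2004 02:58:09 -0400"
)

# Fold the value, aiming for physical lines of at most 78 characters.
folded = Header(original, maxlinelen=78, header_name="Received").encode()
physical_lines = folded.split("\n")

# Collapsing the folding whitespace recovers the same logical line.
same_logical_line = " ".join(folded.split()) == original
```

The value now spans several physical lines, each continuation line starting
with whitespace, but nothing a conforming parser cares about has changed.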




