[spambayes-dev] Proposed fourth X-Spambayes-Classification header value

T. Alexander Popiel popiel at wolfskeep.com
Fri May 30 20:49:14 EDT 2003


In message:  <16087.60959.324302.344655 at montanaro.dyndns.org>
             Skip Montanaro <skip at pobox.com> writes:
>
>    >> would it be possible to catch the exception and move to the next line
>    >> in the header (and/or payload) for parsing?
>
>    Alex> Alas, we're at the wrong level to do that sort of thing.  To do
>    Alex> that level of granularity properly, we'd have to be in the guts of
>    Alex> the parser... 
>
>This is just a shot in the dark, but would it be possible to modify the
>email parser sufficiently so that it gave more detail about where it was
>when the error condition was detected (e.g., what body line number, header
>and/or MIME part)?  That might allow Spambayes to tweak the message in the
>right spot and retry the parse.

It is my belief that tweaking the message intelligently (as opposed
to just forcing the entire body to be treated as plain text by blowing
away the MIME headers) would require more intelligence than doing the
parsing in the first place.  After all, you'd be permuting the data
to make it parse, which means that you understand all about the
parsing.  If we're going to get that smart, we might as well not use
the email package... which has already been circled around a few times.

Personally, I'd like to see us have a simpler parser which just
understood headers vs. body, and didn't try to decode the individual
headers (for charset, or anything like that).  Ideally, we'd give
this simple parser the message (as a string) and a list of headers
to remove from the message, and it would return a modified message
(again as a string).  We could use this simpler parser both for
blowing away the MIME headers (as alluded to above for dealing with
malformed messages) and for annotating the message with the
classification results (blow away all the classification headers,
then prepend the new, properly formatted ones to the message).
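
A minimal sketch of what such a parser might look like, assuming
nothing beyond the description above: it only knows headers vs. body
(split at the first blank line) and folded continuation lines, and it
never decodes anything. The function names strip_headers,
prepend_headers and annotate are hypothetical, not existing Spambayes
code, and the MIME header list is only an illustrative guess.

```python
def strip_headers(message, names):
    """Return `message` (a string) with every header listed in `names` removed.

    Only the header block (everything before the first blank line) is
    examined; the body is passed through untouched.
    """
    wanted = {n.lower() for n in names}
    lines = message.splitlines(keepends=True)
    out = []
    skipping = False
    for i, line in enumerate(lines):
        if line.strip("\r\n") == "":
            # First blank line ends the headers; keep it and the whole body.
            out.extend(lines[i:])
            break
        if line[0] in " \t":
            # Folded continuation line: shares the fate of its header.
            if not skipping:
                out.append(line)
            continue
        name, sep, _ = line.partition(":")
        skipping = bool(sep) and name.strip().lower() in wanted
        if not skipping:
            out.append(line)
    return "".join(out)


def prepend_headers(message, headers):
    """Return `message` with (name, value) pairs added at the very top."""
    return "".join("%s: %s\n" % (k, v) for k, v in headers) + message


# Illustrative list only; a real one might need more entries.
MIME_HEADERS = ["MIME-Version", "Content-Type", "Content-Transfer-Encoding"]


def annotate(message, classification):
    """Replace any old classification header with a fresh one."""
    name = "X-Spambayes-Classification"
    return prepend_headers(strip_headers(message, [name]),
                           [(name, classification)])
```

For a malformed message, strip_headers(msg, MIME_HEADERS) would leave
the body to be treated as plain text, and annotate(msg, "spam") covers
the classification case. Header lines without a colon are simply kept.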

Of course, that would take about two hours of work, and I'm lucky
to get two consecutive minutes right now...

- Alex


