[Spambayes] Re: Outlook plugin plus Exchange

Anthony Baxter anthony@interlink.com.au
Tue Nov 12 07:37:32 2002


>>> Tim Peters wrote
> [Mark Hammond]
> > I believe the email package should give some consideration to the
> > real world here.
> 
> It tries to, starting in 2.2.2:
> 
>     http://www.python.org/doc/current/lib/node383.html
> 
> The Parser class defaults to non-strict now, but as the docs say
> 
>     this doesn't mean MessageParseErrors are never raised; some ill-
>     formatted messages just can't be parsed
> 
> I'm sure Barry would be willing to entertain this specific case as a bug
> report.  In theory, he reads this list, so should be shamed enough to do
> that himself <wink>.

Yah. The non-strict mode was initially my fault, because I wanted to
be able to parse bad MIME. In this case, you're hitting a broken MIME
subsection. That 'of_message' is nothing like a header at all - if
the subsection is supposed to be parsed as a part with no headers,
there should be a blank line between the boundary and the body text.

The section of code in question has this comment:

    # Normal, non-continuation header.  BAW: this should check to make
    # sure it's a legal header, e.g. doesn't contain spaces.  Also, we
    # should expose the header matching algorithm in the API, and
    # allow for a non-strict parsing mode (that ignores the line
    # instead of raising the exception).
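
To make the "legal header" check in that comment concrete, here is a
rough sketch of what a standalone header-matching test might look like.
The name looks_like_header and the regular expression are assumptions
for illustration, not anything in the email package; the character
class follows the RFC 2822 definition of a field name (printable ASCII
other than space and colon).

```python
import re

# Hypothetical helper, not part of the email package.
# RFC 2822 field name: one or more printable ASCII characters,
# excluding space and colon, immediately followed by a colon.
_HEADER_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')


def looks_like_header(line):
    """Return True if line starts with a legal-looking 'Name:' field."""
    return _HEADER_RE.match(line) is not None


for sample in ('Subject: hello', 'of_message', 'Bad Name: x', '--boundary'):
    print(repr(sample), looks_like_header(sample))
```

Only the first sample passes; 'of_message' has no colon, 'Bad Name: x'
has a space in the name, and '--boundary' is a boundary tag rather
than a header.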

Here's an (untested :) patch. Depending on how you want to handle these
sorts of errors, either keep the 'break' line or swap it for the
commented-out 'continue' line.


--- Parser.py   23 Sep 2002 13:18:55 -0000      1.1.1.1
+++ Parser.py   12 Nov 2002 07:34:40 -0000
@@ -98,9 +98,15 @@
                 if self._strict:
                     raise Errors.HeaderParseError(
                         "Not a header, not a continuation: ``%s''"%line)
-                elif lineno == 1 and line.startswith('--'):
-                    # allow through duplicate boundary tags.
-                    continue
+                elif lineno == 1:
+                    if line.startswith('--'):
+                        # allow through duplicate boundary tags.
+                        continue
+                    else: 
+                        # hack hack hack. We saw a non header. Either:
+                        #continue # to ignore it silently.
+                        # or
+                        break # to treat the rest of the headers as body
                 else:
                     raise Errors.HeaderParseError(
                         "Not a header, not a continuation: ``%s''"%line)

I'm not comfortable with this going into the core distribution of the
email package - but the above comment about exposing the header-matching
API is a good one. I'll think about how to do this.
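
For anyone weighing the 'break' versus 'continue' choice in the patch,
here is a minimal standalone sketch of the two behaviours. It is not
the email package's parser: read_headers and its on_bad parameter are
hypothetical, it applies the policy to any non-header line (the patch
only does so when lineno == 1), and keeping the offending line in the
body under 'break' is an assumption of this sketch.

```python
def read_headers(lines, on_bad='break'):
    """Split lines into (headers, body) using a simplified header loop.

    on_bad='continue' drops a non-header line and keeps reading headers;
    on_bad='break' stops there and returns that line and everything
    after it as body. Hypothetical code, not the email package's parser.
    """
    headers = []
    for i, line in enumerate(lines):
        if line == '':
            # Blank line: normal end of the header block.
            return headers, lines[i + 1:]
        if line[0] in ' \t' and headers:
            # Continuation line: fold it into the previous header's value.
            name, value = headers[-1]
            headers[-1] = (name, value + ' ' + line.strip())
            continue
        name, sep, value = line.partition(':')
        if not sep:
            # No colon at all, so this cannot be a header.
            if on_bad == 'continue':
                continue
            return headers, lines[i:]
        headers.append((name, value.strip()))
    return headers, []


part = ['of_message', 'Content-Type: text/plain', '', 'hello']
print(read_headers(part, on_bad='continue'))
print(read_headers(part, on_bad='break'))
```

With 'continue' the stray line vanishes and Content-Type is still
picked up as a header; with 'break' nothing is treated as a header and
the whole part, stray line included, ends up as body.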

Anthony


