email bug?

Andrew Dalke adalke at mindspring.com
Sun Aug 24 00:14:45 EDT 2003


Stuart D. Gathman:
> Content-Type: image/pjpeg; name="Jim&&Jill"

> What IE apparently gets is:
>
> [('image/pjpeg', ''), ('name', '"Jim&&Jill"')]
>
> Is this a bug (in the email package, I mean - obviously IE is buggy)?
>
> Do I have to write my own custom param parsing routines to handle this?

BTW, I verified this in Python 2.3.

Looks like the Content-Type syntax is defined in
http://www.faqs.org/rfcs/rfc2045.html
5.1.  Syntax of the Content-Type Header Field

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)

     parameter := attribute "=" value

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

So the ";" must be in a quoted string.  That's defined in
RFC 822,  http://www.faqs.org/rfcs/rfc822.html
(now obsolete)

     quoted-string = <"> *(qtext/quoted-pair) <">

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

     CHAR        =  <any ASCII character>

The ';' is in CHAR and is not <">, "\" or CR, so it's in qtext,
so it's part of quoted-string, so it's allowed in a value
without extra interpretation.
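
As a rough check of that reading, here's a sketch that transcribes the
quoted-string rule into a regular expression.  The names QUOTED_STRING
and is_quoted_string are mine, and header folding is ignored, so treat
it as an approximation of the grammar rather than a full implementation.

```python
import re

# Approximation of RFC 822:  quoted-string = <"> *(qtext/quoted-pair) <">
#   qtext       = any character except <">, "\" and CR
#   quoted-pair = "\" followed by any character
# Hypothetical name; header folding is not handled here.
QUOTED_STRING = re.compile(r'"(?:[^"\\\r]|\\.)*"', re.DOTALL)

def is_quoted_string(value):
    """Return True if the whole of value is a single quoted-string."""
    return QUOTED_STRING.fullmatch(value) is not None
```

With this, is_quoted_string('"a;b"') is true: the ';' is ordinary qtext
and needs no escaping inside the quotes.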

It looks like RFC 2822 (the updated version of 822) at
http://www.faqs.org/rfcs/rfc2822.html agrees.

So I think it's a bug in the email module's parser.

The actual bug is in email/Parser.py with

# Regular expression used to split header parameters.  BAW: this may be too
# simple.  It isn't strictly RFC 2045 (section 5.1) compliant, but it catches
# most headers found in the wild.  We may eventually need a full fledged
# parser eventually.
paramre = re.compile(r'\s*;\s*')
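
To see why that regexp is too simple, here's a small demonstration.  The
header value is one I made up for illustration; the only thing that
matters is that it has a ';' inside the quoted string.

```python
import re

# Same pattern as in email/Parser.py, as quoted above.
paramre = re.compile(r'\s*;\s*')

# Made-up header value with a ';' inside the quoted parameter value.
header = 'image/pjpeg; name="a;b"'

# The split pays no attention to the quotes, so the value is cut in two.
parts = paramre.split(header)
# parts == ['image/pjpeg', 'name="a', 'b"']
```

The quoted value ends up in two pieces, with an unbalanced quote in each.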

A quick scan of the code suggests that it isn't a quick fix (i.e.,
not just a matter of tweaking that regexp).
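
To give an idea of what a quote-aware split has to do, here's a minimal
sketch.  split_params is a hypothetical helper of mine, not anything in
the email package, and it only tracks double quotes and backslash
escapes; it does not handle RFC 822 comments in parentheses or folded
headers.  It is only the splitting step, not a patch for the module.

```python
def split_params(value):
    """Split a header value on ';' characters that are outside quotes.

    Hypothetical helper for illustration only.
    """
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # The character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == '\\':
            # Start of a quoted-pair inside a quoted string.
            current.append(ch)
            escaped = True
        elif ch == '"':
            # Entering or leaving a quoted string.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ';' and not in_quotes:
            # A real parameter separator.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts
```

For example, split_params('image/pjpeg; name="a;b"') keeps the quoted
value in one piece and returns ['image/pjpeg', 'name="a;b"'].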

Could you file a bug report against it?

                    Andrew
                    dalke at dalkescientific.com





