email header decoding fails

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Apr 10 04:31:27 EDT 2008


On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeGeek at gmail.com> wrote:

> It seems that the decode_header function in email.Header fails when
> the string is in the following form,
>
> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>
> That's when a non-encoded string follows the encoded string without
> any whitespace. In this case, decode_header function treats the whole
> string as non-encoded. Is there a workaround for this problem?

That header does not comply with RFC 2047 (MIME Part Three: Message Header
Extensions for Non-ASCII Text), which requires an encoded-word to be
separated from adjacent text by whitespace:

Section 5 (1)
     An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
     in any Subject or Comments header field, any extension message
     header field, or any MIME body part field for which the field body
     is defined as '*text'. [...]
     Ordinary ASCII text and 'encoded-word's may appear together in the
     same header field.  However, an 'encoded-word' that appears in a
     header field defined as '*text' MUST be separated from any adjacent
     'encoded-word' or 'text' by 'linear-white-space'.
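To make the rule concrete, here is a minimal sketch of a checker that flags encoded-words touching a non-whitespace neighbour. The helper name glued_encoded_words and the regex are hypothetical and only approximate RFC 2047: the pattern ignores the 75-character limit and charset syntax, and str.isspace() is a looser test than RFC 822 'linear-white-space'.

```python
import re

# Simplified encoded-word pattern: =?charset?Q-or-B?encoded-text?=
# (approximation for illustration, not a full RFC 2047 grammar)
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=')


def glued_encoded_words(value):
    """Return the encoded-words in value that are not whitespace-separated."""
    offenders = []
    for m in ENCODED_WORD.finditer(value):
        # Treat the start/end of the field as acceptable boundaries.
        before = value[m.start() - 1] if m.start() > 0 else ' '
        after = value[m.end()] if m.end() < len(value) else ' '
        if not before.isspace() or not after.isspace():
            offenders.append(m.group())
    return offenders


print(glued_encoded_words('=?gb2312?Q?=D0=C7=C8=FC?=(revised)'))
print(glued_encoded_words('=?gb2312?Q?=D0=C7=C8=FC?= (revised)'))
```

The first call reports the encoded-word from the original question, because it is immediately followed by "(revised)"; the second, compliant form reports nothing.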

Section 5 (3)
     As a replacement for a 'word' entity within a 'phrase', for example,
     one that precedes an address in a From, To, or Cc header.  [...]
     An 'encoded-word' that appears within a
     'phrase' MUST be separated from any adjacent 'word', 'text' or
     'special' by 'linear-white-space'.
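If the sender cannot be made to fix the header, one possible workaround is to normalize it before decoding: insert a space after any encoded-word that is glued to the following text, then decode as usual. A minimal sketch, assuming Python 3's email.header module (email.Header is the older Python 2 spelling); add_missing_space and decode_lenient are hypothetical helper names, and the regex is a simplified approximation of the encoded-word syntax. Newer versions of the email package may already split this particular input on their own; the sketch does not rely on that.

```python
import re
from email.header import decode_header

# Encoded-word immediately followed by a non-whitespace character
# (simplified pattern, assumed adequate for this illustration).
_GLUED_WORD = re.compile(r'(=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=)(?=\S)')


def add_missing_space(value):
    """Insert the whitespace RFC 2047 requires after a glued encoded-word."""
    return _GLUED_WORD.sub(r'\1 ', value)


def decode_lenient(value, fallback='ascii'):
    """Decode a possibly non-compliant header into a single str."""
    parts = []
    for chunk, charset in decode_header(add_missing_space(value)):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or fallback, errors='replace')
            except LookupError:  # charset unknown to Python's codec registry
                chunk = chunk.decode(fallback, errors='replace')
        parts.append(chunk)  # plain headers come back as str already
    return ''.join(parts)


print(decode_lenient('=?gb2312?Q?=D0=C7=C8=FC?=(revised)'))
```

The trade-off is that the decoded text gains a space before "(revised)" that was not in the raw header; since the compliant form of the header would contain that space anyway, this is usually harmless.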

-- 
Gabriel Genellina