[ python-Bugs-640110 ] email.Header misparses mixed headers

Wed Apr 27 14:23:00 CEST 2005

Bugs item #640110, was opened at 2002-11-18 15:33
Message generated for change (Comment added) made by kalinda
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=640110&group_id=5470

Category: Python Library
Group: Python 2.2.2
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Anders Hammarquist (iko)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Header misparses mixed headers

Initial Comment:
email.Header.decode_header() misparses headers with
both encoded and unencoded words. This example from RFC2047

=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD at vm1.ulg.ac.be>

gets parsed as

AndréPirard <PIRARD at vm1.ulg.ac.be>

where there should obviously be a space between André
and Pirard. RFC2047 says to ignore spaces between
encoded words (but not between encoded and unencoded
words, though it doesn't explicitly say so from what I
could find, and obviously not between unencoded words).
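The whitespace rule described above can be sketched roughly as follows.
This is only an illustration, not the code in Header.py: decode_mixed and
ENCODED_WORD are hypothetical names, and the sketch ignores line folding
and malformed input.

```python
import base64
import quopri
import re

# Matches one RFC 2047 encoded word: =?charset?encoding?text?=
# (hypothetical pattern for this sketch, not the one used by the email package)
ENCODED_WORD = re.compile(r'=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=')


def decode_mixed(header):
    """Decode a header that mixes encoded and unencoded words."""
    out = []
    pos = 0
    prev_encoded = False
    for match in ENCODED_WORD.finditer(header):
        gap = header[pos:match.start()]
        # Drop the gap only when it is pure whitespace sitting between
        # two encoded words; any other text is kept exactly as written.
        if not (prev_encoded and gap.isspace()):
            out.append(gap)
        charset, encoding, text = match.groups()
        raw = text.encode('ascii')
        if encoding.lower() == 'q':
            # header=True also turns "_" into a space, as Q encoding requires
            data = quopri.decodestring(raw, header=True)
        else:
            data = base64.b64decode(raw)
        out.append(data.decode(charset))
        pos = match.end()
        prev_encoded = True
    # Trailing unencoded text keeps its leading whitespace
    out.append(header[pos:])
    return ''.join(out)
```

With the RFC2047 example, the space before "Pirard" survives because it
separates an encoded word from an unencoded one, while a space between
two encoded words is dropped.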

Also, I see it's trying to handle continuation lines,
but it only does it if there are encoded words in the
continuation line. It barfs badly on this test case:

'Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz\n foo bar =?mac-iceland?q?r=8Aksm=9Arg=8Cs?='

I think I'll just do a patch...

/Anders

P.S. It seems at least remotely related to Bug#552957

----------------------------------------------------------------------

Comment By: jonny reichwald (kalinda)
Date: 2005-04-27 14:23

Message:
Logged In: YES 
user_id=661399

I am using python 2.4 and still have this problem. To be
more exact, line 73 in Header.py still strips the parts.
Is there a reason for this not being fixed?

----------------------------------------------------------------------

Comment By: Anders Hammarquist (iko)
Date: 2003-03-06 17:43

Message:
Logged In: YES 
user_id=14

Looks OK.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-03-06 17:21

Message:
Logged In: YES 
user_id=12800

Try current cvs.

----------------------------------------------------------------------

Comment By: Anders Hammarquist (iko)
Date: 2003-03-06 15:15

Message:
Logged In: YES 
user_id=14

The first bug is still there... With version 1.19 from CVS I
get this with my example:

>>> print unicode(Header.make_header(Header.decode_header('=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD at vm1.ulg.ac.be>'))).encode('latin-1')
AndréPirard <PIRARD at vm1.ulg.ac.be>

(The problem is that whitespace gets stripped off on line 91:
unenc = parts.pop(0).strip()
before we know whether it is significant or not.)
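To illustrate why the early strip() loses information, here is a small
sketch. It is not the Header.py code: SPLITTER and split_parts are
hypothetical names, and the pattern is a simplified stand-in for the
module's own regular expression.

```python
import re

# Splits a header into encoded words and the text around them.
# The capturing group makes re.split keep the encoded words in the result.
SPLITTER = re.compile(r'(=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=)')


def split_parts(header, strip_early):
    """Return the non-empty pieces of header, optionally stripped at once."""
    parts = [p for p in SPLITTER.split(header) if p]
    if strip_early:
        # Mirrors the problem described above: once stripped, we can no
        # longer tell whether a separating space was there or not.
        parts = [p.strip() for p in parts]
    return parts
```

Calling split_parts('=?ISO-8859-1?Q?Andr=E9?= Pirard', strip_early=False)
keeps the piece ' Pirard' with its leading space, so a later step can still
decide whether that space is significant; with strip_early=True only
'Pirard' is left.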

The continuation line bug seems to be fixed however.

/Anders

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-03-06 07:50

Message:
Logged In: YES 
user_id=12800

The first bug above has already been fixed in email 2.5
(python 2.3 cvs).  The second pointed to a real bug, now
fixed I believe.

----------------------------------------------------------------------
