From jason at mastaler.com  Tue Oct 10 22:12:10 2006
From: jason at mastaler.com (Jason R. Mastaler)
Date: Tue, 10 Oct 2006 14:12:10 -0600
Subject: [Email-SIG] why is this HeaderParseError being raised?
Message-ID:

I'm probably missing something obvious, but can someone tell me why
email.header.decode_header() is blowing up when trying to decode the
Subject header of this message?  It looks legitimate to me.  It's not
a spam message, and was produced by Microsoft Outlook Express:

http://mastaler.com/tmp/1150650768.21756.msg.txt

I'm reproducing it with this code:

Python 2.5 (r25:51908, Sep 21 2006, 13:04:20)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.parser import Parser
>>> from email.header import decode_header
>>> msg = Parser().parse(open('1150650768.21756.msg.txt'))
>>> subject = decode_header(msg['subject'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sw/lib/python2.5/email/header.py", line 100, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError
>>>

From tkikuchi at is.kochi-u.ac.jp  Tue Oct 10 23:04:07 2006
From: tkikuchi at is.kochi-u.ac.jp (Tokio Kikuchi)
Date: Wed, 11 Oct 2006 06:04:07 +0900
Subject: [Email-SIG] why is this HeaderParseError being raised?
In-Reply-To:
References:
Message-ID: <452C0AC7.6010501@is.kochi-u.ac.jp>

Jason R. Mastaler wrote:
> I'm probably missing something obvious, but can someone tell me why
> email.header.decode_header() is blowing up when trying to decode the
> Subject header of this message?  It looks legitimate to me.  It's not
> a spam message, and was produced by Microsoft Outlook Express:
>
> http://mastaler.com/tmp/1150650768.21756.msg.txt

The encoded string is missing its two trailing padding characters (=).

Subject:
=?iso-8859-1?B?UmU6IEFXOiBCZXN0ZWxsdW5nIEJhZ3RhZ3MgZvxyIEdvbGYgQ2x1YiBL/HNzbmFjaA?=
should be
Subject:
=?iso-8859-1?B?UmU6IEFXOiBCZXN0ZWxsdW5nIEJhZ3RhZ3MgZvxyIEdvbGYgQ2x1YiBL/HNzbmFjaA==?=

In RFC3548,
   Implementations MUST include appropriate pad characters at the end of
   encoded data ...

-- 
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/

From jason at mastaler.com  Wed Oct 11 17:38:37 2006
From: jason at mastaler.com (Jason R. Mastaler)
Date: Wed, 11 Oct 2006 09:38:37 -0600
Subject: [Email-SIG] why is this HeaderParseError being raised?
References: <452C0AC7.6010501@is.kochi-u.ac.jp>
Message-ID: <87wt76rhfm.fsf@deacon-blues.mid.mastaler.com>

Tokio Kikuchi writes:

>> I'm probably missing something obvious, but can someone tell me why
>> email.header.decode_header() is blowing up when trying to decode the
>> Subject header of this message?  It looks legitimate to me.  It's not
>> a spam message, and was produced by Microsoft Outlook Express:
>>
>> http://mastaler.com/tmp/1150650768.21756.msg.txt
>
> The encoded string is missing its two trailing padding characters (=).
>
> Subject:
> =?iso-8859-1?B?UmU6IEFXOiBCZXN0ZWxsdW5nIEJhZ3RhZ3MgZvxyIEdvbGYgQ2x1YiBL/HNzbmFjaA?=
> should be
> Subject:
> =?iso-8859-1?B?UmU6IEFXOiBCZXN0ZWxsdW5nIEJhZ3RhZ3MgZvxyIEdvbGYgQ2x1YiBL/HNzbmFjaA==?=
>
> In RFC3548,
>    Implementations MUST include appropriate pad characters at the end of
>    encoded data ...

Thanks.  Any idea what might be causing this?  I haven't heard of
Outlook Express being RFC 3548 non-compliant before.
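
For reference, the padding problem described above can be reproduced with the base64 module alone, without going through email.header. The following is only a minimal sketch, written for a current Python 3 rather than the 2.5 used in the thread. The helper names pad_base64 and decode_b_payload are hypothetical and not part of the email package; the payload and the iso-8859-1 charset are copied from the Subject header quoted above.

```python
import base64
import binascii

# Payload of the encoded-word from the Subject header quoted in the thread,
# exactly as sent: 66 characters, with no trailing "=".
UNPADDED = "UmU6IEFXOiBCZXN0ZWxsdW5nIEJhZ3RhZ3MgZvxyIEdvbGYgQ2x1YiBL/HNzbmFjaA"


def pad_base64(payload):
    """Hypothetical helper: append "=" until the length is a multiple of 4."""
    return payload + "=" * (-len(payload) % 4)


def decode_b_payload(payload, charset="iso-8859-1"):
    """Hypothetical helper: pad, base64-decode, then decode with the charset."""
    return base64.b64decode(pad_base64(payload)).decode(charset)


# base64.b64decode() rejects the payload as sent because of the missing padding.
try:
    base64.b64decode(UNPADDED)
    strict_ok = True
except binascii.Error:
    strict_ok = False

# After padding, the same payload decodes to the original subject text.
subject = decode_b_payload(UNPADDED)

print(strict_ok)
print(pad_base64(UNPADDED)[-4:])
print(ascii(subject))  # ascii() keeps the output printable on any terminal
```

Because 66 is two short of a multiple of four, the helper appends exactly the two "=" characters mentioned in the reply. Padding before decoding is a lenient workaround on the receiving side; it does not make the original header compliant.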