[ python-Bugs-1414018 ] email.Utils.py: UnicodeError in RFC 2231 header

SourceForge.net noreply at sourceforge.net
Tue Jan 24 21:40:43 CET 2006


Bugs item #1414018, was opened at 2006-01-24 21:19
Message generated for change (Settings changed) made by birkenfeld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1414018&group_id=5470

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: A. Sagawa (qbin)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Utils.py: UnicodeError in RFC 2231 header

Initial Comment:
Description:
collapse_rfc2231_value does not catch the UnicodeError
exception, so a header like the following one raises
UnicodeError when the value is converted to unicode.

---
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Disposition: attachment;
 filename*=iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt
---
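To see where the error comes from, the extended parameter value can be decoded by hand. The sketch below is written for Python 3 and uses only the standard library; `decode_extended_value` is a hypothetical helper for illustration, not part of the email package. It splits the RFC 2231 `charset'language'value` form, percent-decodes the value to bytes, and then decodes those bytes strictly with the declared charset, which is the step that fails for the sample header.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(value: str) -> str:
    """Decode an RFC 2231 extended value of the form charset'language'%XX...

    Hypothetical helper: it decodes strictly, so an invalid byte sequence
    raises UnicodeDecodeError (a subclass of UnicodeError).
    """
    charset, _language, encoded = value.split("'", 2)
    raw = unquote_to_bytes(encoded)            # "%1B" -> b"\x1b", etc.
    return raw.decode(charset or "us-ascii")   # strict decoding


sample = "iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt"
try:
    decode_extended_value(sample)
except UnicodeDecodeError as exc:
    # The JIS X 0208 pair 0x2D 0x21 ("-!") has no mapping in this codec.
    print("UnicodeDecodeError:", exc.reason)
```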

Test script:
---
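#! /usr/bin/env python
import sys
import email

# Parse the whole message from standard input.
msg = email.message_from_file(sys.stdin)
# Walk every MIME part and show its parameters and filename;
# get_filename() decodes the RFC 2231 filename* parameter.
for part in msg.walk():
    print part.get_params()
    print part.get_filename()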
---
Run it with:
% env LANG=ja_JP.eucJP ./test.py < attached_sample.eml

Background:
Character 0x2D21 is undefined in JIS X 0208 but is defined in
CP932 (Microsoft's superset of Shift_JIS). Conversion between
Shift_JIS and ISO-2022-JP is purely arithmetic, because both are
based on JIS X 0208, so CP932-only characters sometimes appear in
ISO-2022-JP encoded strings, typically ones produced by Windows
MUAs. Python's "ISO-2022-JP" codec, however, accepts only *pure*
JIS X 0208, so the conversion fails.
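Because the Shift_JIS code is computed arithmetically from the JIS X 0208 row and cell, one possible recovery is to re-encode the double-byte segments as Shift_JIS and decode the result with Python's `cp932` codec, which does define the NEC row 13 characters. This is a Python 3 sketch under stated assumptions: `jis_to_sjis` and `iso2022jp_via_cp932` are hypothetical names, only the `ESC $ B`, `ESC $ @`, `ESC ( B` and `ESC ( J` designations are handled, and treating `ESC ( J` as plain ASCII is a simplification.

```python
from urllib.parse import unquote_to_bytes

# Escape sequences this sketch understands (a subset of ISO-2022-JP).
DOUBLE_BYTE = (b"\x1b$B", b"\x1b$@")  # designate JIS X 0208
SINGLE_BYTE = (b"\x1b(B", b"\x1b(J")  # designate ASCII / JIS X 0201 Roman


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Map one JIS X 0208 code (both bytes in 0x21..0x7E) to Shift_JIS."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 pair: %#x %#x" % (j1, j2))
    # Two JIS rows share one Shift_JIS lead byte; leads 0xA0..0xDF are skipped.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd row: trail byte 0x40..0x9E, jumping over 0x7F.
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        # Even row: trail byte 0x9F..0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def iso2022jp_via_cp932(data: bytes) -> str:
    """Decode ISO-2022-JP bytes by converting them to Shift_JIS for cp932."""
    out = bytearray()
    double = False  # ISO-2022-JP starts in ASCII mode
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in DOUBLE_BYTE:
            double, i = True, i + 3
        elif seq in SINGLE_BYTE:
            double, i = False, i + 3
        elif data[i] == 0x1B:
            raise ValueError("unsupported escape sequence %r" % seq)
        elif double:
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte sequence")
            out += jis_to_sjis(data[i], data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out.decode("cp932")


raw = unquote_to_bytes("%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt")
name = iso2022jp_via_cp932(raw)
# name ends with "\u2460.txt": 0x2D21 -> Shift_JIS 0x8740 -> CIRCLED DIGIT ONE
```

The design choice here is to reuse an existing table-driven codec (`cp932`) and keep only the arithmetic part by hand, rather than building a JIS-to-Unicode table.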

Workaround:
Decode with fallback_charset and/or skip the invalid characters.
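The workaround can be sketched as a decoding function that tries the declared charset strictly, falls back to `fallback_charset` when Python does not know the charset at all, and otherwise replaces (or, with `errors="ignore"`, skips) the undecodable characters. This is a Python 3 sketch with a hypothetical name, `decode_with_fallback`; it is not the email package's API, although later releases' `email.utils.collapse_rfc2231_value` does take `errors` and `fallback_charset` arguments in a similar spirit.

```python
from urllib.parse import unquote_to_bytes


def decode_with_fallback(raw: bytes, charset: str,
                         fallback_charset: str = "us-ascii",
                         errors: str = "replace") -> str:
    """Decode raw parameter bytes without letting UnicodeError escape."""
    try:
        return raw.decode(charset)  # strict decoding first
    except LookupError:
        # Python has no codec by that name: use the fallback charset.
        return raw.decode(fallback_charset, errors)
    except UnicodeDecodeError:
        # Known codec but invalid data: substitute or drop bad characters.
        return raw.decode(charset, errors)


raw = unquote_to_bytes("%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt")
replaced = decode_with_fallback(raw, "iso-2022-jp")                  # keeps U+FFFD
skipped = decode_with_fallback(raw, "iso-2022-jp", errors="ignore")  # drops it
```

Replacing keeps a visible marker where data was lost, while skipping yields a cleaner filename at the cost of silently losing characters.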

----------------------------------------------------------------------
