[issue4661] email.parser: impossible to read messages encoded in a different encoding

Fri Oct 1 04:36:24 CEST 2010

R. David Murray <rdmurray at bitdance.com> added the comment:

Here is a new version of the patch. It adds many more tests, and it handles non-ASCII bytes in header values by changing them to '?'s when the header value is retrieved as a string.  I think I'm about half done.  Still to do: generate_bytes, and the doc updates.

By the way, another important reason to use surrogateescape rather than latin1 is that if I miss something and the byte-containing strings escape, it will be obvious that that is what happened.  Otherwise we're back in Python 2 bytes/string conflation land.
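
To illustrate the difference, here is a minimal sketch (not taken from the patch; the sample header and variable names are made up). With latin1 the undecoded byte becomes an ordinary-looking character, while with surrogateescape it becomes a lone surrogate that a strict encode refuses:

```python
# Hypothetical raw header line containing one non-ASCII byte (0xE9).
raw = b"Subject: caf\xe9"

# latin1 maps every byte to a real code point, so the result looks like
# a perfectly normal str and nothing downstream can tell it was a guess.
as_latin1 = raw.decode("latin-1")

# surrogateescape maps the undecodable byte to a lone surrogate (U+DCE9).
as_escaped = raw.decode("ascii", errors="surrogateescape")

# A strict encode of the latin1 string succeeds silently.
latin1_encoded = as_latin1.encode("utf-8")

# A strict encode of the surrogateescape string fails loudly.
try:
    as_escaped.encode("utf-8")
    leaked_silently = True
except UnicodeEncodeError:
    leaked_silently = False

# The original bytes are still recoverable when we ask for them explicitly.
round_trip = as_escaped.encode("ascii", errors="surrogateescape")
```

So a string that leaks out with smuggled bytes in it blows up at the first strict encode, instead of quietly turning into mojibake.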

I of course make no promises about performance.  And there is an issue there in that every header value access is now wrapped in an additional function call and a regex test, at a minimum, whether there are bytes present in the input or not :(
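
Roughly, the per-access check looks like the following sketch. This is an illustration only, assuming the input was decoded with surrogateescape; the names `_ESCAPED_BYTE` and `sanitize_header_value` are hypothetical and the patch's actual code may differ:

```python
import re

# Bytes 0x80-0xFF decoded with surrogateescape land in U+DC80..U+DCFF.
# (Hypothetical name; not the identifier used in the patch.)
_ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def sanitize_header_value(value: str) -> str:
    """Return value with any surrogate-escaped bytes replaced by '?'.

    The regex search runs on every call, even for pure-ASCII input,
    which is the extra cost mentioned above.
    """
    if _ESCAPED_BYTE.search(value) is None:
        return value  # common case: nothing to replace
    return _ESCAPED_BYTE.sub("?", value)


# Example: one raw 0xE9 byte smuggled through the parser.
dirty = b"caf\xe9".decode("ascii", "surrogateescape")
clean = sanitize_header_value(dirty)           # 'caf?'
untouched = sanitize_header_value("plain")     # 'plain'
```

The early return keeps the clean-input path down to the function call plus one regex search, which is the minimum overhead I was referring to.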

----------
Added file: http://bugs.python.org/file19078/email_parse_bytes2.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4661>
_______________________________________