[issue4661] email.parser: impossible to read messages encoded in a different encoding
R. David Murray
report at bugs.python.org
Fri Oct 1 04:36:24 CEST 2010
R. David Murray <rdmurray at bitdance.com> added the comment:
New version of the patch that adds many more tests, and handles non-ASCII bytes in header values by changing them to '?'s when the header value is retrieved as a string. I think I'm half done. Still to do: generate_bytes, and the doc updates.
By the way, another important reason to use surrogateescape rather than latin1 is that if I miss something and the byte-containing strings escape, it will be obvious that that is what happened. Otherwise we're back in Python 2 bytes/string conflation land.
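A minimal sketch of the difference described above, using only the standard codecs. The sample header bytes and the variable names are made up for illustration and are not taken from the patch:

```python
# Hypothetical raw header containing one non-ASCII byte (0xE9).
raw = b"Subject: caf\xe9"

# latin-1 maps every byte to a real character, so the result is
# indistinguishable from ordinary text.
as_latin1 = raw.decode("latin-1")

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, which cannot appear in well-formed text.
as_escaped = raw.decode("ascii", errors="surrogateescape")

# The original bytes can be recovered exactly.
round_trip = as_escaped.encode("ascii", errors="surrogateescape")

# If the escaped string leaks into code that encodes it normally,
# the failure is loud instead of silent.
try:
    as_escaped.encode("utf-8")
    leaked_silently = True
except UnicodeEncodeError:
    leaked_silently = False
```

With latin1 the leaked string would encode without complaint and produce mojibake; with surrogateescape the same mistake raises UnicodeEncodeError.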
I of course make no promises about performance. And there is an issue there in that every header value access is now wrapped in an additional function call and a regex test, at a minimum, whether there are bytes present in the input or not :(
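The per-access cost mentioned above could look roughly like the following. This is only a sketch under assumptions: the names `_ESCAPED_BYTE` and `header_value_as_str` are hypothetical, and the actual patch may detect and replace the bytes differently.

```python
import re

# Assumption: undecodable bytes were decoded with surrogateescape, so they
# show up as lone surrogates in the range U+DC80..U+DCFF.
_ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def header_value_as_str(value):
    """Return value with any escaped non-ASCII bytes replaced by '?'."""
    # This search runs on every access, even for pure-ASCII values.
    if _ESCAPED_BYTE.search(value) is None:
        return value
    return _ESCAPED_BYTE.sub("?", value)


clean = header_value_as_str("plain ascii")
masked = header_value_as_str(b"caf\xe9".decode("ascii", "surrogateescape"))
```

The extra function call and the regex search are paid on every header access, which is the overhead being pointed out.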
----------
Added file: http://bugs.python.org/file19078/email_parse_bytes2.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4661>
_______________________________________