[issue25545] email parsing docs: clarify that only ASCII strings are supported

Jason R. Coombs report at bugs.python.org
Wed Dec 5 15:36:43 EST 2018


Jason R. Coombs <jaraco at jaraco.com> added the comment:

I don't think this ticket should be implemented as described.

Consider the use case in importlib_metadata, which loads metadata from a package, where that metadata is known to be in a specified encoding. It already knows the encoding, has decoded the full message to text, and now wants to parse it. Parsing already-decoded content seems very much within the remit of something like email.parser.

Yes, the RFCs describe how to decode bytes content, but that shouldn't preclude the email module from supporting parsing of Unicode text.

And in fact, the library does seem able to parse non-ASCII Unicode text, especially on Python 3. Consider 'parse-text.py', attached. It illustrates that the parser currently mostly meets my expectation: on Python 2.7 and 3.7, e-mail messages are parsed from Unicode text without any indication of an encoding, and the results come back as Unicode text on both Python 2 and Python 3.
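The attached script is not reproduced here. As a rough illustration of the kind of check described, a minimal sketch for Python 3 might look like the following; the header names and values are made up for the example and are not taken from the attachment.

```python
import email

# Hypothetical, already-decoded metadata text containing non-ASCII characters.
raw = (
    "Name: example\n"
    "Author: Jöhn Dœ\n"
    "\n"
    "Description with ünïcödé text.\n"
)

# Parse directly from str; no encoding is declared anywhere in the message.
msg = email.message_from_string(raw)

author = msg["Author"]       # header value, returned as str
body = msg.get_payload()     # body, returned as str

print(author)
print(body)
```

Under the default (compat32) policy on Python 3, both the header value and the payload should come back as ordinary str objects with the non-ASCII characters intact.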

Python 2 is deficient in that message_from_string raises a UnicodeEncodeError when it constructs a bytes-oriented StringIO from the input, which is easily worked around by using the text-oriented io.StringIO instead.
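The workaround mentioned above could be sketched roughly as follows: wrap the text in io.StringIO and hand it to Parser.parse instead of calling message_from_string. The sample message is hypothetical, and this sketch has only been reasoned through for Python 3; the u"" literals and escapes are used so the same source is at least syntactically valid on Python 2.7.

```python
import io
from email.parser import Parser

# Hypothetical Unicode message text (escapes keep the source file ASCII-only).
text = u"Subject: caf\u00e9\n\nBody with sn\u00f6w.\n"

# Feed the parser a text-oriented file object rather than letting
# message_from_string build its own StringIO from the input.
msg_from_file = Parser().parse(io.StringIO(text))

subject = msg_from_file["Subject"]
payload = msg_from_file.get_payload()

print(subject)
print(payload)
```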

Still, I would argue the current behavior is desirable and shouldn't be deprecated.

----------
nosy: +barry, jason.coombs
Added file: https://bugs.python.org/file47978/parse-text.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue25545>
_______________________________________

