Email parsing and unicode/utf8

dieter dieter at handshake.de
Tue Oct 16 01:31:25 EDT 2018


Thomas Jollans <tjol at tjol.eu> writes:
> I just stumbled over some curious behaviour of the stdlib email parsing
> APIs which accept strings rather than bytes. It appears that you can't
> parse an 8-bit UTF-8 message you have as a str without first encoding it.

The primary purpose of an email parser is presumably to parse
RFC 822/2045 messages, which are sequences of bytes
encoded as those RFCs dictate.
I would therefore expect some peculiarities when you feed such
a parser general text (a "str") rather than bytes.
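
A minimal sketch of the difference, assuming Python 3.6+ and the
"email.policy.default" policy. The message text and the variable names
are made up for illustration; they are not taken from the original report.
The sketch parses the same 8-bit UTF-8 message once as bytes and once
as str:

```python
import email
import email.policy

# Hypothetical 8-bit UTF-8 message held as a str (illustrative content).
text = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "Subject: test\n"
    "\n"
    "Grüße aus Köln\n"
)

# Bytes route: encode first, so the parser sees the octets that the
# headers describe (UTF-8, 8bit) and can decode the body accordingly.
msg_from_bytes = email.message_from_bytes(
    text.encode("utf-8"), policy=email.policy.default
)
from_bytes = msg_from_bytes.get_content()

# Str route: the parser receives already-decoded text, although the
# headers still announce an 8-bit byte encoding. The body may come
# back garbled, so no particular result is assumed here.
msg_from_str = email.message_from_string(text, policy=email.policy.default)
from_str = msg_from_str.get_content()

print(repr(from_bytes))
print(repr(from_str))
```

If the str route gives an unexpected body, encoding the text before
parsing (as in the bytes route) is the approach the original poster
already described.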



