[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

Tue Dec 17 15:10:29 EST 2019

R. David Murray <rdmurray at bitdance.com> added the comment:

The problem is that you are starting with different inputs.  Unicode strings and bytes are different things, and so parsing them can produce different results.  The fact of the matter is that email messages are defined to be bytes, so parsing a Unicode string pretending it is an email message is just asking for errors anyway.  The string parsing methods are really only provided for backward compatibility and historical reasons.

I thought this was clear from the existing documentation, but clearly it isn't :)  I'll review a suggested doc change, but the thing to explain is not that parse and parsebytes might produce different results, but that parsing email from strings is not a good idea and will likely produce unexpected results for anything except the simplest non-MIME messages.
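As a rough illustration of the point above, here is a minimal sketch that parses the same message once as bytes and once as a decoded string.  The message contents and variable names are made up for this example, and policy.default is an assumption about which policy you are using.  The string-parsed result is printed rather than asserted, since what it produces is exactly the part you should not rely on.

```python
from email import policy
from email.parser import BytesParser, Parser

# Hypothetical message with a non-ASCII 8bit body (UTF-8 encoded "café").
raw = (
    b"From: a@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xc3\xa9\r\n"
)

# Recommended: hand the parser the original bytes.
msg_from_bytes = BytesParser(policy=policy.default).parsebytes(raw)
content_from_bytes = msg_from_bytes.get_content()

# Not recommended: decode first, then parse the resulting str.
msg_from_str = Parser(policy=policy.default).parsestr(raw.decode("utf-8"))
content_from_str = msg_from_str.get_content()

print(repr(content_from_bytes))
print(repr(content_from_str))  # may not match the line above
```

The bytes path lets the parser apply the declared charset and transfer encoding itself; the string path has already lost that information by the time the parser sees the text.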

Note: the reason you got different checksums might have had to do with line endings, depending on how you calculated the checksums.  You should also consider using get_content rather than get_payload.  get_payload has a weird legacy API that doesn't always do what you think it will, and that might be another source of checksum issues.  But really, parsing a Unicode representation of a MIME message is just likely to be buggy.
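A small sketch of both points in the note, under the assumption that the message is parsed from bytes with policy.default (get_content is only available on messages created with the newer policies).  The base64 message here is invented for the example.

```python
import hashlib
from email import policy
from email.parser import BytesParser

# Hypothetical message whose body is base64 for b"hello world\n".
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8gd29ybGQK\r\n"
)
msg = BytesParser(policy=policy.default).parsebytes(raw)

encoded = msg.get_payload()             # str, still base64-encoded
decoded = msg.get_payload(decode=True)  # bytes, transfer encoding undone
text = msg.get_content()                # str, decoded using the charset

# Line endings alone are enough to change a checksum.
digest_crlf = hashlib.sha256(b"hello world\r\n").hexdigest()
digest_lf = hashlib.sha256(b"hello world\n").hexdigest()
print(digest_crlf == digest_lf)
```

If you are comparing checksums, it helps to decide up front which of these three representations you are hashing and whether line endings have been normalized.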

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39071>
_______________________________________