[issue19662] smtpd.py should not decode utf-8

Tue Mar 18 20:48:57 CET 2014

R. David Murray added the comment:

I propose that we add a new keyword argument to SMTP's __init__, 'decode_data'.  This would be set to True by default, and would preserve the current behavior of passing utf-8 decoded data to process_message.

Setting it to False would mean that process_message would get passed binary (undecoded) data.

In 3.5 we add this keyword, but we immediately deprecate 'decode_data=True'.  In 3.6 we change the default to decode_data=False, and we deprecate the decode_data keyword.  Then in 3.7 we drop the decode_data keyword.
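
To make the 3.5 step concrete, here is a rough sketch of what the constructor could look like. The class name ChannelSketch and the warning text are placeholders of mine, not the real smtpd code, and whether the warning should fire for an explicit decode_data=True or only for the implicit default is still open; this sketch warns in both cases.

```python
import warnings


class ChannelSketch:
    """Hypothetical stand-in for the smtpd class; not the real API."""

    def __init__(self, decode_data=True):
        # 3.5 phase of the plan: True remains the default, but using it
        # (implicitly or explicitly) emits a DeprecationWarning.
        if decode_data:
            warnings.warn(
                "decode_data=True is deprecated; pass decode_data=False "
                "to receive undecoded bytes in process_message",
                DeprecationWarning,
                stacklevel=2,
            )
        self.decode_data = decode_data
```

A subclass author who wants the new behavior would write ChannelSketch(decode_data=False) and see no warning.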

Now, as for implementation: what 'push' currently does (encode to ascii) is just fine for now.  What we need to change is collect_incoming_data (where the decode happens) and found_terminator (where the data is passed to other parts of the class or its subclasses).

When decode_data is False, collect_incoming_data should not decode, and received_lines should be binary.  Then, in found_terminator, the else branch of the if (the one that handles message data) can pass the binary received_lines into process_message (care will be needed to use the correct data types for the various operations).  In the first branch of the if (the one that handles commands), though, the data will now need to be decoded when decode_data is False (still, I think, using utf-8) so that text can still be used to manipulate this part of the API, since unlike the message data it *is* conceptually text, just encoded as ASCII.  (I suggest still decoding using utf-8 rather than ASCII because this will be useful when we implement RFC6531.)  This will provide for the smallest number of needed changes to subclasses when converting to decode_data=False mode.
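
Roughly, the data path described above might look like the following. This is a simplified, self-contained sketch, not the smtpd code: DataPathSketch, the commands and messages lists, and the two-state constants are made up for illustration, and real details such as dot-unstuffing, size limits, and the SMTP replies are left out.

```python
class DataPathSketch:
    """Toy model of collect_incoming_data/found_terminator (hypothetical)."""

    COMMAND, DATA = 0, 1

    def __init__(self, decode_data=True):
        self.decode_data = decode_data
        self.state = self.COMMAND
        self.received_lines = []
        self.commands = []  # command lines, always str
        self.messages = []  # what process_message would be given

    def collect_incoming_data(self, data):
        # data arrives from the socket as bytes; decode only in the old mode.
        if self.decode_data:
            data = data.decode('utf-8')
        self.received_lines.append(data)

    def found_terminator(self):
        # Join with a separator of the same type as the collected chunks.
        empty = '' if self.decode_data else b''
        line = empty.join(self.received_lines)
        self.received_lines = []
        if self.state == self.COMMAND:
            # Commands are conceptually text, so decode them here when
            # collect_incoming_data left them as bytes.
            if not self.decode_data:
                line = line.decode('utf-8')
            self.commands.append(line)
            if line.upper() == 'DATA':
                self.state = self.DATA
        else:
            # Message data: str in the old mode, untouched bytes in the new.
            self.messages.append(line)
            self.state = self.COMMAND
```

With decode_data=False, a message body containing bytes that are not valid utf-8 (for example b'\xff') reaches the messages list unchanged, while command handling still works on str.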

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19662>
_______________________________________