[Python-ideas] Python Convert

Steven D'Aprano steve at pearwood.info
Fri Jul 12 07:16:57 CEST 2013


On 12/07/13 12:01, Daniel Rode wrote:
> Since Python3, the python creators removed a lot of encodings from the
> str.encode() method. They did it because they weren't sure how to implement
> the feature in Python3. They wanted it to be better.

That's wrong. They didn't remove them; they are just inaccessible from the string API. And they didn't do it because they weren't sure how to implement the feature, but because the feature was broken: in Python 2, strings had both an encode and a decode method, and people kept using the wrong one and getting weird results.

Python 3 has the right API: you *encode* strings to bytes, and only bytes, and you *decode* bytes to strings, and only strings.
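
As a minimal sketch of that rule (the sample text and variable names are mine, purely for illustration), the round trip in Python 3 looks something like this, and the "wrong way round" methods are simply not there to be misused:

```python
text = "café"

data = text.encode("utf-8")        # str -> bytes
roundtrip = data.decode("utf-8")   # bytes -> str

# Python 3 str has no decode method, and bytes has no encode method.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")

print(data, roundtrip, str_has_decode, bytes_has_encode)
```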

However, the codec machinery is a lot more general than just str <-> bytes. Codecs can transform from bytes to bytes, or from strings to strings, or to other things, and you can still do so using the codecs module:

py> import codecs
py> codecs.encode(b"Hello World", "hex_codec")
b'48656c6c6f20576f726c64'
py> codecs.encode("Hello World", "rot_13")
'Uryyb Jbeyq'


although the interface is a bit clunky. There's no way of telling ahead of time whether a codec expects bytes or strings.
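
Since nothing in the codec registry advertises the expected input type, about the best you can do is try it and see. Here is a rough sketch of that workaround; accepted_input_types is a hypothetical helper of my own, not anything in the codecs module, and it only probes with one sample of each type:

```python
import codecs

def accepted_input_types(codec_name):
    """Report which of bytes/str codecs.encode accepts for this codec.

    Hypothetical helper: it just tries a sample of each type and
    records the ones that do not raise.
    """
    codecs.lookup(codec_name)  # let LookupError propagate for unknown codecs
    samples = [("bytes", b"Hello World"), ("str", "Hello World")]
    accepted = []
    for type_name, sample in samples:
        try:
            codecs.encode(sample, codec_name)
        except Exception:
            # The codec rejected this input type (usually a TypeError).
            continue
        accepted.append(type_name)
    return accepted

for name in ("hex_codec", "rot_13", "utf-8"):
    print(name, accepted_input_types(name))
```

On a current Python 3 I would expect this to report bytes for hex_codec and str for rot_13 and utf-8, but a probe like this only tells you what one sample did, not what the codec is documented to accept.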


See also this open bug report:

http://bugs.python.org/issue7475

and this one, pointing out that there's no easy way to know what codecs are available:

http://bugs.python.org/issue17878
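
In the meantime, one crude workaround is to list the modules in the standard library's encodings package. This is only a sketch under that assumption: candidate_codec_names is a name I made up, the list misses any codec registered by third-party code, and it may include platform-specific modules that cannot actually be used on your system.

```python
import encodings
import pkgutil

def candidate_codec_names():
    """List module names in the stdlib encodings package.

    Hypothetical helper: these are candidates only. Each name would
    still need to be checked with codecs.lookup before use.
    """
    names = set()
    for _finder, name, _is_pkg in pkgutil.iter_modules(encodings.__path__):
        names.add(name)
    names.discard("aliases")  # the alias table, not a codec
    return sorted(names)

names = candidate_codec_names()
print(len(names), "candidate codecs, e.g.", names[:5])
```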


So there's a fair bit of improvement needed in the codec machinery.




-- 
Steven

