"More About Unicode in Python 2 and 3"

Terry Reedy tjreedy at udel.edu
Sun Jan 5 16:10:01 EST 2014


On 1/5/2014 8:14 AM, Mark Lawrence wrote:
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/

I disagree with the following claims:

"Looking at that you can see that Python 3 removed something: support 
for non Unicode data text."

I believe 2.7 str text methods like .upper only supported ASCII. General 
support for non-Unicode bytes text would require each bytes text object 
to carry its encoding as an attribute. Python never had that.
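
To illustrate why an encoding would be needed, here is a minimal Python 3 
sketch. The byte value 0xE9 and the two codecs are only examples I picked; 
the point is that the same byte is a different letter under each encoding, 
so a bytes method cannot know how to upper-case it.

```python
# One byte, two possible meanings, depending on an encoding that the
# bytes object itself does not record.
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")   # LATIN SMALL LETTER E WITH ACUTE
as_cp1251 = raw.decode("cp1251")    # CYRILLIC SMALL LETTER SHORT I

# Without an encoding, .upper() can only act on ASCII letters,
# so the non-ASCII byte is left untouched.
untouched = raw.upper()

print(repr(as_latin1), repr(as_cp1251), repr(untouched))
```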

"Python 3 essentially removed the byte-string type which in 2.x was 
called str."

Python 3 renamed unicode to str and str to bytes. Bytes have essentially 
all the text methods of 2.7 str. Compare dir(str) in 2.7 and dir(bytes) 
in 3.x. The main change to the class itself is that indexing and 
iteration yield ints i, 0 <= i < 256.
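
A short Python 3 sketch of both points, with sample data of my own 
choosing: the text-style methods are present on bytes, while indexing and 
iteration give ints rather than length-1 byte strings.

```python
data = b"hello"

# Method names that bytes shares with the Python 3 str type.
shared = sorted(set(dir(bytes)) & set(dir(str)))
print("upper" in shared, "split" in shared, "strip" in shared)

# Text-style methods still work on (ASCII) bytes.
upper = data.upper()        # b'HELLO'
parts = b"a,b,c".split(b",")  # [b'a', b'b', b'c']

# Indexing and iteration yield ints i, 0 <= i < 256.
first = data[0]             # 104, not b'h'
items = list(data)          # [104, 101, 108, 108, 111]

# Slicing still returns bytes, which gives a length-1 bytes object.
first_as_bytes = data[0:1]  # b'h'
```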

"all text operations now are only defined for Unicode strings."

?? Text methods are still defined on (ASCII) bytes. It is true that one 
text operation, string formatting, no longer is (and there is an issue 
about that). But one is not all. There is also still discussion about 
within-class transforms (bytes to bytes, str to str), but they are still 
possible, even if not with the codecs module.
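
A rough Python 3 sketch of these three points. The sample strings are 
mine, and I use the base64 module only as one example of a bytes-to-bytes 
transform that does not go through codecs.

```python
import base64

# Text methods on bytes act on the ASCII range only.
raw = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'
upper_bytes = raw.upper()                 # b'CAF\xc3\xa9'
upper_text = raw.decode("utf-8").upper()  # 'CAF\u00c9', the full Unicode result

# str.format has no bytes counterpart.
has_format = hasattr(bytes, "format")     # False

# A within-class (bytes -> bytes) transform, done without codecs.
encoded = base64.b64encode(b"spam")       # b'c3BhbQ=='
decoded = base64.b64decode(encoded)       # b'spam'
```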

I suspect there are other basic errors, but I mostly quit reading at 
this point.

-- 
Terry Jan Reedy



