[Python-Dev] bytes type discussion

"Martin v. Löwis" martin at v.loewis.de
Wed Feb 15 02:11:24 CET 2006

Raymond Hettinger wrote:
>>- bytes("abc") == bytes(map(ord, "abc"))
> At first glance, this seems obvious and necessary, so if it's somewhat 
> controversial, then I'm missing something.  What's the issue?

There is an "implicit Latin-1" assumption in that code. Suppose
you do

# -*- coding: koi8-r -*-
print bytes("Гвидо ван Россум")

in Python 2.x, then this means something: you get the KOI8-R bytes of
the literal (*). In Python 3, it gives you an exception, as the
ordinals of these characters are suddenly above 255.

Or, perhaps worse, the code

# -*- coding: utf-8 -*-
print bytes("Martin v. Löwis")

will work in 2.x and 3.x, but produce different numbers (**).


(*) [231, 215, 201, 196, 207, 32, 215, 193, 206, 32, 242, 207, 211, 211,
213, 205]
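
A minimal sketch of the case behind (*), assuming a current Python 3
interpreter. There, bytes() on a string without an encoding is a
TypeError, so bytes(map(ord, text)) stands in for the proposed
bytes("...") behaviour, and an explicit encode("koi8-r") stands in for
what a Python 2.x byte-string literal would hold. The variable names are
illustrative only.

```python
text = "Гвидо ван Россум"

# Python 2.x view: the literal is the raw KOI8-R bytes from the source file.
koi8_bytes = list(text.encode("koi8-r"))
print(koi8_bytes)

# Python 3 view: the literal is Unicode, so ord() returns code points,
# and Cyrillic code points do not fit into a single byte.
code_points = [ord(ch) for ch in text]
print(max(code_points) > 255)

# Building bytes from those ordinals therefore fails.
try:
    bytes(map(ord, text))
    outcome = "no error"
except ValueError as exc:
    outcome = "ValueError: %s" % exc
print(outcome)
```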

(**) In 2.x, this will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 195, 182, 119, 105, 115]
whereas in 3.x, it will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 246, 119, 105, 115]
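
The difference in (**) can be reproduced in the same hedged way, again
assuming a current Python 3 interpreter: encode("utf-8") approximates
the bytes a 2.x literal would carry from a UTF-8 source file, while
bytes(map(ord, text)) approximates the proposed 3.x behaviour. The last
comparison shows where the "implicit Latin-1" comes from.

```python
text = "Martin v. Löwis"

# Python 2.x view: the UTF-8 bytes of the source file (two bytes for the o-umlaut).
as_utf8 = list(text.encode("utf-8"))
print(as_utf8)

# Proposed 3.x view: one byte per code point (a single byte for the o-umlaut).
as_ordinals = list(bytes(map(ord, text)))
print(as_ordinals)

# Taking ordinals is the same as encoding to Latin-1 for code points below 256.
as_latin1 = list(text.encode("latin-1"))
print(as_ordinals == as_latin1)
```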
