Bytes indexing returns an int

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jan 7 11:12:59 EST 2014


David Robinow wrote:

> "treating bytes as chars" considered harmful?

Who is talking about treating bytes as chars? You're making assumptions that
aren't justified by my question.


>  I don't know the answer to your question but the behavior seems right to
>  me.

This issue was raised in an earlier discussion about *binary data* in Python
3. (The earlier discussion also involved some ASCII-encoded text, but
that's actually irrelevant to the issue.) In Python 2.7, if you have a
chunk of binary data, you can easily do this:

data = b'\xE1\xE2\xE3\xE4'
data[0] == b'\xE1'

and it returns True just as expected. It even works if that binary data
happens to look like ASCII text:

data = b'\xE1a\xE2\xE3\xE4'
data[1] == b'a'

But in Python 3, the same code silently returns False in both cases, because
indexing a bytes object gives an int. So you have to write something like
these, all of which are ugly or inelegant:

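data = b'\xE1a\xE2\xE3\xE4'
data[1] == 0x61          # compare against the integer value
data[1] == ord(b'a')     # convert the expected byte to an int
chr(data[1]) == 'a'      # convert the int to a one-character str
data[1:2] == b'a'        # take a length-1 slice, which is still bytes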


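I believe that only the last one, the one with the slice, works in both
Python 2.7 and Python 3.x. In 2.7, data[1] is a one-character str, so the
two int comparisons return False and chr(data[1]) raises TypeError.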
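
To make the difference concrete, here is a small sketch of the Python 3
behaviour described above. The variable names (item, piece, index_matches,
slice_matches) are mine, chosen only for illustration, and it assumes the
interpreter is run without the -b/-bb flags:

```python
# Python 3: indexing bytes yields an int, slicing yields bytes.
data = b'\xE1a\xE2\xE3\xE4'

item = data[1]       # an int, the value of the byte (0x61)
piece = data[1:2]    # a bytes object of length 1

print(type(item).__name__, item)
print(type(piece).__name__, piece)

# Comparing an int with a bytes object is simply False; no exception
# is raised, which is why ported code can fail silently.
index_matches = (data[1] == b'a')
slice_matches = (data[1:2] == b'a')
print(index_matches, slice_matches)
```

The slice comparison is the only one of the two that gives the same answer
on 2.7, where data[1] and data[1:2] are both one-character strings.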


> Python 3 grudgingly allows the "abomination" of byte strings (is that
> what they're called? I haven't fully embraced Python3 yet).

They're not abominations. They exist for processing bytes (hence the name)
and other binary data. They are necessary for low-level protocols, for
dealing with email, web, files, and similar. Application code may not need
to deal with bytes, but that is only because the libraries you call do the
hard work for you.

People trying to port these libraries from 2.7 to 3 run into this problem,
and it causes them grief. This little difference between bytes in 2.7 and
bytes in 3.x is a point of friction which makes porting harder, and I'm
trying to understand the reason for it.
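
For what it's worth, one way a porter might paper over the difference is a
tiny helper built on slicing. This is only a sketch: byte_at and iter_bytes
are hypothetical names I made up, not functions from the standard library
or any porting package.

```python
def byte_at(data, index):
    """Return the byte at `index` as a length-1 bytes object.

    Hypothetical helper. It uses slicing, so the result has the same
    type on Python 2.7 and Python 3.x.
    """
    # Normalise negative indexes by hand, because data[-1:0] would be
    # an empty slice rather than the last byte.
    if index < 0:
        index += len(data)
    if not 0 <= index < len(data):
        raise IndexError("byte index out of range")
    return data[index:index + 1]


def iter_bytes(data):
    """Yield each byte of `data` as a length-1 bytes object."""
    for i in range(len(data)):
        yield data[i:i + 1]


data = b'\xE1a\xE2\xE3\xE4'
first_is_e1 = byte_at(data, 0) == b'\xE1'
second_is_a = byte_at(data, 1) == b'a'
pieces = list(iter_bytes(b'ab'))
```

The cost is an extra function call per byte, which is part of why this
feels like friction rather than a solution.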


-- 
Steven



