[Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer

Joel Bender jjb5 at cornell.edu
Thu Sep 27 19:14:53 CEST 2007


> Making an iterator over an integer sequence acceptable in the 
> constructor strongly suggests that a byte sequence contains integers 
> between 0 and 255 inclusive, not length 1 byte sequences.
> 
> And I think that's the cleanest conceptual model for them as well. A 
> byte sequence doesn't contain length 1 byte sequences, it contains bytes 
> (i.e. numbers between 0 and 255 inclusive).

Using standards language, an octet string contains octets.  Since Python 
blurs the distinction between characters and strings of length 1, 
shouldn't it also blur the distinction between octets and octet 
strings of length 1?

> The only problematic case is cases such as iterating over a byte 
> sequence where we may have an integer and want to compare it to a length 
> 1 byte string.

Why is it problematic?  Why does a programmer have to jump through hoops 
to compare the two?

      >>> x, y = "abc", "a"
      >>> x[0] == y
      True

And the same should be true for octet strings:

      >>> x, y = b"abc", b"a"
      >>> x[0] == y
      True

> With just the simple conceptual model...

Python doesn't have a simple conceptual model: there is no distinction 
between strings of length 1 and characters.  The following makes it 
pretty clear that octet strings contain octets:

     >>> list(b"1234")
     [49, 50, 51, 52]

And you should be able to check for an octet in an octet string:

     >>> 51 in b"1234"
     True

And if I want to specify the same octet in ASCII, I do this:

     >>> b'3' in b"1234"
     True

> I don't think it's worth breaking the conceptual model of the data type 
> just to reduce the simplest spelling of that comparison by 3 characters.

The programmer shouldn't have to go through any of those gyrations.  
The only reason saying chr(51) == '3' is necessary is that characters 
and integers are different types.  But octets and "integers in the 
range(256)" are exactly the same thing.

     >>> b'3' == 51
     True
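
To make the proposed comparison concrete, here is a minimal sketch of 
the semantics I have in mind.  The OctetString class is hypothetical 
(it is not part of the PEP or of any library); it only assumes a bytes 
type whose indexing yields integers, and it treats an integer as equal 
to a length 1 octet string holding that value.

```python
class OctetString(bytes):
    """Hypothetical bytes subclass: a length 1 value equals its integer octet."""

    def __eq__(self, other):
        # An int is equal only to a length 1 octet string holding that value.
        if isinstance(other, int):
            return len(self) == 1 and self[0] == other
        # Everything else falls back to the ordinary bytes comparison.
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        # Defined explicitly so that != stays consistent with the == above.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Overriding __eq__ would otherwise make instances unhashable.
    __hash__ = bytes.__hash__


x, y = OctetString(b"abc"), OctetString(b"a")
print(x[0] == y)                 # True: 97 compared with b"a"
print(OctetString(b"3") == 51)   # True
print(OctetString(b"34") == 51)  # False: not a single octet
```

Under these assumptions the integer seen while iterating and the 
length 1 literal compare equal in either order, so no conversion is 
needed at the call site.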

The fact that octets can be written as an octet string of length 1 is 
just a happy coincidence of Python, just like characters.

>    for val in data.fragments():
>        if val == b'x':
>            print "Found an x!"

That's a hideous amount of work to just say:

     if b'x' in data:
         print "Found an x!"
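
For comparison, a rough sketch of the two spellings side by side.  The 
fragments() function below is a hypothetical stand-in for the method in 
the quoted message; I am assuming it yields length 1 slices of the 
data, which may not be exactly what was intended.

```python
def fragments(data):
    """Hypothetical stand-in for data.fragments(): yield length 1 slices."""
    for i in range(len(data)):
        yield data[i:i + 1]


def found_by_loop(data, target):
    # The quoted approach: walk the fragments and compare each one.
    return any(val == target for val in fragments(data))


def found_by_membership(data, target):
    # The short spelling: a plain containment test on the octet string.
    return target in data


data = b"axb"
print(found_by_loop(data, b"x"), found_by_membership(data, b"x"))
```

For a length 1 target both functions should agree; the containment 
test just says it in one line.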


Joel


