[Python-ideas] String and bytes bitwise operations

Chris Angelico rosuav at gmail.com
Fri May 18 20:14:07 EDT 2018


On Sat, May 19, 2018 at 8:25 AM, Chris Barker - NOAA Federal
<chris.barker at noaa.gov> wrote:
>> I suppose you could argue that a "byte" is a patch of
>> storage capable of holding a number from 0 to 255, as opposed to being
>> the number itself, but that's getting rather existential :)
>
> No, I’m making the distinction that an eight bit byte is, well, eight
> bits, that CAN represent a number from 0 to 255, or it can represent
> any other data type — like one eighth of the bits in a float, for
> instance. Or a bit field, or 1/2 a 16 bit int.

Since "bit" simply means "binary digit", that's like saying that a
four-digit number isn't a number; it MIGHT represent a number, but it
might represent one quarter of your credit card. Is "4564" less of a
number for that reason?

>> In Python, a "bytes" object represents a sequence of eight-bit units.
>> When you subscript a bytes [1], you get back an integer with the value
>> at that position.
>
> And when you print it, you get the ascii characters corresponding to
> each byte....

That's because those numbers can often be used to represent
characters. But they are really and truly numbers.
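
A quick sketch of what I mean, using an arbitrary sample value (the
variable names are just for illustration):

```python
data = b"Hi!"

# Subscripting a bytes object yields a plain int, not a one-character string.
first = data[0]

# Iterating (or calling list()) yields the same integers.
as_list = list(data)

# Slicing, by contrast, yields another bytes object.
one_byte_slice = data[0:1]

# The repr shows ASCII characters only as a display convenience.
shown = repr(data)

print(first, as_list, one_byte_slice, shown)
```

The ASCII rendering is only how the repr chooses to display the
values; the elements themselves are ints.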

(If you want to get down to brass tacks, a Unicode string could be
treated as a sequence of 21-bit numbers. And in some languages, the
"string" type is actually a highly-optimized version of
21-bit-number-array - or 32-bit, perhaps - with fully supported
use-cases involving numerical (non-textual) data.)
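
As a rough illustration of the "sequence of numbers" view in Python
itself (the sample string is arbitrary, and this says nothing about
how any implementation stores strings internally):

```python
import sys

# Arbitrary sample: 'a', e-acute, the euro sign, and an emoji.
text = "a\u00e9\u20ac\U0001F600"

# Each character corresponds to one integer code point.
code_points = [ord(ch) for ch in text]

# The largest code point Python supports fits in 21 bits.
max_bits = sys.maxunicode.bit_length()

# chr() goes the other way, so the round trip is lossless.
round_trip = "".join(chr(n) for n in code_points)

print(code_points, max_bits, round_trip == text)
```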

> So one element in a bytes object is no more an integer than a character....

Except that the bytestring b"\x00\x80\xff\x99" very clearly represents
four numbers, but doesn't clearly represent any characters.
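
For instance (a minimal sketch; UTF-8 is just one encoding picked for
the demonstration):

```python
raw = b"\x00\x80\xff\x99"

# As numbers, the content is unambiguous.
values = list(raw)

# As text, it depends entirely on an encoding, and this particular
# sequence is not valid UTF-8 (0x80 cannot start a character).
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(values, decodes_as_utf8)
```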

>> Maybe I'm completely misunderstanding your statement here.
>
> Again, it doesn’t much matter, until you get to deciding how to
> bitshift an entire bytes object.

Bitshifting a sequence of bytes has nothing whatsoever to do with
characters. It has to do with the individual numbers, and then you
have to decide how you represent those as a collective: little-endian
or big-endian. That's still a matter of numbers, not characters.
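
To make the endianness question concrete, here is one possible way to
shift a whole bytes object. This is only a sketch: shift_left is a
hypothetical helper, not a proposed API, and dropping bits that
overflow the original length is an assumption on my part.

```python
def shift_left(data: bytes, n: int, byteorder: str) -> bytes:
    """Shift data left by n bits, treating it as one fixed-width integer.

    Hypothetical helper: bits shifted past the original width are
    discarded, so the result has the same length as the input.
    """
    width = len(data)
    value = int.from_bytes(data, byteorder)
    mask = (1 << (8 * width)) - 1
    return ((value << n) & mask).to_bytes(width, byteorder)


data = b"\x01\x80"

# Big-endian: the bytes read as 0x0180, which shifts to 0x0300.
big = shift_left(data, 1, "big")

# Little-endian: the bytes read as 0x8001; the top bit falls off the end.
little = shift_left(data, 1, "little")

print(big, little)
```

Same input, same shift, different answers depending on byte order,
and at no point does a character enter into it.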

ChrisA

