[Tutor] back on bytes

Terry Carroll carroll at tjc.com
Wed Jul 11 01:17:38 CEST 2007


On Sat, 7 Jul 2007, Alan Gauld wrote:

> I'm intrigued as to why people are using weird combinations
> of math to manipulate bitstrings given that Python has a full
> set of bitwise operators. Surely it is easier and more obvious
> to simply shift the bits right or left using >> and << and use
> bitwise and/or operations than do all this multiplication and
> addition malarky. 

That's a fair point.  I guess it's all in how the problem shows up to you.  
This felt like arbitrary-base arithmetic to me, not bit manipulation.

To recap, what I had was a 32-bit string (four bytes), representing a 
28-bit number.  The high-order bit of each byte was a zero.  The task was 
essentially to suck out those four zero bits and then shmush the remaining 
28 bits together to get the integer value.

When I encountered this, it looked to me like a number encoded in 
base-128: counting from the right, the first byte is in units of 128**0 
(i.e., 1); the second in units of 128**1 (i.e., 128); the third, 128**2 
(16384); and the leftmost, 128**3 (2097152).

So it just made sense to me to write it that way: 

def ID3TagLength(s):
    """
    Given a 4-byte string s, decode as ID3 tag length
    """
    return (ord(s[0]) * 0x200000 +
            ord(s[1]) *   0x4000 +
            ord(s[2]) *     0x80 +
            ord(s[3]) )


I expressed the factors in hex, because I think that the sequence
0x200000, 0x4000, 0x80 is clearer and appears to be less of a "magic
number" sequence than 2097152, 16384, 128.  I figured I'd still understand
it if I had to look at it a couple months later.
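
For what it's worth, here is a minimal sketch of the same base-128 reading
written as a loop.  It assumes Python 3, where iterating over a bytes object
already yields integers, so no ord() is needed; the name decode_base128 is
made up for illustration and is not part of any ID3 library.

```python
def decode_base128(data):
    """Decode big-endian base-128 digits (hypothetical helper, Python 3)."""
    value = 0
    for byte in data:              # each item is already an int in Python 3
        value = value * 128 + byte # shift earlier digits up one base-128 place
    return value

# The hex factors in the function above are just the powers of 128.
assert [128**3, 128**2, 128**1] == [0x200000, 0x4000, 0x80]

print(decode_base128(bytes([0x00, 0x00, 0x02, 0x01])))  # 2*128 + 1 = 257
```

The loop form works for any number of bytes, which may or may not matter
for a fixed 4-byte field.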

Now that you mention it, I certainly could have done it bitwise:

def ID3TagLengthBitwise(s):
    """
    Given a 4-byte string s, decode as ID3 tag length
    """
    return (ord(s[0]) <<   21 |
            ord(s[1]) <<   14 |
            ord(s[2]) <<    7 |
            ord(s[3]) )

It just didn't occur to me, and the first one worked right off.
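
A quick way to convince yourself the two versions agree is to run both on a
few inputs.  This is only a sketch, assuming Python 3 bytes input (so the
ord() calls are dropped); the function names and sample values are made up
for illustration.

```python
def tag_length_arith(s):
    # Base-128 arithmetic, as in the first version (Python 3: s[i] is an int).
    return s[0] * 0x200000 + s[1] * 0x4000 + s[2] * 0x80 + s[3]

def tag_length_bitwise(s):
    # Shift-and-OR, as in the second version.
    return s[0] << 21 | s[1] << 14 | s[2] << 7 | s[3]

# Sample inputs whose bytes all have a zero high-order bit.
samples = [bytes([0, 0, 2, 1]), bytes([1, 2, 3, 4]), bytes([0x7F] * 4)]
for s in samples:
    assert tag_length_arith(s) == tag_length_bitwise(s)
```

The agreement depends on every byte being below 128.  If a high-order bit
were set, the shifted fields would overlap, and adding and ORing would no
longer give the same result.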

> (Its also a lot faster!) 

Could be.  But the arithmetic approach immediately worked, and this
sort of field only appears a few times in any particular MP3 file, so it
wasn't in any way a bottleneck.  It never occurred to me to try to
optimize it.  I doubt any speed-up would have been perceptible.  (On the
other hand, I'm the guy who always microwaves his food in multiples of 11
seconds, so I don't have to move my finger all the way down to the "0"
button.  Go figure.)

> If you are going to 
> operate at the bit level why not use the bit level operations?

In my case, this didn't feel like operating at the bit level; it felt like 
doing arbitrary-base arithmetic.

I will add, however, that my approach would fail in Sean's context.  In my 
context, the upper bit in each byte is a zero; in Sean's, it's used to 
signify something unrelated to the numeric value.  Sean's going to have 
to AND the high-order bit to zero anyway, so bitwise all the way is a 
logical approach.  I think if I'd been working in that context, the 
problem would have shown up to me as a bit manipulation problem, and not 
as an arbitrary-base arithmetic problem.
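
As a rough sketch of what "AND the high-order bit to zero" could look like,
assuming Python 3 bytes input and a 4-byte field like mine: each byte is
masked with 0x7F before being shifted into place.  The name decode_masked is
hypothetical, and I'm not claiming this matches Sean's actual format, only
the masking step.

```python
def decode_masked(s):
    """Clear the high bit of each byte, then pack the 7-bit groups.

    Hypothetical helper; assumes s is a 4-byte Python 3 bytes object.
    """
    return ((s[0] & 0x7F) << 21 |
            (s[1] & 0x7F) << 14 |
            (s[2] & 0x7F) << 7 |
            (s[3] & 0x7F))

# The high bits are ignored, so these two inputs decode to the same value.
assert decode_masked(bytes([0x00, 0x00, 0x02, 0x01])) == 257
assert decode_masked(bytes([0x80, 0x80, 0x82, 0x81])) == 257
```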


