Reading a Bitstream

Patrick Maupin pmaupin at speakeasy.net
Fri Nov 21 01:16:09 EST 2003


Dietrich Epp wrote:

> Maybe I should clarify: I need to read bit fields.  Neither are they 
> aligned to bytes or do they have fixed offsets.  In fact, in one part 
> of the file there is a list of objects which starts with a 9 bit object 
> type followed by fields whose length and number depend on that object 
> type, ranging from a dummy 1-bit field to a tuple of four fields of 
> length 9, 5, 8, and 8 bits.
> 
> I looked at the array module and can't find what I'm looking for.  
> Here's a bit of typical usage.
> 
> def readStuff(bytes):
>    bits = BitStream(bytes[2:])
>    isSimple = bits.Get(1)
>    objType = chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    count = bits.Get(3)
>    bits.Ignore(5)
>    if not isSimple:
>      objId = bits.Get(32)
>    bytes = bytes[2+bits.PartialBytesRead():]
>    return bytes, objType
> 

The fact that you want to read bitfields was perfectly clear.
What was not clear to me was that you were apparently asking
if there was a module which does _exactly_ what you want, and
I thought you might be merely asking if there was a more suitable
module to build your bitstream code on than the struct module.

For this task, IMO struct is unwieldy, and array is much more
suitable,  because the indexing works very nicely, and it is
quick and painless to convert a string into an array.

If you need gobs of performance, you may end up writing a C
module, but if you need moderate performance, you can probably
do better with an array implementation than with a struct
implementation, especially if you go ahead and dump an
entire file into an array.

Here is a module (completely untested, because it's bedtime)
which uses array to provide this sort of functionality.

I call the primary function "read" instead of "Get" because
it's kinda-sorta modelled on a file object.  Also, I didn't
provide an "Ignore", because all you have to do is call
"read" without assigning the results to anything.

Hope this helps.

Pat


"""
bitarray class allows reading from a bitstream.

The number of requested bits on a read are returned
as a positive long integer.

Limitations of this implementation:
 - bitstream is not writable after initialization
 - Must be initialized with a string (not a list or tuple)
 - ASSUMES LITTLE-ENDIAN BITSTREAMS
 - NOT TESTED
 
   Any limitation could be fixed with a six pack :)
"""

import array

class bitarray(object):

    # Support the new Athlon 64's by dynamically
    # calculating number of bytes per long,
    # and optimistically assume we'll have 256-bit
    # processors soon :)

    _bytesperlong = 32/len(array.array('L',32*' '))
    _bitsperlong = _bytesperlong*8

    def __init__(self,source_str):
        self._totalbits = len(source_str) * 8
        self._position = 0

        # Pad to longword boundary, then make an array

        source_str += -len(source_str) % self._bytessperlong * chr(0)
        self._bitstream = array.array('L',source_str)

    def seek(self,offset,whence=0):
        self._position = offset + (0,self._position,self._totalbits)[whence]

    def tell(self):
        return self._position

    def read(self,numbits):
        bitsperlong,position = self._bitsperlong,self._position

        if position < 0 or position + numbits > self._totalbits:
            raise IndexError, "Invalid bitarray._position/numbits"

        longaddress,bitoffset = divmod(position,bitsperlong)

        # We may read bits in the final word after ones we care
        # about, so create a mask to remove them later.

        finalmask = (1L << numbits) - 1

        # We may read bits in the first word before the ones we
        # care about, so bump the total bits to read by this
        # amount, so we read enough higher-order bits.

        numbits += bitoffset

        # Read and concatenate every long which contains a bit we need

        outval,outshift = 0L,0
        while numbits > 0:
            outval += self._bitstream[longaddress] << outshift
            longaddress += 1
            outshift += bitsperlong
            numbits -= bitsperlong

        # numbits is now basically a negative number which tells us
        # how many bits to back up from our current position.

        self._position = longaddress*bitsperlong + numbits

        # Shift right to strip off the low-order bits we
        # don't want, then 'and' with the mask to strip
        # off the high-order bits we don't want.

        return (outval >> bitoffset) & finalmask




More information about the Python-list mailing list