Parsing Binary Structures; Is there a better way / What is your way?

sgriffiths dr.scottgriffiths at gmail.com
Thu Aug 6 04:44:19 EDT 2009


On Aug 5, 3:46 pm, "Martin P. Hellwig" <martin.hell... at dcuktec.org>
wrote:
> Hi List,
>
> On several occasions I have needed (and built) a parser that reads a
> binary piece of data with a custom structure. For example (bogus one):
>
> BE
> +---------+---------+-------------+-------------+------+--------+
> | Version | Command | Instruction | Data Length | Data | Filler |
> +---------+---------+-------------+-------------+------+--------+
> Version: 6 bits
> Command: 4 bits
> Instruction: 5 bits
> Data Length: 5 bits
> Data: 0-31 bits
> Filler: zero bits padding the packet length to a multiple of 8 bits
>
> What I usually do is read the packet in binary mode, convert the output
> to a concatenated 'binary string' (e.g. '0101011000110') and then use
> slice indices to get the right data portions.
> Depending on what I need to do with these portions I convert them to
> whatever is handy (usually an integer).
>
> This works out fine for me. Most of the time I also put the ASCII art
> diagram of this 'protocol' as a comment in the code, making it more
> readable/understandable.
>
> Though there are a couple of things that bother me about my approach:
> - This seems such a general problem that I think there must already
> be a general Pythonic solution.
> - Using a string for binary representation takes at least 8 times more
> memory for the packet than strictly necessary.
> - Seems to need a lot of prep work before doing the actual parsing.
>
> Any suggestion is greatly appreciated.
>
> --
> MPH
> http://blog.dcuktec.com
> 'If consumed, best digested with added seasoning to own preference.'
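
For reference, the string-slicing approach described in the quoted message might look roughly like the sketch below. It assumes Python 3 and the bogus field layout from the diagram; the names HEADER_FIELDS and parse_packet_via_string, and the sample packet, are made up for illustration and are not from the original post.

```python
# Field names and widths (in bits) taken from the bogus example layout.
HEADER_FIELDS = (
    ("version", 6),
    ("command", 4),
    ("instruction", 5),
    ("data_length", 5),
)


def parse_packet_via_string(packet):
    """Parse a packet by slicing a '0'/'1' string (hypothetical helper)."""
    # One character per bit: this is where the 8x memory overhead comes from.
    bits = "".join(format(byte, "08b") for byte in packet)

    fields = {}
    pos = 0
    for name, width in HEADER_FIELDS:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width

    # The data field is data_length bits long; the remainder is filler.
    data_bits = bits[pos:pos + fields["data_length"]]
    fields["data"] = int(data_bits, 2) if data_bits else 0
    return fields


# Hand-built sample: version=1, command=2, instruction=3, data_length=3,
# data=0b101, followed by one filler bit to reach 24 bits.
sample_packet = bytes([0x04, 0x86, 0x3A])
parsed = parse_packet_via_string(sample_packet)
```

The intermediate string is easy to debug by eye, which is probably why this pattern is common, but every bit costs at least one byte while the string is alive.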

Take a look at the bitstring module (on PyPI or Google Code). It's
designed to make this sort of thing easy, and it's more fully
featured than BitVector or BitSet. Internally the data is stored as a
byte array, so memory isn't wasted. It also does all the dirty work
of bit masking and shifting so that you can concentrate on the real
problems. For example:

>>> from bitstring import BitString
>>> s = BitString('0x1232432312')  # just to give us some data to play with
>>> ver, comm, instr, bitlen = s.read('uint6, bin4, bin5, uint5')
>>> data = s.readbits(bitlen)

Different interpretations of the binary data are given using Python
properties (e.g. s.hex, s.oct, s.uint, etc.) and it supports bit-wise
slicing, modification, finding, replacing and more. It is also still
in active development (full disclosure: I'm the author :-)).
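
To give an idea of the bit masking and shifting that such a library hides, here is a minimal standard-library sketch, assuming Python 3. It is not how bitstring is implemented; the BitReader class and the sample packet are hypothetical and exist only to illustrate the technique on the bogus layout from the question.

```python
class BitReader:
    """Read big-endian bit fields from bytes using shifts and masks.

    Hypothetical helper for illustration only.
    """

    def __init__(self, data):
        # Hold the whole packet as one integer instead of a '0'/'1' string.
        self._value = int.from_bytes(data, "big")
        self._remaining = len(data) * 8  # bits not yet consumed

    def read(self, width):
        """Return the next `width` bits as an unsigned integer."""
        if width > self._remaining:
            raise ValueError("not enough bits left in packet")
        self._remaining -= width
        # Shift the wanted field down to bit 0, then mask off higher bits.
        return (self._value >> self._remaining) & ((1 << width) - 1)


# Hand-built sample: version=1, command=2, instruction=3, data_length=3,
# data=0b101, followed by one filler bit to reach 24 bits.
reader = BitReader(bytes([0x04, 0x86, 0x3A]))
version = reader.read(6)
command = reader.read(4)
instruction = reader.read(5)
data_length = reader.read(5)
data = reader.read(data_length)
```

Because the packet lives in a single integer, no per-bit string is ever built; the cost is that each field has to be read in order, with the widths spelled out by hand.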



