Parsing Binary Structures; Is there a better way / What is your way?

Masklinn masklinn at masklinn.net
Wed Aug 5 13:22:42 EDT 2009


On 5 Aug 2009, at 16:46 , Martin P. Hellwig wrote:
> Hi List,
>
> On several occasions I have needed (and built) a parser that reads a  
> binary piece of data with custom structure. For example (bogus one):
>
> BE
> +---------+---------+-------------+-------------+------+--------+
> | Version | Command | Instruction | Data Length | Data | Filler |
> +---------+---------+-------------+-------------+------+--------+
> Version: 6 bits
> Command: 4 bits
> Instruction: 5 bits
> Data Length: 5 bits
> Data: 0-31 bits
> Filler: 0 bits padding the packet to a length divisible by 8
>
> what I usually do is read the packet in binary mode, convert the  
> output to a concatenated 'binary string' (i.e. '0101011000110') and  
> then use slice indices to get the right data portions.
> Depending on what I need to do with these portions I convert them to  
> whatever is handy (usually an integer).
>
> This works out fine for me. Most of the time I also put the ASCII  
> art diagram of this 'protocol' as a comment in the code, making it  
> more readable/understandable.
>
> Though there are a couple of things that bother me with my approach:
> - This seems such a general problem that I think that there must  
> already be a general Pythonic solution.
> - Using a string for binary representation takes at least 8 times  
> more memory for the packet than strictly necessary.
> - Seems to need a lot of prep work before doing the actual parsing.
>
> Any suggestion is greatly appreciated.
The gold standard for binary parsing (and serialization) is probably  
Erlang's bit syntax, but as far as Python goes you might be interested  
in Hachoir (http://hachoir.org/ but it seems down right now).

It's not going to address your second point (memory use), but it can  
probably help with the rest (caveat: I haven't used hachoir personally).
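
On the memory point specifically, one option that needs no library is to
treat the whole packet as a single integer and pull the fields out with
shifts and masks instead of building a '0101...' string. The following is
only a rough sketch for the bogus layout quoted above, not anything taken
from Hachoir: parse_packet and its helper take are made-up names, it
assumes Python 3 (for int.from_bytes), and it assumes Data Length counts
the data bits. It ignores the filler bits rather than checking them.

```python
def parse_packet(packet: bytes) -> dict:
    """Split a big-endian packet into its bit fields using shifts and masks."""
    # Treat the whole packet as one big-endian integer (Python 3 only).
    value = int.from_bytes(packet, "big")
    remaining = len(packet) * 8  # bits not yet consumed, counted MSB first

    def take(width: int) -> int:
        """Return the next `width` bits as an unsigned integer."""
        nonlocal remaining
        if width > remaining:
            raise ValueError("packet too short for the requested field")
        remaining -= width
        return (value >> remaining) & ((1 << width) - 1)

    # Field widths follow the quoted layout: 6, 4, 5 and 5 bits.
    fields = {
        "version": take(6),
        "command": take(4),
        "instruction": take(5),
        "data_length": take(5),
    }
    # Assumption: data_length is the number of data bits (0-31).
    fields["data"] = take(fields["data_length"])
    return fields


if __name__ == "__main__":
    # 000011 1001 10001 00111 1010101 + 5 filler bits
    print(parse_packet(bytes([0x0E, 0x62, 0x7A, 0xA0])))
```

Keeping the field widths in one place like this also stays fairly close
to the ASCII art diagram, so the two are easy to compare by eye.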


