Library for parsing binary structures

dieter dieter at handshake.de
Thu Mar 28 04:12:46 EDT 2019


Paul Moore <p.f.moore at gmail.com> writes:
> I'm looking for a library that lets me parse binary data structures.
> The stdlib struct module is fine for simple structures, but when it
> gets to more complicated cases, you end up doing a lot of the work by
> hand (which isn't that hard, and is generally perfectly viable, but
> I'm feeling lazy ;-))
>
> I know of Construct, which is a nice declarative language, but it's
> either weak, or very badly documented, when it comes to recursive
> structures. (I really like Construct, and if I could only understand
> the docs better I may well not need to look any further, but as it is,
> I can't see anything showing how to do recursive structures...) I am
> specifically trying to parse a structure that looks something like the
> following:
>
> Multiple instances of:
>   - a type byte
>   - a chunk of data structured based on the type
>     types include primitives like byte, integer, etc, as well as
>     (type byte, count, data) - data is "count" occurrences of data of
> the given type.

What you have is a generalized deserialization problem.
It can be solved with a set of deserializers.

    def deserialize(file):
      """read the beginning of file and return the corresponding object."""

In your case, there is a mapping "type byte --> deserializer"; call it
"TYPE". Each value in "TYPE" has a "deserialize(file)" method, and
(obviously) "(" is one such "type byte".
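
A minimal sketch of what such a mapping might look like. The concrete
type bytes (b"B" for a byte, b"I" for a 4-byte little-endian unsigned
integer) and the "Primitive" helper class are assumptions made for
illustration; your actual format defines its own.

```python
import io
import struct


class Primitive:
    """Deserializer for one fixed-size value described by a struct format."""

    def __init__(self, fmt):
        self._struct = struct.Struct(fmt)

    def deserialize(self, file):
        data = file.read(self._struct.size)
        if len(data) != self._struct.size:
            raise EOFError("truncated value")
        return self._struct.unpack(data)[0]


# Hypothetical type bytes; the real format defines its own.
BYTE, INT = b"B", b"I"

TYPE = {
    BYTE: Primitive("<B"),  # one unsigned byte
    INT: Primitive("<I"),   # 4-byte little-endian unsigned integer
}

# Usage: read one byte value, then one integer, from an in-memory file.
f = io.BytesIO(b"\x07" + struct.pack("<I", 1000))
print(TYPE[BYTE].deserialize(f))
print(TYPE[INT].deserialize(f))
```

Keeping the deserializers in a dictionary means a new type only needs
a new entry; the code that dispatches on the type byte stays unchanged.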

The deserializer corresponding to "(" is:
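    def sequence_deserialize(file):
      # *file* must be opened in binary mode: read() then returns bytes
      type_byte = file.read(1)
      if not type_byte: raise EOFError()
      elem_type = TYPE[type_byte]  # deserializer for each element
      count = TYPE[INT].deserialize(file)
      seq = [elem_type.deserialize(file) for _ in range(count)]
      # the sequence must be terminated by ")"
      if file.read(1) != b")": raise ValueError("missing ')'")
      return seq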

The top level "deserialize" could look like:
    def top_deserialize(file):
      """generates all values found in *file*."""
      while True:
        type_byte = file.read(1)
        if not type_byte: return
        yield TYPE[type_byte].deserialize(file)
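
To show how the pieces fit together, here is a self-contained sketch
that runs the approach on a nested sequence. It follows the outline
above, including the closing ")" after each sequence. The type bytes
b"B" and b"I", the 4-byte little-endian count, and the "Primitive",
"Sequence" and "pack_int" helpers are assumptions for illustration,
not part of your format.

```python
import io
import struct


class Primitive:
    """Deserializer for one fixed-size value described by a struct format."""

    def __init__(self, fmt):
        self._struct = struct.Struct(fmt)

    def deserialize(self, file):
        data = file.read(self._struct.size)
        if len(data) != self._struct.size:
            raise EOFError("truncated value")
        return self._struct.unpack(data)[0]


class Sequence:
    """Deserializer for (element type byte, count, elements, b")")."""

    def deserialize(self, file):
        type_byte = file.read(1)
        if not type_byte:
            raise EOFError("missing element type")
        elem_type = TYPE[type_byte]  # may be a Sequence again: recursion
        count = TYPE[INT].deserialize(file)
        seq = [elem_type.deserialize(file) for _ in range(count)]
        if file.read(1) != b")":
            raise ValueError("missing ')' after sequence")
        return seq


# Hypothetical type bytes; the real format defines its own.
BYTE, INT, SEQ = b"B", b"I", b"("
TYPE = {BYTE: Primitive("<B"), INT: Primitive("<I"), SEQ: Sequence()}


def top_deserialize(file):
    """Generate all values found in *file*."""
    while True:
        type_byte = file.read(1)
        if not type_byte:
            return
        yield TYPE[type_byte].deserialize(file)


def pack_int(n):
    return struct.pack("<I", n)


# An integer 1000, followed by a sequence of two byte sequences.
data = (
    INT + pack_int(1000)
    + SEQ + SEQ + pack_int(2)
    + BYTE + pack_int(2) + b"\x01\x02" + b")"
    + BYTE + pack_int(1) + b"\x03" + b")"
    + b")"
)
values = list(top_deserialize(io.BytesIO(data)))
print(values)  # [1000, [[1, 2], [3]]]
```

The recursion needs no special support: the sequence deserializer
looks up its element type in the same "TYPE" mapping that contains
the sequence deserializer itself.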




