[Python-ideas] struct.unpack should support open files

Cameron Simpson cs at cskk.id.au
Tue Dec 25 21:05:51 EST 2018


On 24Dec2018 10:19, James Edwards <jheiv at jheiv.com> wrote:
>Here's a snippet of semi-production code we use:
>
>    def read_and_unpack(handle, fmt):
>        size = struct.calcsize(fmt)
>        data = handle.read(size)
>        if len(data) < size: return None
>        return struct.unpack(fmt, data)
>
>which was originally something like:
>
>    def read_and_unpack(handle, fmt, offset=None):
>        if offset is not None:
>            handle.seek(*offset)
>        size = struct.calcsize(fmt)
>        data = handle.read(size)
>        if len(data) < size: return None
>        return struct.unpack(fmt, data)
>
>until we pulled file seeking up out of the function.
>
>Having struct.unpack and struct.unpack_from support files would seem
>straightforward and be a nice quality of life change, imo.

These days I go the other way. I make it easy to get bytes from what I'm 
working with and _expect_ to parse from a stream of bytes.

I have a pair of modules cs.buffer (for getting bytes from things) and 
cs.binary (for parsing structures from binary data). (See PyPI.)

cs.buffer primarily offers a CornuCopyBuffer which manages access to any 
iterable of bytes objects. It has a suite of factories to make these 
from binary files, bytes, bytes[], a mmap, etc. Once you've got one of 
these you have access to a suite of convenient methods. Particularly for 
grabbing structs, there's a .take() method which obtains a precise 
number of bytes. (Think that looks like a file read? Yes, and it offers 
a basic file-like suite of methods too.)
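
To make the idea concrete, here is a minimal stdlib-only sketch of a
buffer over an iterable of bytes objects with a take()-style method.
The names ChunkBuffer and chunks_from_file are invented for this
illustration; this is not the CornuCopyBuffer implementation, only the
general shape of the technique.

```python
import io


class ChunkBuffer:
    """Hypothetical stand-in: a buffer over any iterable of bytes objects."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""  # bytes fetched from the iterable but not yet consumed

    def take(self, size):
        """Return exactly `size` bytes, or raise EOFError if the data runs out."""
        parts = [self._pending]
        have = len(self._pending)
        while have < size:
            chunk = next(self._chunks, None)
            if chunk is None:
                # Keep what we gathered so nothing is silently lost.
                self._pending = b"".join(parts)
                raise EOFError(f"wanted {size} bytes, only {have} available")
            parts.append(chunk)
            have += len(chunk)
        data = b"".join(parts)
        self._pending = data[size:]
        return data[:size]


def chunks_from_file(f, chunk_size=4096):
    """Iterate over successive chunks of a binary file until EOF."""
    return iter(lambda: f.read(chunk_size), b"")


# The same buffer class serves a list of bytes objects or a binary file.
buf = ChunkBuffer(chunks_from_file(io.BytesIO(b"hello world"), chunk_size=4))
first = buf.take(5)  # spans two underlying chunks
```

The caller never needs to know how the underlying data happened to be
chunked; take() hides that.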

Anyway, cs.binary is based on a PacketField base class oriented around 
pulling a binary structure from a CornuCopyBuffer. Obviously, structs 
are very common, and cs.binary has a factory:

    def structtuple(class_name, struct_format, subvalue_names):

which gets you a PacketField subclass whose parse methods read a struct 
and return it to you in a nice namedtuple.

Also, PacketFields self-transcribe: you can construct one from its 
values and have it write out the binary form.
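
As a rough sketch of that pattern (parse into a namedtuple, and write
the same values back out), the following uses only struct and
collections.namedtuple. The names make_struct_tuple, parse_bytes and
transcribe are assumptions made for this illustration and should not
be read as the cs.binary API.

```python
import struct
from collections import namedtuple


def make_struct_tuple(class_name, struct_format, field_names):
    """Build a namedtuple subclass that can parse and transcribe itself.

    Hypothetical helper for illustration; not the cs.binary implementation.
    """
    packer = struct.Struct(struct_format)
    base = namedtuple(class_name, field_names)

    class StructTuple(base):
        __slots__ = ()
        size = packer.size  # number of bytes one record occupies

        @classmethod
        def parse_bytes(cls, data, offset=0):
            """Unpack one record from `data` starting at `offset`."""
            return cls(*packer.unpack_from(data, offset))

        def transcribe(self):
            """Return the binary form of this record."""
            return packer.pack(*self)

    StructTuple.__name__ = StructTuple.__qualname__ = class_name
    return StructTuple


PDInfo = make_struct_tuple("PDInfo", ">LL", "rate initial_delay")
info = PDInfo.parse_bytes(b"\x00\x00\x00\x10\x00\x00\x00\x02")
raw = PDInfo(rate=16, initial_delay=2).transcribe()
```

Here info.rate is 16 and info.initial_delay is 2, and raw is the same
eight bytes that were parsed, so parsing and transcription round-trip.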

Once you've got these the tendency is just to make PacketField 
subclasses from that function for the structs you need and then to just 
grab things from a CornuCopyBuffer providing the data. And you no longer 
have to waste effort on different code for bytes or files.

Example from cs.iso14496:

    PDInfo = structtuple('PDInfo', '>LL', 'rate initial_delay')

Then you can just use PDInfo.from_buffer() or PDInfo.from_bytes() to 
parse out your structures from then on.
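
The point about not writing separate code for bytes and files can be
sketched with the stdlib alone: one parser that consumes an iterable
of bytes objects, plus two thin adapters. The function names below
(parse_pdinfo, pdinfo_from_bytes, pdinfo_from_file) are hypothetical
and are not part of cs.binary.

```python
import io
import struct
from collections import namedtuple

PDInfo = namedtuple("PDInfo", "rate initial_delay")
PDINFO_STRUCT = struct.Struct(">LL")


def parse_pdinfo(chunks):
    """Parse one PDInfo from an iterable of bytes objects (the single code path)."""
    data = b""
    for chunk in chunks:
        data += chunk
        if len(data) >= PDINFO_STRUCT.size:
            return PDInfo(*PDINFO_STRUCT.unpack_from(data))
    raise EOFError("not enough data for a PDInfo")


def pdinfo_from_bytes(data):
    """Adapter: a bytes object is just a one-element iterable of chunks."""
    return parse_pdinfo([data])


def pdinfo_from_file(f, chunk_size=4096):
    """Adapter: a binary file becomes an iterable of chunks read until EOF."""
    return parse_pdinfo(iter(lambda: f.read(chunk_size), b""))


sample = b"\x00\x00\x00\x10\x00\x00\x00\x02"
from_bytes = pdinfo_from_bytes(sample)
from_file = pdinfo_from_file(io.BytesIO(sample), chunk_size=3)
```

Note that this simplified file adapter may read past the end of the
struct and discard the extra bytes; a real buffer object would keep
the leftover data for the next parse.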

I used to have tedious duplicated code for bytes and files in various 
places; I'm ripping it out and replacing it with this as I encounter it.  
Far more reliable, not to mention smaller and easier.

Cheers,
Cameron Simpson <cs at cskk.id.au>

