[Python-ideas] struct.unpack should support open files
Cameron Simpson
cs at cskk.id.au
Tue Dec 25 21:05:51 EST 2018
On 24Dec2018 10:19, James Edwards <jheiv at jheiv.com> wrote:
>Here's a snippet of semi-production code we use:
>
>    def read_and_unpack(handle, fmt):
>        size = struct.calcsize(fmt)
>        data = handle.read(size)
>        if len(data) < size: return None
>        return struct.unpack(fmt, data)
>
>which was originally something like:
>
>    def read_and_unpack(handle, fmt, offset=None):
>        if offset is not None:
>            handle.seek(*offset)
>        size = struct.calcsize(fmt)
>        data = handle.read(size)
>        if len(data) < size: return None
>        return struct.unpack(fmt, data)
>
>until we pulled file seeking up out of the function.
>
>Having struct.unpack and struct.unpack_from support files would seem
>straightforward and be a nice quality of life change, imo.
These days I go the other way. I make it easy to get bytes from what I'm
working with and _expect_ to parse from a stream of bytes.
I have a pair of modules cs.buffer (for getting bytes from things) and
cs.binary (for parsing structures from binary data). (See PyPI.)
cs.buffer primarily offers a CornuCopyBuffer which manages access to any
iterable of bytes objects. It has a suite of factories to make these
from binary files, bytes, bytes[], a mmap, etc. Once you've got one of
these you have access to a suite of convenient methods. Particularly for
grabbing structs, there's a .take() method which obtains a precise
number of bytes. (Think that looks like a file read? Yes, and it offers
a basic file-like suite of methods too.)
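To make the idea concrete, here is a minimal sketch of a buffer over an
iterable of bytes objects. This is not the cs.buffer code: the class name
ChunkBuffer, its two factories and the EOFError behaviour are all made up
for illustration, and only the .take() name comes from the description
above.

```python
import io
import struct


class ChunkBuffer:
    """Hypothetical stand-in for the idea, not the real CornuCopyBuffer API."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)  # any iterable of bytes objects
        self._pending = b""          # bytes fetched but not yet consumed

    @classmethod
    def from_bytes(cls, data):
        return cls([data])

    @classmethod
    def from_file(cls, fp, chunk_size=4096):
        # iter(callable, sentinel) stops when read() returns b"" at EOF.
        return cls(iter(lambda: fp.read(chunk_size), b""))

    def take(self, size):
        """Return exactly `size` bytes, or raise EOFError if the data runs out."""
        parts = [self._pending]
        have = len(self._pending)
        while have < size:
            try:
                chunk = next(self._chunks)
            except StopIteration:
                # Keep what was gathered so the buffer stays consistent.
                self._pending = b"".join(parts)
                raise EOFError(
                    f"wanted {size} bytes, only {have} available"
                ) from None
            parts.append(chunk)
            have += len(chunk)
        data = b"".join(parts)
        self._pending = data[size:]
        return data[:size]


# The same parsing code works whether the source is bytes or a file.
payload = struct.pack(">LL", 48000, 7)
sources = (
    ChunkBuffer.from_bytes(payload),
    ChunkBuffer.from_file(io.BytesIO(payload), chunk_size=3),
)
for buf in sources:
    rate, initial_delay = struct.unpack(">LL", buf.take(struct.calcsize(">LL")))
    print(rate, initial_delay)
```

The point of the sketch is that the struct parsing only ever sees .take();
where the bytes come from is settled once, by the factory.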
Anyway, cs.binary is based on a PacketField base class oriented around
pulling a binary structure from a CornuCopyBuffer. Obviously, structs
are very common, and cs.binary has a factory:
    def structtuple(class_name, struct_format, subvalue_names):
which gets you a PacketField subclass whose parse methods read a struct
and return it to you in a nice namedtuple.
Also, PacketFields self-transcribe: you can construct one from its
values and have it write out the binary form.
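A rough sketch of what such a factory can look like, using only the
standard struct and collections modules. This is not the cs.binary
implementation: the function name struct_namedtuple and the transcribe()
method name are assumptions for illustration, and the real PacketField
classes do considerably more.

```python
import struct
from collections import namedtuple


def struct_namedtuple(class_name, struct_format, field_names):
    """Hypothetical analogue of structtuple(); not the cs.binary code."""
    packer = struct.Struct(struct_format)
    base = namedtuple(class_name, field_names)

    class StructTuple(base):
        __slots__ = ()

        @classmethod
        def from_bytes(cls, data, offset=0):
            # Parse one struct starting at `offset`; trailing bytes are ignored.
            return cls(*packer.unpack_from(data, offset))

        def transcribe(self):
            # Write the fields back out in the same binary layout.
            return packer.pack(*self)

    StructTuple.__name__ = StructTuple.__qualname__ = class_name
    return StructTuple


# Parse a two-field big-endian struct, then write it back out.
Pair = struct_namedtuple("Pair", ">LL", "first second")
raw = struct.pack(">LL", 48000, 7)
pair = Pair.from_bytes(raw)
print(pair.first, pair.second)
assert pair.transcribe() == raw
```

Parsing and transcription share one struct.Struct object, so the two
directions cannot drift apart in format.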
Once you've got these the tendency is just to make PacketField
subclasses with that function for the structs you need and then to just
grab things from a CornuCopyBuffer providing the data. And you no longer
have to waste effort on different code for bytes or files.
Example from cs.iso14496:
    PDInfo = structtuple('PDInfo', '>LL', 'rate initial_delay')
Then you can just use PDInfo.from_buffer() or PDInfo.from_bytes() to
parse out your structures from then on.
I used to have tedious duplicated code for bytes and files in various
places; I'm ripping it out and replacing with this as I encounter it.
Far more reliable, not to mention smaller and easier.
Cheers,
Cameron Simpson <cs at cskk.id.au>