[Python-ideas] Ideas for improving the struct module

Elizabeth Myers elizabeth at interlinked.me
Fri Jan 20 11:34:30 EST 2017


On 19/01/17 15:04, Yury Selivanov wrote:
> This is a neat idea, but this will only work for parsing framed
> binary protocols.  For example, if your protocol prefixes all packets
> with a length field, you can write an efficient read buffer and
> use your proposal to decode all of a message's fields in one shot.
> Which is good.
> 
> Not all protocols use framing though.  For instance, your proposal
> won't help with writing Thrift or Postgres protocol parsers.

It won't help them, no, but it will help others who have to do similar
tasks, or help people build things on top of the struct module.

> 
> Overall, I'm not sure that this is worth the hassle.  With the proposal:
> 
>    data, = struct.unpack('!H$', buf)
>    buf = buf[2+len(data):]
> 
> with the current struct module:
> 
>    length, = struct.unpack('!H', buf)
>    data = buf[2:2+length]
>    buf = buf[2+length:]
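For comparison, the current-module version can avoid re-slicing (and therefore copying) the remaining buffer for every field by tracking an offset with `struct.Struct.unpack_from`. A minimal sketch, assuming the same 2-byte big-endian length prefix as the example above; `read_length_prefixed` is a hypothetical helper name, not part of the struct module:

```python
import struct

# Precompiled 2-byte big-endian ("network order") length prefix.
LENGTH_PREFIX = struct.Struct('!H')


def read_length_prefixed(buf, offset=0):
    """Return (payload, new_offset) for a '!H'-prefixed field at offset.

    Hypothetical helper: uses unpack_from plus a memoryview, so the rest
    of the buffer is not copied on every field (unlike buf = buf[n:]).
    """
    view = memoryview(buf)
    (length,) = LENGTH_PREFIX.unpack_from(view, offset)
    start = offset + LENGTH_PREFIX.size
    end = start + length
    if end > len(view):
        raise struct.error(f'need {end} bytes, buffer has {len(view)}')
    return bytes(view[start:end]), end


buf = b'\x00\x05hello\x00\x03abc'
first, pos = read_length_prefixed(buf)        # first == b'hello', pos == 7
second, pos = read_length_prefixed(buf, pos)  # second == b'abc', pos == 12
```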

I find such a construction is rarely needed when I'm dealing with
repeated frames; I could just use struct.iter_unpack. The proposal isn't
useful in all cases, but as it stands, neither is the present struct
module.
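For repeated fixed-size frames, `struct.iter_unpack` (Python 3.4+) already yields one tuple per record. A minimal sketch, assuming a made-up record layout of a 2-byte ID followed by a 4-byte value; note that it only applies when every frame has the same size:

```python
import struct

# Hypothetical record layout: 2-byte big-endian ID, 4-byte big-endian value.
RECORD = struct.Struct('!HI')

data = RECORD.pack(1, 100) + RECORD.pack(2, 200) + RECORD.pack(3, 300)

# iter_unpack walks the buffer one record at a time; len(data) must be an
# exact multiple of RECORD.size (6 bytes here), or struct.error is raised.
records = list(RECORD.iter_unpack(data))
# records == [(1, 100), (2, 200), (3, 300)]
```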

Just because it is not useful to everyone does not mean it is not useful
to others, perhaps immensely so.

I think the existence of third-party libraries that implement a portion
of my rather modest proposal already justifies it.

> 
> Another thing: struct.calcsize won't work with structs that use
> variable length fields.

It should probably raise an error if the format contains a
variable-length string. If you're using variable-length strings, you
probably aren't a consumer of struct.calcsize anyway.
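Today, `struct.calcsize('!H$')` already raises `struct.error`, simply because `$` is not a recognised format character. A wrapper along these lines could give a clearer message for the proposed code. A minimal sketch; `calcsize_fixed` and `VARIABLE_LENGTH_CODES` are hypothetical names, and it only checks for `$`, not the proposed `0` code:

```python
import struct

# Hypothetical: '$' is the length-prefixed code proposed in this thread;
# it is not a real struct format character.
VARIABLE_LENGTH_CODES = frozenset('$')


def calcsize_fixed(fmt):
    """Like struct.calcsize, but reject formats with variable-length fields."""
    if VARIABLE_LENGTH_CODES & set(fmt):
        raise struct.error(f'{fmt!r} contains a variable-length field; '
                           'its size depends on the data')
    return struct.calcsize(fmt)


size = calcsize_fixed('!HI')  # size == 6
```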

> 
> Yury
> 
> 
> On 2017-01-18 5:24 AM, Elizabeth Myers wrote:
>> Hello,
>>
>> I've noticed a lot of binary protocols require variable-length
>> bytestrings (with or without a null terminator), but it is not easy to
>> unpack these in Python without first reading the desired length, or
>> reading bytes until a null terminator is reached.
>>
>> I've noticed the netstruct library
>> (https://github.com/stendec/netstruct) has a format specifier, $, which
>> assumes the previous type to pack/unpack is the string's length. This is
>> an interesting idea in and of itself, but it doesn't handle the
>> null-terminated string case. I know $ is similar to Pascal strings
>> (struct's 'p' format), but sometimes you need more than 255 characters :p.
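Something close to the proposed `$` can be emulated in pure Python on top of the current module. A rough sketch, assuming the integer unpacked immediately before `$` is consumed as the length and replaced by that many bytes (matching the `data, = struct.unpack('!H$', buf)` usage earlier in the thread), and assuming a standard-size byte-order prefix; `unpack_dollar` is a hypothetical name, not netstruct's or the stdlib's API:

```python
import struct

# Only standard-size prefixes: with native alignment ('@' or no prefix),
# splitting the format into pieces could change the padding.
_STANDARD_ORDERS = '<>!='


def unpack_dollar(fmt, buf):
    """Hypothetical '$' emulation: read N bytes, where N is the integer
    unpacked immediately before the '$'.

    The length itself is consumed and replaced by the bytes it describes.
    Returns (values, bytes_consumed).
    """
    if not fmt or fmt[0] not in _STANDARD_ORDERS:
        raise ValueError("format must start with '<', '>', '!' or '='")
    order, body = fmt[0], fmt[1:]
    segments = body.split('$')
    values = []
    offset = 0
    for i, segment in enumerate(segments):
        seg_fmt = order + segment
        fields = list(struct.unpack_from(seg_fmt, buf, offset))
        offset += struct.calcsize(seg_fmt)
        if i < len(segments) - 1:  # this segment is followed by a '$'
            if not fields or not isinstance(fields[-1], int):
                raise struct.error("'$' must follow an integer field")
            length = fields.pop()
            if length < 0 or offset + length > len(buf):
                raise struct.error('buffer too short for variable-length field')
            fields.append(bytes(buf[offset:offset + length]))
            offset += length
        values.extend(fields)
    return tuple(values), offset


values, used = unpack_dollar('!H$B', b'\x00\x03abc\x07')
# values == (b'abc', 7), used == 6
```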
>>
>> For null-terminated strings, it may be simpler to have a dedicated
>> specifier. I propose 0, but this point can be bikeshedded over endlessly
>> if desired ;) (I thought about using n/N, but they're already taken :P).
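Without a dedicated code, a NUL-terminated string can be handled today by locating the terminator with `bytes.find` and resuming `struct` parsing just after it. A minimal sketch; `read_cstring` is a hypothetical helper, `buf` is assumed to be `bytes` or `bytearray` (a memoryview has no `find`), and the packet layout is made up:

```python
import struct


def read_cstring(buf, offset=0):
    """Return (string_without_nul, new_offset) for a NUL-terminated string.

    Hypothetical helper; raises struct.error if no terminator is found.
    """
    end = buf.find(b'\x00', offset)
    if end == -1:
        raise struct.error('unterminated string')
    return bytes(buf[offset:end]), end + 1


# Made-up layout: NUL-terminated host name, then a 2-byte big-endian port.
packet = b'example.org\x00\x1f\x90'
name, pos = read_cstring(packet)
(port,) = struct.unpack_from('!H', packet, pos)
# name == b'example.org', pos == 12, port == 8080
```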
>>
>> It's worth noting that Perl has an equivalent to the struct module
>> (maybe more than one?), whose name escapes me atm, which can handle this
>> case. I can't remember if it handled variable-length or zero-terminated
>> strings, though; maybe it did both. Perl is more or less my 10th
>> language. :p
>>
>> This pain point is an annoyance imo, and this feature (or something like
>> it) would greatly simplify a lot of code if implemented. I'd be happy to
>> take a look at implementing it if the idea is received sufficiently warmly.
>>
>> -- 
>> Elizabeth
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> 

