[Python-ideas] Ideas for improving the struct module

Fri Jan 20 13:09:19 EST 2017

On 20 January 2017 at 15:13, Nathaniel Smith <njs at pobox.com> wrote:
> On Jan 20, 2017 09:00, "Paul Moore" <p.f.moore at gmail.com> wrote:
>
> On 20 January 2017 at 16:51, Elizabeth Myers <elizabeth at interlinked.me>
> wrote:
>> Should I write up a PEP about this? I am not sure if it's justified or
>> not. It's 3 changes (calcsize and two format specifiers), but it might
>> be useful to codify it.
>
> It feels a bit minor to need a PEP, but having said that did you pick
> up on the comment about needing to return the number of bytes
> consumed?
>
> str = struct.unpack('z', b'test\0xxx')
>
> How do we know where the unpack got to, so that we can continue
> parsing from there? It seems a bit wasteful to have to scan the string
> twice to use calcsize for this...
>
>
> unpack() is OK, because it already has the rule that it raises an error if
> it doesn't exactly consume the buffer. But I agree that if we do this then
> we'd really want versions of unpack_from and pack_into that return the new
> offset. (Further arguments that calcsize is insufficient: it doesn't work
> for potential other variable length items, e.g. if we added uleb128 support;
> it quickly becomes awkward if you have multiple strings; in practice I think
> everyone who needs this would just end up writing a wrapper that calls
> calcsize and returns the new offset anyway, so should just provide that up
> front.)
>
> For pack_into this is also easy, since currently it always returns None, so
> if it started returning an integer no one would notice (and it'd be kinda
> handy in its own right, honestly).
>
> unpack_from is the tricky one, because it already has a return value and
> this isn't it. Ideally it would have worked this way from the beginning, but
> too late for that now... I guess the obvious solution would be to come up
> with a new function that's otherwise identical to unpack_from but returns a
> (values, offset) tuple. What to call this, though, I don't know :-).
> unpack_at? unpack_next? (Hinting that this is the natural primitive you'd
> use to implement unpack_iter.)
>
Yes - maybe a PEP.

Then we could also include, for example, the suggestion of allowing
whitespace in the struct format string, which is nice.

And we could think of having the unpack methods return a specialized
object, not a tuple, which has attributes with the extra information.

So, instead of

a, str = struct.unpack("IB$", data)

people who want the length can do:

tmp = struct.unpack("IB$", data)
do_things_with_len(tmp.tell)
a, str = tmp
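
A rough sketch of what such a result object could look like with today's
struct module. The names UnpackResult and unpack_with_tell are made up for
illustration, and since the proposed "$" code does not exist, this only
handles fixed-size formats, where calcsize gives the consumed length:

```python
import struct

class UnpackResult(tuple):
    """Hypothetical result type: a plain tuple that also records
    the offset at which unpacking stopped."""

    def __new__(cls, values, tell):
        self = super().__new__(cls, values)
        self.tell = tell  # offset just past the last byte consumed
        return self

def unpack_with_tell(fmt, buffer, offset=0):
    # Hypothetical helper, not part of the struct module.
    values = struct.unpack_from(fmt, buffer, offset)
    return UnpackResult(values, offset + struct.calcsize(fmt))

data = struct.pack("<IB", 7, 2) + b"trailing"
tmp = unpack_with_tell("<IB", data)
a, b = tmp                    # still unpacks like a tuple
remaining = data[tmp.tell:]   # continue parsing from here
```

Existing code that only unpacks the tuple keeps working, and only callers
that care about the position need to look at the attribute.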

The struct "object" could allow other things as well. Since we are at it,
maybe a zero-copy version that would return items from their in-place
buffer positions.
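
One way to picture the zero-copy idea is with memoryview slices, which refer
to the original buffer instead of copying out of it. This is only a sketch:
the helper name and the one-byte length prefix layout are assumptions made
for the example, not the proposed format codes:

```python
import struct

def unpack_prefixed_view(buffer, offset=0):
    """Hypothetical helper: read a 1-byte length at `offset`, then return
    (view_of_payload, new_offset) without copying the payload.
    Assumes `buffer` is a bytes-like object with byte-sized items."""
    view = memoryview(buffer)
    (length,) = struct.unpack_from("B", view, offset)
    start = offset + 1
    end = start + length
    if end > len(view):
        raise struct.error("payload runs past the end of the buffer")
    return view[start:end], end

data = bytearray(b"\x04testxxx")
payload, offset = unpack_prefixed_view(data)
data[1] = ord("T")            # change the underlying buffer in place
changed = bytes(payload)      # the view reflects the change
```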

But, OK, maybe most of this should just go in a third-party package.
Anyway, a PEP could be open to more improvements than the
variable-length fields proposed.

(For example, I think having attributes with extra information about size
is better than having:

size, (a, str) = struct.unpack2(... )

)
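
For comparison, the tuple-returning alternative is roughly the wrapper
around calcsize that the quoted message expects everyone to end up writing.
A minimal sketch, assuming fixed-size formats only; unpack2 here is a
hypothetical module-level function, not something struct provides:

```python
import struct

def unpack2(fmt, buffer, offset=0):
    """Hypothetical: like struct.unpack_from, but return
    (new_offset, values) so the caller can keep parsing."""
    values = struct.unpack_from(fmt, buffer, offset)
    return offset + struct.calcsize(fmt), values

data = struct.pack("<IB", 7, 2) + struct.pack("<H", 513)
size, (a, b) = unpack2("<IB", data)
size2, (c,) = unpack2("<H", data, size)   # continue from the first offset
```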

   js
 -><-

> -n