[Python-ideas] Ideas for improving the struct module

Elizabeth Myers elizabeth at interlinked.me
Fri Jan 20 18:26:31 EST 2017


On 20/01/17 16:46, Cameron Simpson wrote:
> On 20Jan2017 14:47, Elizabeth Myers <elizabeth at interlinked.me> wrote:
>> 1) struct.unpack and struct.unpack_from should remain
>> backwards-compatible. I don't want to return extra values from it like
>> (length unpacked, (data...)) for that reason.
> 
> Fully agree with this.
> 
>> If the calcsize solution
>> feels a bit weird (it isn't much less efficient, because strings store
>> their length with them, so it's constant-time), there could also be new
>> functions that *do* return the length if you need it. To me though, this
>> feels like a use case for struct.iter_unpack.
> 
> Often, maybe, but there are still going to be protocols that the new
> format doesn't support, where the performant thing to do (in pure
> Python) is to scan what you can with struct and "hand scan" the special
> bits with special code. 
> Consider, for example, a format like MP4/ISO14496, where there's a
> regular block structure (which is somewhat struct parsable) that can
> contain embedded arbitrarily weird information. Or the flipside where
> struct parsable data are embedded in a format not supported by struct.
> 
> The mixed situation is where you need to know where the parse got up
> to.  Calling calcsize or its variable size equivalent after a parse
> seems needlessly repetitive of the parse work.
> 
> For myself, I would want there to be some kind of call that returned the
> parse and the length scanned, with the historic interface preserved for
> the fixed size formats or for users not needing the length.
> 
>> 2) I want to avoid making a weird incongruity, where only
>> variable-length strings return the length actually parsed.
> 
> Fully agree. Arguing for two API calls: the current one and one that
> also returns the scan length.
> 
> Cheers,
> Cameron Simpson <cs at zip.com.au>
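
For concreteness, here is a rough sketch of the "second call" idea for
today's fixed-size formats. The name unpack_from_with_length is made up
for illustration and is not part of the struct module; it only wraps the
existing struct.unpack_from and struct.calcsize, and it does not cover
the proposed variable-length specifiers.

```python
import struct


def unpack_from_with_length(fmt, buffer, offset=0):
    """Hypothetical helper: return (values, new_offset) for a fixed-size format."""
    values = struct.unpack_from(fmt, buffer, offset)
    # calcsize looks only at the format string, so the data is not re-scanned.
    return values, offset + struct.calcsize(fmt)


# Mixed parse: a struct-parsable header followed by a NUL-terminated name
# that has to be hand scanned, then more data after it.
data = struct.pack(">IH", 0xDEADBEEF, 7) + b"moov\x00" + b"rest"

(magic, count), pos = unpack_from_with_length(">IH", data)

# Hand scan the part struct cannot describe, starting where struct stopped.
end = data.index(b"\x00", pos)
name = data[pos:end]
pos = end + 1  # skip the terminator; parsing can resume from here
```

The existing unpack/unpack_from calls stay untouched, so callers who
don't need the offset see no change.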

Some of the responses on the bug are discouraging... it mostly seems to
boil down to people either not wanting to expand the struct module or
wanting to discourage its use. Everyone is a critic. I didn't know adding two
format specifiers was going to be this controversial. You'd think I
proposed adding braces or something :/.

I'm hesitant to go forward on this until the bug has a resolution.

