Dynamic repeat counts in struct format strings?

Mon Apr 28 23:20:22 EDT 2003

I suppose this message is just wondering aloud, but:

Why hasn't the struct module grown a feature to specify the repeat count of
an element based on the value of the previous element, like perl's pack()
(http://www.perldoc.com/perl5.6/pod/func/pack.html)?

What I mean is, why can't I do something like this (borrowing perl syntax of
'/'):

>>> struct.unpack('!h/s', '\x00\x05Hello')
('Hello',)

Or, more generally:

>>> data = '\x02\x00\x04\x00\x05'
>>> struct.unpack('!b/h', data)
(4, 5)

The '/' specifies that the previous element functions as a repeat count for
the next element, much like how we can specify a numeric repeat count
currently.  This last snippet would be equivalent to what we have to do now:

>>> s = struct.calcsize('!b')
>>> i = struct.unpack('!b', data[:s])[0]  # i == 2
>>> struct.unpack('!%dh' % i, data[s:])  # fmt == '!2h'
(4, 5)
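
The two-step workaround above can be wrapped in a small helper. This is only a sketch for present-day Python 3 (where the data is bytes rather than str); the name unpack_counted and its parameters are made up for illustration and are not part of struct.

```python
import struct

def unpack_counted(count_fmt, item_fmt, data, byte_order="!"):
    """Read a count using count_fmt, then that many item_fmt values.

    Hypothetical helper: emulates the proposed 'count/item' syntax with
    two ordinary struct calls.
    """
    count_size = struct.calcsize(byte_order + count_fmt)
    (count,) = struct.unpack(byte_order + count_fmt, data[:count_size])
    # Build the real format string now that the count is known, e.g. '!2h'.
    body_fmt = "%s%d%s" % (byte_order, count, item_fmt)
    body_size = struct.calcsize(body_fmt)
    return struct.unpack(body_fmt, data[count_size:count_size + body_size])

print(unpack_counted("b", "h", b"\x02\x00\x04\x00\x05"))  # (4, 5)
print(unpack_counted("h", "s", b"\x00\x05Hello"))         # (b'Hello',)
```

Note that this still costs two unpack() calls and a string format per counted field, which is exactly the fiddling a built-in '/' would avoid.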

Even better would be:

>>> data = '\x02\x00\x03abc\x00\x04Test'
>>> struct.unpack('!b(h(s))', data)
('abc', 'Test')

In other words, x(fmt), where the value unpacked for x is the repeat count
for the format string fmt, and x itself is not included in the output.
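
To make the x(fmt) idea concrete, here is a rough pure-Python sketch of an unpacker for it (Python 3, bytes data). Everything here is an assumption for illustration: the names unpack_nested, _split_items and _unpack_group are invented, and I am assuming that a group holding a single format code (as in 'h(s)') takes the count as an ordinary struct repeat count, while a longer group is repeated as a whole. It handles only single-character codes, with no numeric counts, whitespace, or native alignment.

```python
import struct

def _split_items(body):
    """Yield (code, subformat or None) for each item in a format body."""
    i = 0
    while i < len(body):
        code = body[i]
        i += 1
        sub = None
        if i < len(body) and body[i] == "(":
            depth, start = 1, i + 1
            i += 1
            while depth:  # scan to the matching ')'
                if body[i] == "(":
                    depth += 1
                elif body[i] == ")":
                    depth -= 1
                i += 1
            sub = body[start:i - 1]
        yield code, sub

def _unpack_group(body, order, data, offset):
    """Unpack one format body starting at offset; return (values, new offset)."""
    values = []
    for code, sub in _split_items(body):
        if sub is None:
            fmt = order + code
            values.extend(struct.unpack_from(fmt, data, offset))
            offset += struct.calcsize(fmt)
            continue
        # The element before '(' is a count; it is consumed, not returned.
        (count,) = struct.unpack_from(order + code, data, offset)
        offset += struct.calcsize(order + code)
        if len(sub) == 1:
            # Single code: use the count as a normal repeat count ('3s', '2h').
            fmt = "%s%d%s" % (order, count, sub)
            values.extend(struct.unpack_from(fmt, data, offset))
            offset += struct.calcsize(fmt)
        else:
            for _ in range(count):
                got, offset = _unpack_group(sub, order, data, offset)
                values.extend(got)
    return values, offset

def unpack_nested(fmt, data):
    """Hypothetical unpack() supporting the proposed count(fmt) syntax."""
    order = fmt[0] if fmt[0] in "@=<>!" else ""
    values, _ = _unpack_group(fmt[len(order):], order, data, 0)
    return tuple(values)

print(unpack_nested("!b(h(s))", b"\x02\x00\x03abc\x00\x04Test"))
# (b'abc', b'Test')
```

The point of the sketch is that the format string alone no longer determines the size of the data; the offset has to be tracked while unpacking.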

And for pack():

>>> struct.pack('!b(h(s))', 'abc', 'Test')
'\x02\x00\x03abc\x00\x04Test'

(Implementing a pack() that acted like this would be quite a bit harder,
though, I think, since the semantics have changed. pack() as it is now is
very dumb--every element maps directly to something you pass it.)
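
For comparison, this is roughly what has to be written by hand today to produce that layout (a Python 3 sketch; pack_string_list is a made-up name, and it hard-codes the one-byte count and two-byte lengths of the '!b(h(s))' example). The counts are derived from the arguments rather than passed in, which is the change in semantics mentioned above.

```python
import struct

def pack_string_list(strings):
    """Pack as: 1-byte count, then a 2-byte length plus the bytes per string."""
    parts = [struct.pack("!b", len(strings))]
    for s in strings:
        # The length is computed from the argument, not supplied by the caller.
        parts.append(struct.pack("!h%ds" % len(s), len(s), s))
    return b"".join(parts)

print(pack_string_list([b"abc", b"Test"]))
# b'\x02\x00\x03abc\x00\x04Test'
```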

And while I'm on it, I might as well mention that it would be nice if there
were a variant of unpack() that returned the unused data as the last element
in the tuple, instead of throwing an exception. (It's easy enough to write
oneself, though.)
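
A minimal sketch of such a variant, in Python 3 (the name unpack_prefix is hypothetical):

```python
import struct

def unpack_prefix(fmt, data):
    """Unpack fmt from the front of data; append the leftover bytes."""
    size = struct.calcsize(fmt)
    # Still raises struct.error if data is shorter than fmt requires.
    return struct.unpack(fmt, data[:size]) + (data[size:],)

print(unpack_prefix("!h", b"\x00\x05Hello"))  # (5, b'Hello')
```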

Now, I'm not advocating that struct become perl's pack and unpack (struct
is _far_ cleaner), but being able to specify a repeat count within the data
itself would be a very useful feature.  It would make it easier to wrap some
generic 'data structure' type interface around struct (like the xstruct
module), faster (less python string/list fiddling and fewer repeated calls
to pack or unpack), and generally less fiddly.

Is there any specific reason why this type of functionality never found its
way into struct?  Is it the same reason that pascal strings include the
length byte in the length (or is that just how pascal strings are
implemented)?  Is there some fundamental reason why calcsize() always has to
be deterministic (i.e., always knows exactly, given any format string, what
the length of the data will be)?  Even pascal strings require that you
explicitly name the length of the string when packing (pack() silently
truncates or null-pads the string you pass to it).  With this feature you
wouldn't even need 'p'!  It would be expressed as 'B/s' or 'B(s)'.
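
To illustrate the 'p' behaviour described above, here is a short Python 3 session in script form (the variable names are just for illustration). The count in '5p' is the total field size including the length byte, so only four bytes of payload fit.

```python
import struct

padded = struct.pack("5p", b"ab")          # shorter than the field: null-padded
truncated = struct.pack("5p", b"abcdefg")  # longer than the field: cut to 4 bytes

print(padded)                       # b'\x02ab\x00\x00'
print(truncated)                    # b'\x04abcd'
print(struct.calcsize("5p"))        # 5
print(struct.unpack("5p", padded))  # (b'ab',)
```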

(Of course, everyone's just going to say, hey, great idea, now go implement
it!)

--
Francis Avila