struct: type registration?

Thu Jun 1 08:37:16 EDT 2006

On 1/06/2006 9:52 PM, Giovanni Bajo wrote:
> John Machin wrote:
> 
>>> given the ongoing work on struct (which I thought was a dead
>>> module), I was wondering if it would be possible to add an API to
>>> register custom parsing codes for struct. Whenever I use it for
>>> non-trivial tasks, I always happen to write small wrapper functions
>>> to adjust the values returned by struct.
>>>
>>> An example API would be the following:
>>>
>>> ============================================
>>> def mystring_len():
>>>     return 20
>>>
>>> def mystring_pack(s):
>>>     if len(s) > 20:
>>>         raise ValueError, "a mystring can be at max 20 chars"
>>>     s = (s + "\0"*20)[:20]
>> Have you considered s.ljust(20, "\0") ?
> 
> Right. This happened to be an example...
> 
>>>     s = struct.pack("20s", s)
>>>     return s
>> I am an idiot, so please be gentle with me: I don't understand why you
>> are using struct.pack at all:

Given a choice between whether I was referring to the particular 
instance of using struct.pack two lines above, or whether I was doubting 
the general utility of the struct module, you appear to have chosen the 
latter, erroneously.

> 
> Because I want to be able to parse largest chunks of binary datas with custom
> formatting. Did you miss the whole point of my message:

No.

> 
> struct.unpack("3liiSiiShh", data)
> 
> You need struct.unpack() to parse these datas, and you need custom
> packer/unpacker to avoid post-processing the output of unpack() just because it
> just knows of basic Python types. In binary structs, there happen to be *types*
> which do not map 1:1 to Python types, nor they are just basic C types (like the
> ones struct supports). Using custom formatter is a way to better represent
> these types (instead of mapping them to the "most similar" type, and then
> post-process it).
> 
> In my example, "S" is a basic-type which is a "A 0-terminated 20-byte string",
> and expressing it in the struct format with the single letter "S" is more
> meaningful in my code than using "20s" and then post-processing the resulting
> string each and every time this happens.
> 
> 
>>>>>> import struct
>>>>>> x = ("abcde" + "\0" * 20)[:20]
>>>>>> x
>> 'abcde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>>>>> len(x)
>> 20
>>>>>> y = struct.pack("20s", x)
>>>>>> y == x
>> True
>> Looks like a big fat no-op to me; you've done all the heavy lifting
>> yourself.
> 
> Looks like you totally misread my message.

Not at all.

Your function:

def mystring_pack(s):
     if len(s) > 20:
         raise ValueError, "a mystring can be at max 20 chars"
     s = (s + "\0"*20)[:20]
     s = struct.pack("20s", s)
     return s

can be replaced outright. The manual says of the "s" format: "For packing,
the string is truncated or padded with null bytes as appropriate to make
it fit." So this is enough:

def mystring_pack(s):
     if len(s) > 20:
         raise ValueError, "a mystring can be at max 20 chars"
     return s
     # s = (s + "\0"*20)[:20] # not needed, according to the manual
     # s = struct.pack("20s", s)
     # As I said, this particular instance of using struct.pack is a
     # big fat no-op.
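
For anyone who wants to check the manual's claim, here is a minimal sketch. It assumes a current Python 3, where struct works on bytes rather than str, and the sample values are made up for illustration:

```python
import struct

s = b"abcde"

# "20s" pads a short value with null bytes up to the field width.
packed = struct.pack("20s", s)
assert len(packed) == 20
assert packed == s + b"\0" * 15

# A value longer than the field is silently truncated, which is why the
# explicit length check in mystring_pack is still worth keeping.
truncated = struct.pack("20s", b"x" * 25)
assert truncated == b"x" * 20
```

So the manual padding adds nothing, but the ValueError check does: without it an over-long string would be cut short without complaint.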

> Your string "x" is what I find in
> binary data, and I need to *unpack* into a regular Python string, which would
> be "abcde".
>

And you unpack it with a custom function that also contains a fat no-op:

def mystring_unpack(s):
     assert len(s) == 20
     s = struct.unpack("20s", s)[0] # does nothing
     idx = s.find("\0")
     if idx >= 0:
         s = s[:idx]
     return s
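
With the no-op removed, the unpacker reduces to cutting at the first NUL. A possible sketch, again assuming Python 3 bytes; the sample fields are hypothetical:

```python
def mystring_unpack(raw):
    # raw is the 20-byte field exactly as it appears in the binary data.
    assert len(raw) == 20
    # partition() splits at the first NUL only; whatever follows it
    # (padding or junk) is discarded. With no NUL at all, the whole
    # field is returned unchanged.
    head, _sep, _tail = raw.partition(b"\0")
    return head


padded = b"abcde" + b"\0" * 15
unterminated = b"a" * 20

assert mystring_unpack(padded) == b"abcde"
assert mystring_unpack(unterminated) == b"a" * 20
```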

> 
>>>     idx = s.find("\0")
>>>     if idx >= 0:
>>>         s = s[:idx]
>>>     return s
>> Have you considered this:
>>
>>>>>> z.rstrip("\0")
>> 'abcde'
> 
> 
> This would not work because, in the actual binary data I have to parse, only
> the first \0 is meaningful and terminates the string (like in C). There is
> absolutely no guarantees that the rest of the padding is made of \0s as well.

Point taken.
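
To make the difference concrete, here is a small sketch (Python 3 bytes assumed; the field contents are invented) comparing rstrip with cutting at the first NUL when the padding is not all NULs:

```python
# Hypothetical 20-byte field: "abcde", a terminating NUL, then junk
# followed by a few NULs.
field = b"abcde\0garbage" + b"\0" * 7
assert len(field) == 20

stripped = field.rstrip(b"\0")        # removes trailing NULs only
first_nul = field.split(b"\0", 1)[0]  # stops at the first NUL, as C would

# rstrip leaves the junk after the terminator in place.
assert stripped == b"abcde\0garbage"
assert first_nul == b"abcde"
```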

Cheers,
John