struct: type registration?
John Machin
sjmachin at lexicon.net
Thu Jun 1 08:37:16 EDT 2006
On 1/06/2006 9:52 PM, Giovanni Bajo wrote:
> John Machin wrote:
>
>>> given the ongoing work on struct (which I thought was a dead
>>> module), I was wondering if it would be possible to add an API to
>>> register custom parsing codes for struct. Whenever I use it for
>>> non-trivial tasks, I always happen to write small wrapper functions
>>> to adjust the values returned by struct.
>>>
>>> An example API would be the following:
>>>
>>> ============================================
>>> def mystring_len():
>>>     return 20
>>>
>>> def mystring_pack(s):
>>>     if len(s) > 20:
>>>         raise ValueError, "a mystring can be at max 20 chars"
>>>     s = (s + "\0"*20)[:20]
>> Have you considered s.ljust(20, "\0") ?
>
> Right. This happened to be an example...
>
>>>     s = struct.pack("20s", s)
>>>     return s
>> I am an idiot, so please be gentle with me: I don't understand why you
>> are using struct.pack at all:
Given a choice between whether I was referring to the particular
instance of using struct.pack two lines above, or whether I was doubting
the general utility of the struct module, you appear to have chosen the
latter, erroneously.
>
> Because I want to be able to parse large chunks of binary data with custom
> formatting. Did you miss the whole point of my message?
No.
>
> struct.unpack("3liiSiiShh", data)
>
> You need struct.unpack() to parse this data, and you need custom
> packers/unpackers to avoid post-processing the output of unpack() just because
> it only knows about basic Python types. In binary structs, there happen to be
> *types* which do not map 1:1 to Python types, nor are they just basic C types
> (like the ones struct supports). Using a custom formatter is a way to better
> represent these types (instead of mapping them to the "most similar" type and
> then post-processing it).
>
> In my example, "S" is a basic type meaning "a 0-terminated 20-byte string",
> and expressing it in the struct format with the single letter "S" is more
> meaningful in my code than using "20s" and then post-processing the resulting
> string each and every time this happens.
>
>
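For illustration only, something close to the requested behaviour can be sketched on top of the existing module. This is not an API that struct provides: CUSTOM_CODES and unpack_custom are hypothetical names, the sketch uses Python 3 bytes, and it assumes that "S" means "20 bytes, cut at the first NUL" as described above.

```python
import re
import struct

# Hypothetical registry: custom code -> (standard struct format, post-processor).
CUSTOM_CODES = {
    "S": ("20s", lambda raw: raw.partition(b"\0")[0]),
}

TOKEN = re.compile(r"(\d*)([A-Za-z?])")

def unpack_custom(fmt, data):
    """Expand custom codes to standard ones, unpack, then convert their values."""
    prefix = fmt[0] if fmt and fmt[0] in "@=<>!" else ""
    body = fmt[len(prefix):]
    std_parts, converters = [], []
    for count, code in TOKEN.findall(body):
        n = int(count) if count else 1
        if code in CUSTOM_CODES:
            std, convert = CUSTOM_CODES[code]
            std_parts.append(std * n)
            converters.extend([convert] * n)
        else:
            std_parts.append(count + code)
            # 's' and 'p' yield one value whatever the count; 'x' yields none.
            produced = 1 if code in "sp" else 0 if code == "x" else n
            converters.extend([None] * produced)
    values = struct.unpack(prefix + "".join(std_parts), data)
    return tuple(v if c is None else c(v) for v, c in zip(values, converters))

# Build sample data with the standard codes, then read it back with "S".
data = struct.pack("<3lii20sii20shh", 1, 2, 3, 4, 5, b"abc", 6, 7, b"de", 8, 9)
record = unpack_custom("<3liiSiiShh", data)
```

The format string stays as short as the one in the quoted example, and the post-processing lives in one place instead of being repeated after every unpack() call.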
>> >>> import struct
>> >>> x = ("abcde" + "\0" * 20)[:20]
>> >>> x
>> 'abcde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>> >>> len(x)
>> 20
>> >>> y = struct.pack("20s", x)
>> >>> y == x
>> True
>> Looks like a big fat no-op to me; you've done all the heavy lifting
>> yourself.
>
> Looks like you totally misread my message.
Not at all.
Your function:
def mystring_pack(s):
    if len(s) > 20:
        raise ValueError, "a mystring can be at max 20 chars"
    s = (s + "\0"*20)[:20]
    s = struct.pack("20s", s)
    return s
can be replaced (after reading the manual: "For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.") by:
def mystring_pack(s):
    if len(s) > 20:
        raise ValueError, "a mystring can be at max 20 chars"
    return s
    # s = (s + "\0"*20)[:20] # not needed, according to the manual
    # s = struct.pack("20s", s)
    # As I said, this particular instance of using struct.pack is a big
    # fat no-op.
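To check the manual's claim, here is a minimal sketch in Python 3 (it uses bytes where the code above uses a Python 2 str, which is an assumption of this example). It leaves the padding to struct.pack and compares the result with the hand-padded version:

```python
import struct

def mystring_pack(s: bytes) -> bytes:
    # The length check still matters: struct.pack would truncate silently.
    if len(s) > 20:
        raise ValueError("a mystring can be at most 20 bytes")
    # "20s" pads short input with null bytes, so no manual padding is needed.
    return struct.pack("20s", s)

packed = mystring_pack(b"abcde")
manual = (b"abcde" + b"\0" * 20)[:20]
same = packed == manual
```

Padding by hand and then calling struct.pack("20s", ...) therefore does the same work twice; one of the two steps is enough.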
> Your string "x" is what I find in
> binary data, and I need to *unpack* into a regular Python string, which would
> be "abcde".
>
And you unpack it with a custom function that also contains a fat no-op:
def mystring_unpack(s):
    assert len(s) == 20
    s = struct.unpack("20s", s)[0] # does nothing
    idx = s.find("\0")
    if idx >= 0:
        s = s[:idx]
    return s
>
>>>     idx = s.find("\0")
>>>     if idx >= 0:
>>>         s = s[:idx]
>>>     return s
>> Have you considered this:
>>
>> >>> z.rstrip("\0")
>> 'abcde'
>
>
> This would not work because, in the actual binary data I have to parse, only
> the first \0 is meaningful and terminates the string (like in C). There is
> absolutely no guarantee that the rest of the padding is made of \0s as well.
Point taken.
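A small sketch of the difference, again assuming Python 3 bytes; the "junk" padding is made-up sample data standing in for whatever follows the terminator in a real file:

```python
def cut_at_first_nul(field: bytes) -> bytes:
    # C-style: everything after the first NUL is ignored, whatever it holds.
    return field.partition(b"\0")[0]

clean = b"abcde" + b"\0" * 15               # padding is all NULs
dirty = b"abcde\0" + b"junk" + b"\0" * 10   # garbage after the terminator

stripped = dirty.rstrip(b"\0")              # keeps the NUL and the junk
cut_clean = cut_at_first_nul(clean)
cut_dirty = cut_at_first_nul(dirty)
```

rstrip only removes trailing NULs, so it gives the right answer for clean but not for dirty; cutting at the first NUL handles both.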
Cheers,
John