Problem with unpack hex to decimal

Sun Apr 17 17:33:18 EDT 2005

On Sun, 17 Apr 2005 20:47:20 +0100, "Jonathan Brady"
<nospam at denbridgedigital.com> wrote:

>
><serpent17 at gmail.com> wrote in message 
>news:1113763068.002612.240940 at l41g2000cwc.googlegroups.com...
>> Hello,
>>
>> I was looking at this:
>> http://docs.python.org/lib/module-struct.html
>> and tried the following
>>
>>>>> import struct
>>>>> struct.calcsize('h')
>> 2
>>>>> struct.calcsize('b')
>> 1
>>>>> struct.calcsize('bh')
>> 4
>>
>> I would have expected
>>
>>>>> struct.calcsize('bh')
>> 3
>>
>> what am I missing ?

A note for the original poster: "unpack hex to decimal" (the subject
line from your posting) is an interesting concept. Hex[adecimal] and
decimal are ways of representing the *same* number.
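A small illustration (Python 3, assuming a single unsigned byte as input): struct.unpack hands back an ordinary int. "Hex" and "decimal" only come into play when you format that int for display.

```python
import struct

# Unpack one unsigned byte ('B'); the result is a plain Python int.
(value,) = struct.unpack('B', b'\xff')

# Hex and decimal are just two ways of printing that same int.
print(value)             # 255
print(hex(value))        # 0xff
print(format(value, 'X'))  # FF
print(value == 0xFF)     # True
```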

Let's take an example of a two-byte piece of data. Suppose the first
byte has all bits set (== 1) and the second byte has all bits clear
(== 0). The first byte's value is hexadecimal FF or decimal 255,
whether or not you unpack it, if you are interpreting it as an
unsigned number ('B' format). Signed ('b' format) gives you
hexadecimal -1 and decimal -1. The second byte's value is 0
hexadecimal and 0 decimal however you interpret it.
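A quick check of the single-byte case (Python 3; the two-byte buffer below is just the example data described above): the same bits give 255 under 'B' and -1 under 'b', while the all-clear byte is 0 either way.

```python
import struct

data = b'\xff\x00'  # first byte: all bits set; second byte: all bits clear

unsigned = struct.unpack('BB', data)  # each byte as an unsigned 8-bit number
signed = struct.unpack('bb', data)    # each byte as a signed 8-bit number

print(unsigned)  # (255, 0)
print(signed)    # (-1, 0)
print(hex(signed[0]))  # -0x1
```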

Suppose you want to interpret the two bytes together as representing a
16-bit signed number (the 'h' format). If the data was written in
little-endian order, the result is hex FF and decimal 255; if it was
written big-endian, it's hex -100 and decimal -256.
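The same two bytes read as a 16-bit signed value under each byte order (Python 3), using explicit '<' and '>' prefixes so the result does not depend on the machine running the code:

```python
import struct

data = b'\xff\x00'

(little,) = struct.unpack('<h', data)  # low byte first: 0x00FF
(big,) = struct.unpack('>h', data)     # high byte first: 0xFF00, negative as signed

print(little, hex(little))  # 255 0xff
print(big, hex(big))        # -256 -0x100
```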

>
>Not sure, however I also find the following confusing:
>>>> struct.calcsize('hb')
>3
>>>> struct.calcsize('hb') == struct.calcsize('bh')
>False
>
>I could understand aligning to multiples of 4,

Given we know nothing about the OP's platform or your platform, "4" is
no more understandable than any other number.

> but why is 'hb' different 
>from 'bh'? 

Likely explanation: the C compiler aligns n-byte items on an n-byte
boundary. Thus in 'hb', the h is at offset 0, and the b can start OK
at offset 2, for a total size of 3. With 'bh', the b is at offset 0,
but the h can't (under the compiler's rules) start at 1, it must start
at 2, for a total size of 4.
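The "n-byte items on an n-byte boundary" rule can be modelled in a few lines. This is a sketch under that assumption only: natural_layout is a hypothetical helper, it handles single-character codes, and real compilers may align some types (e.g. 'd' on 32-bit x86 Linux) to less than their size. Like the struct module, it adds no trailing padding. The '=' prefix gives standard sizes with no padding, for comparison.

```python
import struct

# Hypothetical helper: assumes each n-byte item must start at an offset
# that is a multiple of n (natural alignment), with no trailing padding.
def natural_layout(fmt):
    offset = 0
    offsets = []
    for code in fmt:
        size = struct.calcsize('=' + code)  # standard size of this item, no padding
        offset += -offset % size            # skip bytes up to the next multiple of size
        offsets.append(offset)
        offset += size
    return offsets, offset

for fmt in ('hb', 'bh'):
    offsets, total = natural_layout(fmt)
    # Columns: format, modelled offsets, modelled size,
    # native calcsize (platform-dependent), standard '=' calcsize (no padding)
    print(fmt, offsets, total, struct.calcsize(fmt), struct.calcsize('=' + fmt))
```

On a typical x86 or x86-64 build the native column should agree with the modelled size (3 for 'hb', 4 for 'bh'), while the '=' column is 3 for both.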

Typically, you would use "native" byte ordering and alignment (the
default) only when you are accessing data in a C struct defined in code
compiled on your platform [1]. When you are picking apart a file that
has been written elsewhere, you will typically need to read the
documentation for the file format and/or use trial & error to
determine which prefix (@, <, >) you should use. If I had to guess for
you, I'd go for "<".
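A sketch of the trial-and-error approach (Python 3). The record layout (a signed byte followed by a signed 16-bit integer) and the six data bytes are made up for illustration. With an explicit prefix there is no padding, so each record is 3 bytes, and you can unpack the same bytes under each byte order and see which one yields plausible values.

```python
import struct

# Hypothetical data: two 3-byte records (signed byte, signed short),
# assumed to have been written elsewhere with no padding.
data = b'\x05\xff\x00' + b'\xfe\x00\x01'

little = struct.Struct('<bh')  # little-endian, standard sizes, no padding
big = struct.Struct('>bh')     # big-endian, standard sizes, no padding

records = list(little.iter_unpack(data))
alternative = list(big.iter_unpack(data))

print(little.size)   # 3
print(records)       # [(5, 255), (-2, 256)]
print(alternative)   # [(5, -256), (-2, 1)]
```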

[1] Care may need to be taken if the struct is defined in source
compiled by a compiler *other* than the one used to compile your
Python executable -- there's a slight chance you might need to fiddle
with the "foreign" compiler's alignment options to make it suit.

HTH,

John