[SciPy-User] unpacking binary data from a C structure
Robert Kern
robert.kern at gmail.com
Tue Apr 13 10:41:12 EDT 2010
On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
> Dear list,
>
> here's something I find very strange. I have a C structure defined as:
>
> typedef struct
> {
>     unsigned short spcid;    /* station id - 10, 40, 60, 21 */
>     unsigned short vsrid;    /* vsr1a, vsr1b ... from enum */
>     unsigned short chanid;   /* subchannel id 0,1,2,3 */
>     unsigned short bps;      /* number of bits per sample - 1, 2, 4, 8, or 16 */
>     unsigned long srate;     /* number of samples per second in kilo-samples per second */
>     unsigned short error;    /* hw err flag, dma error or num_samples error, 0 ==> no errors */
>     unsigned short year;     /* time tag - year */
>     unsigned short doy;      /* time tag - day of year */
>     unsigned long sec;       /* time tag - second of day */
>     double freq;             /* in Hz */
>     unsigned long orate;     /* number of statistics samples per second */
>     unsigned short nsubchan; /* number of output sub chans */
> }
> stats_hdr_t;
>
> The format I expected to need for unpack in Python's struct module is 'HHHH L HHH L d L H'.
> Here's a real header structure as it appears at the head of a file:
>
> 0000000 000d 0001 0006 0008
> 0000010 4240 000f 0000 0000
> 0000020 0000 07da 0064 4730
> 0000030 0001 0000 0000 0000
> 0000040 d800 d31d 421d 03e8
> 0000050 0000 0000 0000 0002
>
> Decoded as unsigned shorts:
>
> 0000000 13 1 6 8
> 0000010 16960 15 0 0
> 0000020 0 2010 100 18224
> 0000030 1 0 0 0
> 0000040 55296 54045 16925 1000
> 0000050 0 0 0 2
>
> Matching these to the stats_hdr_t with 'unpack' notation:
>
> 0000000 H H H H
> 0000010 L1 L2 H ?
> 0000020 ? H H L1
> 0000030 L2 ? ? D1
> 0000040 D2 D3 D4 L1
> 0000050 L2 ? ? H
>
> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
> What are all the mystery 4-byte blanks? This works:
>
> buf = fd.read(50)
> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>
> Since unpacking binary data must be a fairly common activity in
> scientific circles, I hope you will have some suggestions.
C compilers can insert padding into the memory layout of a structure
in order to align certain members to certain boundaries, particularly
doubles. This behavior is compiler- and platform-specific.
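
As a rough illustration (a sketch, not a definitive recipe): Python's struct
module follows the platform's C alignment rules only in native mode ('@'),
while the standard modes ('=', '<', '>') never pad, so any padding has to be
written out with 'x' bytes. The example below rebuilds the 48 header bytes
from the dump in the post and decodes them with the pad bytes where the post
found them. The little-endian byte order, the names StatsHeader and
HEADER_FORMAT, and the use of a namedtuple are assumptions made for
illustration; the field names come from stats_hdr_t.

```python
import struct
from collections import namedtuple

# Standard mode ('=') uses fixed sizes and never inserts padding.
packed_size = struct.calcsize('=Hd')   # 2 + 8 = 10 bytes
# Native mode ('@') may pad before the double; the amount is platform-specific.
native_size = struct.calcsize('@Hd')
print(packed_size, native_size)

# The 16-bit words from the dump in the post, in file order.
words = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]
# Assumption: the dump shows little-endian 16-bit words.
buf = struct.pack('<24H', *words)

# Hypothetical container; field names follow stats_hdr_t.
StatsHeader = namedtuple(
    'StatsHeader',
    'spcid vsrid chanid bps srate error year doy sec freq orate nsubchan')

# Same layout as the working format in the post, with explicit byte order.
# Each '4x' skips one of the 4-byte gaps found in the data.
HEADER_FORMAT = '<4H L H 4x 2H L 4x d L 4x H'

header = StatsHeader._make(struct.unpack_from(HEADER_FORMAT, buf))
print(header)
```

If the file is always written and read on the same compiler and platform,
native mode ('@') may reproduce the padding automatically, but an explicit
format like the one above does not depend on the reader's platform.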
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco