[Tutor] Reading binary files #2

Mon Feb 9 18:49:31 CET 2009

etrade.griffiths at dsl.pipex.com wrote:
> Hi
>
> following last week's discussion with Bob Gailer about reading unformatted FORTRAN files, I have attached an example of the file in ASCII format and the equivalent unformatted version.  

Thank you. It is good to have real data to work with.

> Below is some code that works OK until it gets to a data item that has no additional associated data, then seems to have got 4 bytes ahead of itself.  

Thank you. It is good to have real code to work with.

> I thought I had trapped this but it appears not.  I think the issue is associated with "newline" characters or the unformatted equivalent.
>   

I think not, but we will see.

I fail to see where the problem is. The data printed below seems to 
agree with the files you sent. What am I missing?

FWIW a few observations re coding style and techniques.

1) put the formats in a dictionary before the while loop:
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f',
'DOUB': '>d', 'MESS': '>d'}

2) retrieve the format in the while loop from the dictionary:
format = formats[vals[3]]

3) condense the 3 infile lines:
data = open("test.bin","rb").read()

4) nrec is a misleading name (to me it means # of records); nbytes would 
be better.

5) Be consistent with the format between calcsize and unpack:
struct.calcsize('>4s8si4s8s')

6) Use meaningful variable names instead of val for the unpacked data:
blank, name, length, typ = struct.unpack ... etc

7) The format for MESS should be '>d' rather than '>%dd' % nval. When 
nval is 0 the for loop will make 0 cycles.

8) You don't have a format for DATA (BEGI); therefore the prior format 
(for CHAR) is being applied. The formats are the same so it does not 
matter but could be confusing later.
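
A minimal sketch of how points 1 through 7 might fit together. It is written for Python 3 (the quoted code below is Python 2), so the type codes are compared as bytes. The names read_items, HEADER_FMT, FORMATS, lead and trail are made up for illustration, and the sample bytes are synthetic, not taken from the attached files. The 4-byte skip after a non-empty data block simply mirrors the quoted code; it is not a statement about the real file layout.

```python
import struct

# Same format string for calcsize and unpack (point 5).
HEADER_FMT = '>4s8si4s8s'
HEADER_SIZE = struct.calcsize(HEADER_FMT)

# One per-item format for each data type, built once before the loop (point 1).
# Keys are bytes because struct returns bytes for 's' fields in Python 3.
FORMATS = {
    b'INTE': '>i',
    b'CHAR': '>8s',
    b'LOGI': '>i',
    b'REAL': '>f',
    b'DOUB': '>d',
    b'MESS': '>d',  # per-item format; a count of 0 means nothing is read (point 7)
}


def read_items(data):
    """Return a list of (name, typ, values) tuples parsed from data."""
    nbytes = len(data)  # number of bytes, not number of records (point 4)
    stop = 0
    items = []
    while stop < nbytes:
        # Header: meaningful names instead of vals[...] (point 6).
        start, stop = stop, stop + HEADER_SIZE
        lead, name, length, typ, trail = struct.unpack(HEADER_FMT, data[start:stop])

        # Look the format up in the dictionary (point 2).
        # An unknown type code raises KeyError here.
        fmt = FORMATS[typ]
        size = struct.calcsize(fmt)

        values = []
        for _ in range(length):
            start, stop = stop, stop + size
            values.extend(struct.unpack(fmt, data[start:stop]))

        # Skip 4 trailing bytes after a non-empty data block, as the quoted code does.
        if length > 0:
            stop += 4

        items.append((name, typ, values))
    return items


# Synthetic input: one INTE item with two values, then one MESS item with none.
sample = (
    struct.pack(HEADER_FMT, b'\x00' * 4, b'NUMS    ', 2, b'INTE', b'\x00' * 8)
    + struct.pack('>2i', 7, 9) + b'\x00' * 4
    + struct.pack(HEADER_FMT, b'\x00' * 4, b'NOTE    ', 0, b'MESS', b'\x00' * 8)
)

for item in read_items(sample):
    print(item)
```

With the dictionary in place, supporting another type code means adding one entry rather than another elif branch.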

> # Test function to write/read from unformatted files
>
> import sys
> import struct
>
> # Read file in one go
>
> in_file = open("test.bin","rb")
> data = in_file.read()
> in_file.close()
>
> # Initialise
>
> nrec = len(data)
> stop = 0
> items = []
>
> # Read data until EOF encountered
>
> while stop < nrec:
>     
>     # extract data structure
>
>     start, stop = stop, stop + struct.calcsize('4s8si4s8s')
>     vals = struct.unpack('>4s8si4s8s', data[start:stop])
>     items.extend(vals)
>     print stop, vals
>
>     # define format of subsequent data
>
>     nval = int(vals[2])
>
>     if vals[3] == 'INTE':
>         fmt_string = '>i'
>     elif vals[3] == 'CHAR':
>         fmt_string = '>8s'
>     elif vals[3] == 'LOGI':
>         fmt_string = '>i'
>     elif vals[3] == 'REAL':
>         fmt_string = '>f'
>     elif vals[3] == 'DOUB':
>         fmt_string = '>d'
>     elif vals[3] == 'MESS':
>         fmt_string = '>%dd' % nval
>     else:
>         print "Unknown data type ... exiting"
>         print items
>         sys.exit(0)
>
>     # extract data
>     
>     for i in range(0,nval):
>         start, stop = stop, stop + struct.calcsize(fmt_string)
>         vals = struct.unpack(fmt_string, data[start:stop])
>         items.extend(vals)
>
>     # trailing spaces
>
>     if nval > 0:
>         start, stop = stop, stop + struct.calcsize('4s')
>         vals = struct.unpack('4s', data[start:stop])
>
> # All data read so print items
>
> print items

-- 
Bob Gailer
Chapel Hill NC
919-636-4239