[Tutor] Reading binary files #2
bob gailer
bgailer at gmail.com
Mon Feb 9 18:49:31 CET 2009
etrade.griffiths at dsl.pipex.com wrote:
> Hi
>
> Following last week's discussion with Bob Gailer about reading unformatted FORTRAN files, I have attached an example of the file in ASCII format and the equivalent unformatted version.
Thank you. It is good to have real data to work with.
> Below is some code that works OK until it gets to a data item that has no additional associated data, then seems to have got 4 bytes ahead of itself.
Thank you. It is good to have real code to work with.
> I thought I had trapped this but it appears not. I think the issue is associated with "newline" characters or the unformatted equivalent.
>
I think not, but we will see.
I fail to see where the problem is. The data printed below seems to
agree with the files you sent. What am I missing?
FWIW, a few observations on coding style and technique:
1) Put the formats in a dictionary before the while loop:
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f',
'DOUB': '>d', 'MESS': '>d'}
2) Retrieve the format in the while loop from the dictionary:
fmt_string = formats[vals[3]]
3) Condense the three in_file lines into one:
data = open("test.bin","rb").read()
4) nrec is a misleading name (to me it means number of records); nbytes
would be better.
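5) Be consistent with the format between calcsize and unpack; use the
'>' prefix in both:
struct.calcsize('>4s8si4s8s')
Without '>', calcsize uses native sizes and alignment, which need not
match the standard sizes that unpack uses with '>'.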
6) Use meaningful variable names instead of vals for the unpacked data
(the header format yields five values, so five names are needed):
blank1, name, length, typ, blank2 = struct.unpack ... etc
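7) The format for MESS should be '>d' rather than '>%dd' % nval. The
for loop already runs nval times, so '>%dd' would read nval * nval
doubles. When nval is 0 the loop makes no iterations, so no special
case is needed.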
8) You don't have a format for DATA (BEGI); therefore the prior format
(for CHAR) is being applied. The formats are the same so it does not
matter but could be confusing later.
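Putting observations 1 through 7 together, the loop might look something like the following. This is a minimal Python 3 sketch, not a tested replacement: it assumes the record layout implied by your own format strings ('>4s8si4s8s' header, nval data items, then 4 trailing bytes only when nval > 0). The names read_items, FORMATS, blank1 and blank2 are my own labels, and I use struct.unpack_from instead of slicing data.

```python
import struct

# Observation 1: map each 4-character type code to a struct format.
# In Python 3 the '4s' type field unpacks to bytes, so the keys are bytes.
# Observation 7: MESS uses '>d'; with a count of 0 the data loop never runs.
FORMATS = {
    b'INTE': '>i', b'CHAR': '>8s', b'LOGI': '>i',
    b'REAL': '>f', b'DOUB': '>d', b'MESS': '>d',
}

# Observation 5: one big-endian format string for both calcsize and unpack.
HEADER_FMT = '>4s8si4s8s'
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 4 + 8 + 4 + 4 + 8 = 28 bytes
TRAILER_SIZE = 4                            # the 4 bytes the original skips after data


def read_items(data):
    """Walk data (a bytes object) record by record, as the quoted loop does."""
    nbytes = len(data)      # observation 4: this is a byte count
    pos = 0
    items = []
    while pos < nbytes:
        # Observation 6: name the header fields instead of indexing vals.
        # blank1/blank2 are illustrative names for the two "blank" fields.
        blank1, name, length, typ, blank2 = struct.unpack_from(HEADER_FMT, data, pos)
        pos += HEADER_SIZE
        items.extend((blank1, name, length, typ, blank2))

        # Observation 2: look the format up; an unknown type raises KeyError.
        fmt_string = FORMATS[typ]
        size = struct.calcsize(fmt_string)
        for _ in range(length):
            items.extend(struct.unpack_from(fmt_string, data, pos))
            pos += size

        # Same rule as the original: only records with data have a trailer.
        if length > 0:
            pos += TRAILER_SIZE
    return items
```

Observation 3 then reduces to a single read, e.g. with open("test.bin", "rb") as f: items = read_items(f.read()). A KeyError for an unlisted type code (see observation 8) is arguably clearer than silently reusing the previous format.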
> # Test function to write/read from unformatted files
>
> import sys
> import struct
>
> # Read file in one go
>
> in_file = open("test.bin","rb")
> data = in_file.read()
> in_file.close()
>
> # Initialise
>
> nrec = len(data)
> stop = 0
> items = []
>
> # Read data until EOF encountered
>
> while stop < nrec:
>
>     # extract data structure
>
>     start, stop = stop, stop + struct.calcsize('4s8si4s8s')
>     vals = struct.unpack('>4s8si4s8s', data[start:stop])
>     items.extend(vals)
>     print stop, vals
>
>     # define format of subsequent data
>
>     nval = int(vals[2])
>
>     if vals[3] == 'INTE':
>         fmt_string = '>i'
>     elif vals[3] == 'CHAR':
>         fmt_string = '>8s'
>     elif vals[3] == 'LOGI':
>         fmt_string = '>i'
>     elif vals[3] == 'REAL':
>         fmt_string = '>f'
>     elif vals[3] == 'DOUB':
>         fmt_string = '>d'
>     elif vals[3] == 'MESS':
>         fmt_string = '>%dd' % nval
>     else:
>         print "Unknown data type ... exiting"
>         print items
>         sys.exit(0)
>
>     # extract data
>
>     for i in range(0,nval):
>         start, stop = stop, stop + struct.calcsize(fmt_string)
>         vals = struct.unpack(fmt_string, data[start:stop])
>         items.extend(vals)
>
>     # trailing spaces
>
>     if nval > 0:
>         start, stop = stop, stop + struct.calcsize('4s')
>         vals = struct.unpack('4s', data[start:stop])
>
> # All data read so print items
>
> print items
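To put a number on observation 7: in the loop above, each of the nval iterations advances stop by calcsize(fmt_string). A small sketch (bytes_consumed is a hypothetical helper, just for illustration) shows what that means for MESS with '>%dd' % nval versus '>d':

```python
import struct


def bytes_consumed(fmt_string, nval):
    """Bytes the quoted for loop skips: nval iterations of calcsize(fmt_string)."""
    return nval * struct.calcsize(fmt_string)


nval = 3
print(bytes_consumed('>%dd' % nval, nval))  # 72: nine doubles instead of three
print(bytes_consumed('>d', nval))           # 24: three doubles
print(bytes_consumed('>%dd' % 0, 0))        # 0: an empty record reads nothing either way
```

The two formats only agree when nval is 0 or 1, which is why a MESS record with no data does not expose the difference.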
--
Bob Gailer
Chapel Hill NC
919-636-4239