write a 20GB file

Jackie Lee jackie.space at gmail.com
Fri May 14 07:32:15 EDT 2010


Thx, Dave,

The code works fine. I just don't know how f.write works. The
documentation says that file.write may not actually write to the file
until file.close or file.flush is called. So I don't know whether the
following version is more efficient (sorry, I forgot to add a condition
to break the loop):

#! /usr/bin/env python
#coding=utf-8
import sys
import struct

try:
    f=open(sys.argv[1],'rb+')
except (IndexError,IOError):
    print '''usage:
        scriptname segyfilename
'''
    sys.exit(1)

#skip EBCDIC header
try:
    f.seek(3200)
except Exception:
    print 'Oops! your file is broken..'
    sys.exit(1)

#read binary header
binhead = f.read(400)
ns = struct.unpack('>h',binhead[20:22])[0]
if ns < 0:
    print 'file read error'
    sys.exit(1)

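#update each trace header
while True:
    f.seek(28,1)                   # move to byte offset 28 of the 240-byte trace header
    if f.read(2) == '':            # an empty read means end of file
        break
    f.seek(-2,1)                   # step back over the 2 bytes just read
    f.write(struct.pack('>h',1))   # overwrite them with a big-endian 16-bit 1
    f.seek(210,1)                  # skip the rest of the header (28+2+210 = 240)
    f.seek(ns*4,1)                 # skip the trace data, assuming 4-byte samples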

f.close()
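
On the f.write question: file.write hands the bytes to a buffer inside
the file object, and they reach the operating system no later than
flush() or close(), so flushing after every write should not be needed
for correctness. Below is a minimal sketch of the same seek-and-overwrite
pattern, assuming Python 3 and fixed-length records. The function name
and its parameters are made up for illustration, and the loop is bounded
by the file size instead of a test read.

```python
import os
import struct

def patch_records(path, skip, record_len, offset, value):
    """Overwrite a big-endian int16 at `offset` inside every fixed-length record.

    `skip` is the number of leading bytes before the first record.
    Returns the number of whole records patched.
    """
    size = os.path.getsize(path)
    # Only whole records are counted; a short trailing fragment is ignored.
    count = max(0, (size - skip) // record_len)
    packed = struct.pack('>h', value)
    with open(path, 'rb+') as f:
        for i in range(count):
            f.seek(skip + i * record_len + offset)
            f.write(packed)   # goes into the file object's buffer; no flush here
    # Leaving the with-block closes the file, which flushes anything pending.
    return count
```

For the layout used in the script above, the call would be
patch_records(name, 3600, 240 + ns*4, 28, 1).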


On Fri, May 14, 2010 at 6:04 PM, Dave Angel <davea at ieee.org> wrote:
> Jackie Lee wrote:
>>
>> Hello there,
>>
>> I have a 22 GB binary file, and I want to change the values at specific
>> positions. Because of the volume of the file, I doubt my code is an
>> efficient one:
>>
>> #! /usr/bin/env python
>> #coding=utf-8
>> import sys
>> import struct
>>
>> try:
>>        f=open(sys.argv[1],'rb+')
>> except (IOError,Exception):
>>    print '''usage:
>>        scriptname segyfilename
>> '''
>>    sys.exit(1)
>>
>> #skip EBCDIC header
>> try:
>>    f.seek(3200)
>> except Exception:
>>    print 'Oops! your file is broken..'
>>
>> #read binary header
>> binhead = f.read(400)
>> ns = struct.unpack('>h',binhead[20:22])[0]
>> if ns < 0:
>>    print 'file read error'
>>    sys.exit(1)
>>
>> #read trace header
>> while True:
>>    f.seek(28,1)
>>    f.write(struct.pack('>h',1))
>>    f.seek(212,1)
>>    f.seek(ns*4,1)
>>
>> f.close()
>>
>>
>
> I don't see a question anywhere.  So perhaps you just want comments on your
> code.
>
> 1) How do you plan to test this?
> 2) Consider doing a lot more checking to see that you have in fact a file of
> the right type.
> 3) Fix indentation - perhaps you've accidentally used a tab in the source.
> 4) Provide a termination condition for the while True loop, which currently
> will (I think) go forever, or perhaps until the disk fills up.
> 5) Depending on the purpose of this file, you should consider making the
> changes on a copy, then deleting and renaming.  As it stands, if the program
> gets aborted part way through, there's no way to know how far it got.  Since
> it's just clobbering bytes, it would be safe to rerun the same program
> again, but many times that's not the case.  And this program clearly isn't
> finished yet, so perhaps it's not true here either.
> 6) I don't see anything inefficient about it.  The nature of the problem is
> going to be very slow (for small values of ns), but I don't know what your
> code could do to speed it up.  Perhaps make sure the file is on a fast
> drive, and not RAID 5.
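
For points 2 and 4, one cheap check is whether the file size fits the
layout the script assumes. A minimal sketch, assuming Python 3, a
3600-byte header area (3200 + 400), 240-byte trace headers and 4-byte
samples as in the script; check_layout is a hypothetical helper name.

```python
import os
import struct

def check_layout(path):
    """Return (ns, trace_count) if the file size fits the assumed layout.

    Raises ValueError otherwise.
    """
    size = os.path.getsize(path)
    if size < 3600:
        raise ValueError('file is shorter than the 3600 header bytes')
    with open(path, 'rb') as f:
        f.seek(3200)             # skip the EBCDIC header
        binhead = f.read(400)    # binary header
    ns = struct.unpack('>h', binhead[20:22])[0]
    if ns <= 0:
        raise ValueError('bad sample count: %d' % ns)
    trace_len = 240 + ns * 4     # assumes 4-byte samples, as in the script
    count, leftover = divmod(size - 3600, trace_len)
    if leftover:
        raise ValueError('%d stray bytes after the last whole trace' % leftover)
    return ns, count
```

The returned trace count also gives the loop a definite number of
iterations, so it cannot run past the end of the file.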
>
> DaveA
>
>
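
Point 5 (change a copy, then rename) could look roughly like this. It is
a sketch only, assuming Python 3.3 or later for os.replace; patch_on_copy
and the '.tmp' suffix are hypothetical names, and `patch` is any function
that edits the file at the path it is given.

```python
import os
import shutil

def patch_on_copy(path, patch):
    """Run patch(tmp_path) on a copy of `path`, then swap the copy into place."""
    tmp = path + '.tmp'   # hypothetical scratch name, same directory as the original
    shutil.copyfile(path, tmp)
    try:
        patch(tmp)
    except Exception:
        os.remove(tmp)    # the original is untouched; drop the half-done copy
        raise
    # Atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp, path)
```

The price is a full copy: for a 22 GB file that means another 22 GB of
free disk space and the time to copy it before any byte is changed.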



-- 
Jackie


