File processing

Alex Martelli aleaxit at yahoo.com
Mon Jul 9 17:44:59 EDT 2001


"Chris McMillan" <christopherjmcmillan at eaton.com> wrote in message
news:9id5p9$8r62 at interserv.etn.com...
> Hello all!
>
> I'm trying to write a script that will open a file, delete the first line,
> and then save the file with the same name.  Can someone please point me in
> the right direction?  Thanks!

There are basically two good architectures for this task:

1. simplest, fastest, most suitable for files that fit in memory (say, up to
    about 100 or 200 MB on a typical middling machine as sold today):
    read all lines, write all but the first

2. almost-as-simple, maybe a tad slower, OK for any size file:
    loop reading lines, each time (except the first) writing the
    line back to a new file that will later be renamed over the
    old one (or writing directly over the old file, which however
    risks damaging it if the power goes out midway through...).

def allbutone_1(filename):
    all_lines = open(filename).readlines()
    open(filename,'w').writelines(all_lines[1:])

Hard to beat for simplicity, isn't it?
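
For readers on a current Python 3, the same read-everything approach
might look like the sketch below.  The function name is made up for
this illustration, and the 'with' blocks are only there to close both
files explicitly instead of leaving that to the interpreter.

```python
def drop_first_line_in_memory(filename):
    # Read the whole file into memory as a list of lines.
    with open(filename) as f:
        all_lines = f.readlines()
    # Rewrite the same file with everything except the first line.
    with open(filename, 'w') as f:
        f.writelines(all_lines[1:])
```

An empty or one-line file simply ends up empty, since slicing past the
end of a list is not an error.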

def allbutone_2(filename):
    import fileinput
    for line in fileinput.input(filename,inplace=1,backup='.sav'):
        if fileinput.filelineno() > 1: print line,

As you can see, module fileinput makes the task almost as
easy as the simplest approach!  You also automatically get
a copy of the old file named with a .sav extension -- good
cheap insurance against power outages &c; you'd need extra
code for that (at least one os.rename!) in version 1.
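
A rough sketch of what that extra code could look like without
fileinput, assuming a current Python 3.  The function name, the
backup_ext parameter and the use of os.replace (which overwrites a
stale backup, where os.rename may refuse to on some platforms) are
assumptions of this illustration, not something tested in this post.
It moves the original aside first and then streams from the backup,
so it also works for files too large for memory.

```python
import os

def drop_first_line_with_backup(filename, backup_ext='.sav'):
    backup = filename + backup_ext
    # Move the original aside first, so an intact copy always exists.
    os.replace(filename, backup)
    with open(backup) as src, open(filename, 'w') as dst:
        next(src, None)      # skip the first line; no error on an empty file
        for line in src:     # copy the rest one line at a time
            dst.write(line)
```

If anything goes wrong midway, the untouched original is still there
under the backup name.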

Warning: both functions are untested -- try them out in a
sandbox before relying on them on your real data!!!


> Why am I doing this you may ask: I have tens of data files that I need to
> strip the first line before I can process it in Matlab.  So, I dream of a
> script that opens every file in a directory, deletes the first line, .....

The fileinput-based approach is GREAT for this, since it ALREADY
works, as written, for whatever number of files -- just call it with
the list of files as argument 'filename'!  glob.glob, sys.argv[1:], or
os.listdir -- whatever gives you a list of filenames may serve.

Magic, innit...?-)
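
As a hedged sketch of the whole-directory case on a current Python 3
(where print is a function, so print(line, end='') replaces the
trailing-comma form): the function name and the pattern argument are
hypothetical, and glob.glob is just one of the ways mentioned above to
build the list.

```python
import fileinput
import glob

def drop_first_line_of_each(pattern):
    filenames = glob.glob(pattern)
    if not filenames:
        # An empty list would make fileinput read standard input instead.
        return []
    with fileinput.input(files=filenames, inplace=True, backup='.sav') as stream:
        for line in stream:
            # filelineno() restarts at 1 for every file in the list.
            if stream.filelineno() > 1:
                # With inplace=True, print output goes to the current file.
                print(line, end='')
    return filenames
```

For example, drop_first_line_of_each('data/*.txt') would rewrite each
matching file and leave a .sav copy next to it.  Pick a pattern that
does not also match the .sav backups, or a second run will strip a
line from those too.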


Alex






