Improve performance for writing files with format modification

Chris Barker chrishbarker at attbi.com
Thu Dec 20 13:42:14 EST 2001


"christine.bartels at teleatlas.com" wrote:

> I want to use python but at the moment the write-function takes so
> much time that I prefer gawk for this conversion (3 times faster!).

You must have a huge number of huge files for this to take more time
to run than it takes you to fiddle with your script! Or a very slow
machine.

Anyway, you are probably never going to get Python to be as fast as
gawk. Python is a general-purpose tool, not nearly as specialized as
gawk, and as such is not quite as efficient for simple text crunching.

Andrew Dalke wrote:

> In other words, that the inner loop is written
> 
>     for i in fblock:
>         nfile.write("%20s%10s\n" % tuple(string.split(i, ';')))

Or, with string methods (Python 2.0 and later):

    for i in fblock:
        nfile.write("%20s%10s\n" % tuple(i.split(';')))

> Try this
> 
> def convert(file, nfile):
>     write = nfile.write  # cache the attribute lookup to a local variable
>     tupl = tuple  # cache the __builtin__ lookup to a local variable
>     splt = split  # cache the module lookup to a local variable

Shouldn't this be:

      splt = string.split


>     while 1:
>         fblock = file.readlines(0x2000)

You seem to be reading in only about 8k of data at a time here (0x2000
is 8192 bytes; probably a bit more, because readlines() rounds up to an
internal buffer size). That is a minuscule amount on today's machines!
It would probably run faster if you added a few zeros there.

>         if not fblock:
>             break
>         for i in fblock:
>             write("%20s%10s\n" % tupl(splt(i, ';')))
> 
> file = open(filein,"r")
> nfile = open(fileout,"w")
> convert(file, nfile)
> file.close()
> nfile.close()

I'd be interested to hear how much of a difference this makes...
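
For what it's worth, here is a minimal sketch that pulls the suggestions
above together. It is written for a current Python 3 interpreter, which
postdates this thread. It assumes every input line holds exactly two
';'-separated fields, and it strips the trailing newline before splitting
so the newline does not end up inside the last column (the quoted code
does not do that). The "hint" parameter name and the 2 MB default are my
own guesses, not something measured:

```python
def convert(infile, outfile, hint=0x200000):
    """Rewrite 'a;b' lines as fixed-width columns, reading in big chunks.

    Assumption: each line has exactly two fields separated by ';'.
    The `hint` name and its 2 MB default are illustrative only.
    """
    write = outfile.write  # cache the attribute lookup in a local variable
    while True:
        # readlines(hint) returns whole lines totalling roughly `hint` characters
        block = infile.readlines(hint)
        if not block:
            break
        for line in block:
            # drop the newline so it is not padded into the second column
            write("%20s%10s\n" % tuple(line.rstrip("\n").split(";")))


if __name__ == "__main__":
    import sys

    # hypothetical usage: python convert.py input.txt output.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1], "r") as src, open(sys.argv[2], "w") as dst:
            convert(src, dst)
```

Wrapping the convert() call between two time.perf_counter() readings, and
trying a few different hint values, would show whether the bigger chunk
size actually matters on a given machine.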

-Chris



-- 
Christopher Barker, Ph.D.
ChrisHBarker at attbi.net                ---           ---           ---
                                     ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------


