writing to file very slow - solved

Moritz Lennert mlennert at club.worldonline.be
Thu Mar 27 05:12:33 EST 2003


Doug Quale said:

> sjmachin at lexicon.net (John Machin) writes:
>
>> Below I have rewritten the whole function, using list comprehensions
>> and avoiding range() as much as possible. Together with some cosmetics
>> like spaces around "=" and more than 2 chars of indent, plus caching
>> some attribute look-ups, it should now be fast enough and legible
>> enough -- even looks like it was written in an HLL like Python :-)
>
> This is very good, but I think it can be just a little bit better.
> There's actually no need for any range() or xrange() at all.  We want
> to iterate through the lists and tuples, not subscript them.  The
> listfields() method returns a list of field names, so fieldname()
> isn't needed either.  I changed the x_result variable name to row,
> since we might as well call a row a row (and row is easier to type).
> The str() is needed if we are to join() the results, since the DB API
> maps SQL types to Python types.  You can expect to see numbers and
> strings at a minimum, and possibly booleans and dates/times.  This
> is an example where map() is shorter than a list comprehension.
>
> import tempfile
>
> def fichier_resultats(results):
>     """Write DB rows in results to a |-delimited tempfile and
>     return the filename."""
>     tfilename = tempfile.mktemp('rec.txt')
>     f = open(tfilename, 'w')
>
>     bar_join = '|'.join
>
>     # Write the field names first.
>     f.write(bar_join(results.listfields()))
>     f.write('\n')
>
>     # results.getresult() is a list of tuples.  Each tuple is a DB row.
>     allrows = results.getresult()
>     for row in allrows:
>         f.write(bar_join(map(str, row)))
>         f.write('\n')
>
>     f.close()
>     return tfilename
>
> If this is still too slow, it may be because the entire result is
> stored as a list of tuples.  Memory usage depends on the length of the
> rows, but as an example, 4000 rows * 4000 bytes/row would be almost
> 16 MB.  (16 MB is quite reasonable for most modern computers, but of
> course the result set could be much larger.)
>
> Instead of using the low-level _pg module, the original poster could
> consider using a different Python PostgreSQL interface.  The PyGreSQL
> package that ships _pg also includes pgdb, which provides the Python
> DB-API 2.0 interface; it is slightly higher-level and easier to use,
> and it allows fetching the results in smaller chunks if required.
> (See http://www.python.org/topics/database/modules.html.)
> --

This is perfect: writing the file for my "benchmark" query now takes 10
seconds instead of 3 1/2 to 4 minutes!
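
Doug's pgdb suggestion also looks worth trying. Going by the DB-API 2.0
spec, fetching the rows in chunks would look roughly like the sketch
below -- untested, and the function name, connection parameters, and
chunk size are all made up:

import pgdb       # PyGreSQL's DB-API 2.0 module
import tempfile

def fichier_resultats_chunked(query, chunksize=500):
    """Run query and write its rows to a |-delimited tempfile,
    fetching at most chunksize rows at a time so the whole result
    set never sits in memory at once."""
    conn = pgdb.connect(database='mydb')   # made-up parameters
    cursor = conn.cursor()
    cursor.execute(query)

    tfilename = tempfile.mktemp('rec.txt')
    f = open(tfilename, 'w')
    bar_join = '|'.join

    # cursor.description holds one 7-item sequence per column;
    # item 0 is the column name.
    f.write(bar_join([d[0] for d in cursor.description]))
    f.write('\n')

    # fetchmany() returns at most chunksize rows per call, and an
    # empty list once the result set is exhausted.
    while 1:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        for row in rows:
            f.write(bar_join(map(str, row)))
            f.write('\n')

    f.close()
    cursor.close()
    conn.close()
    return tfilename

Since fetchmany() and cursor.description are part of the DB-API spec,
the same approach should work with other DB-API modules too.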

Thanks a lot to all those who responded. I'm just beginning to learn
Python, and, as someone mentioned, I have to get used to its very
high-level nature. Your help was invaluable.


Moritz