Most efficient way to write data out to a text file?

candiazoo at attbi.com
Thu Jun 27 10:15:33 EDT 2002


Here is my scenario...

1. return 3 columns per row from a MySQL db
2. extract 7 other bits of information from an Oracle db
3. stuff the bits into a class which formats the data
4. write the formatted data by returning a concatenated string
   from the class object (a rough sketch of the loop follows)
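
In code terms the loop is roughly this (RowFormatter, mysql_cur,
oracle_cur, and DETAIL_QUERY are made-up names standing in for my real
objects and query; the real class does more formatting):

out = open('extract.txt', 'w')
for key, col2, col3 in mysql_cur.fetchall():
    oracle_cur.execute(DETAIL_QUERY, (key,))      # the 7 extra fields
    details = oracle_cur.fetchone()
    rec = RowFormatter(key, col2, col3, details)
    out.write(rec.as_line())    # the class returns one concatenated string
out.close()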

I guessed it was the queries to the Oracle db that were causing the slowdown,
but my DBA was offended, and besides, I have several other extraction steps that
run about 5 or 6 times faster (steps that write to the MySQL db from items
extracted from the Oracle db).

So my next thought was the file output... it could be my class, too.  Someone
suggested trying the profiler.  I have never used it but I'll give it a shot!
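
From the docs, it looks like the incantation is something like this
(untested on my part; main() just stands in for my real driver function):

import profile, pstats

profile.run('main()', 'extract.prof')    # run the job, dump timing stats
stats = pstats.Stats('extract.prof')
stats.sort_stats('cumulative')           # worst offenders first
stats.print_stats(10)                    # show the top 10 entries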

On the flip side... this is my first REAL Python project and I am having a LOT
of fun.  I plan on writing an application for my wife and her artist friends
once this one is done, and I am learning quite a lot - thank you all for your
patience with all of my questions (here and on the MySQL mailing list).  Ada
used to be my favorite language; now it is Python all the way (which makes
working with VB and Java at work annoying).

Mike J.

On 27 Jun 2002 02:01:35 GMT, bokr at oz.net (Bengt Richter) wrote:

>On Thu, 27 Jun 2002 01:28:22 GMT, candiazoo at attbi.com wrote:
>
>>I assume some sort of block i/o or preallocating a large block of file space
>>would be the best way to do it?  I am outputting about 700,000 records to a text
>>file and my program only spits out about 20 records per second.  This is using
>>standard file open and write...
>>
>>This is on a W2K machine... perhaps I should use the win32file interface?  Any
>>thoughts/opinions?
>>
>I doubt that that is the solution. Where are your "records" coming from and/or
>how are they being generated/modified, and how big are they?
>
>If you are building string records with s='' followed by s+='chunk' or the like
>many times, that is a typical newbie performance killer: each += copies the whole
>string built so far. Try accumulating your chunks in a list instead, like sl=[]
>followed by sl.append('chunk'), and then do f.write(''.join(sl)) rather than the
>f.write(s) you would have done.
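>
>A concrete (made-up) illustration of the difference:
>
>s = ''
>for i in xrange(100000):
>    s += 'x' * 10          # quadratic-time accumulation
>
>sl = []
>for i in xrange(100000):
>    sl.append('x' * 10)    # linear-time accumulation
>s = ''.join(sl)            # one join at the end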
>
>Just guessing...
>
>Try timing generation of the output data vs. writing it, and see what's happening.
>Import clock from the time module to get an accurate floating-point seconds reading. E.g.,
>
>from time import clock
>tstart = clock()
># ... generate an output record here
>tendgen=clock()
># ... write the output, something like f.write(record), here
>tendwrite=clock()
>print """\
>generating took %f sec
>outputting took %f sec""" % (tendgen-tstart, tendwrite-tendgen)
>
>(With 700,000 records you would accumulate the two deltas over the whole
>run rather than print them per record, but the idea is the same.)
>
>Regards,
>Bengt Richter
