why huge speed difference btwn 1.52 and 2.1?

John Machin machin_john_888 at hotmail.com
Wed Jun 6 11:28:34 EDT 2001


aahz at panix.com (Aahz Maruch) wrote in message news:<9fj4tn$nr0$1 at panix6.panix.com>...
> In article <3b1bcf7a.12528354 at news.ccs.queensu.ca>,
> Robin Senior <rsenior at hotmail.com> wrote:
> >
> >			filename = string.replace(state, ' ', '_')
> >				g = open('states/'+filename+'/'+str(year)+'.TXT', "a")
> >				g.write(line)
> >				g.write("\n")
> >				g.close()
> >	f.close()
> 
> In addition to the other comments you've received, you should probably
> move the g.open() outside this loop.

Presuming the number of lines of output data is big enough for you to
care about how long it takes to run: any procedure that has a file
open(), write(only_one_line) and close() inside its innermost loop
needs serious reworking irrespective of the language used and the
version thereof.

If you are running an operating system that cares, you will be hit by
multiple physical disk writes per output *line* as the close() flushes
your file and its directory entry. If each output file gets many lines
appended rather than just one, you can do considerably better.

So, if you have enough real memory, save your output data in an
in-memory data structure. Two options:

(a) A dictionary where the key is (state_abbrev, year) and the value
is a list of the relevant lines. Then for each dict entry: open(),
write() once per list entry, close().

(b) A list where each entry is (state, year, relevant_line). At the
end, sort the list, then output it, opening each output file only once
(when (state, year) changes).

Option (b) will most likely take less memory and more time than the
dictionary option, but [pax Tim] don't believe me; write it both ways
and benchmark it.
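
A minimal sketch of the dictionary option, written for a current
Python rather than 1.5.2. The function name write_grouped, the shape
of the input (an iterable of (state, year, line) tuples) and the
creation of missing directories are my assumptions, not something
taken from the original script; the path layout follows the quoted
code.

```python
import os
from collections import defaultdict

def write_grouped(records, out_dir="states"):
    """Group lines by (state, year), then open each output file once.

    `records` is assumed to be an iterable of (state, year, line)
    tuples; that shape is hypothetical, chosen for illustration.
    """
    grouped = defaultdict(list)
    for state, year, line in records:
        grouped[(state, year)].append(line)

    for (state, year), lines in grouped.items():
        # Same naming scheme as the quoted code: spaces become underscores.
        state_dir = os.path.join(out_dir, state.replace(" ", "_"))
        os.makedirs(state_dir, exist_ok=True)  # assumption: create if missing
        path = os.path.join(state_dir, str(year) + ".TXT")
        with open(path, "a") as g:  # one open/close per output file
            g.writelines(line + "\n" for line in lines)

    return len(grouped)  # number of output files written
```

The return value is just the number of files opened, which is handy
when comparing against the open-per-line version in a benchmark.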

If you don't have enough real memory [or your OS's memory allocator
goes berserk when you create a large list by appending; see thread of
about two weeks ago], either (a) buy more or (b) [very similar to
option (b) above] write the output lines out to a single file with
state and year at the front of each line, sort the file using your
OS's sort utility program, then read the sorted file back, opening
each output file only once.
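
The read-back step of the sort-file approach might look roughly like
this. It assumes the intermediate file has already been sorted
externally and that each line has the form state<TAB>year<TAB>text;
the tab separator, the function name split_sorted_file and the
directory creation are illustrative choices of mine, not part of the
original script.

```python
import os
from itertools import groupby

def split_sorted_file(sorted_path, out_dir="states", sep="\t"):
    """Split a pre-sorted key-prefixed file into per-(state, year) files.

    Assumes every line looks like state<sep>year<sep>text and that the
    file is already sorted, so equal (state, year) keys are adjacent.
    """
    def key(raw):
        state, year, _text = raw.rstrip("\n").split(sep, 2)
        return state, year

    files_opened = 0
    with open(sorted_path) as f:
        # groupby yields one group per run of identical keys.
        for (state, year), group in groupby(f, key=key):
            state_dir = os.path.join(out_dir, state.replace(" ", "_"))
            os.makedirs(state_dir, exist_ok=True)  # assumption: create if missing
            with open(os.path.join(state_dir, year + ".TXT"), "a") as g:
                for raw in group:
                    # Strip the key prefix; write only the original text.
                    g.write(raw.rstrip("\n").split(sep, 2)[2] + "\n")
            files_opened += 1
    return files_opened
```

Only one line of the sorted file is held in memory at a time, which
is the point of going through the disk in the first place.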

HTH,

John Machin


