NEWB: reverse traversal of xml file

Serge Orlov Serge.Orlov at gmail.com
Wed May 24 00:45:05 EDT 2006


manstey wrote:
> But will this work if I don't know parts in advance.

Yes, it will work as long as the highest part number in the whole file
is not very high. The algorithm needs to store only N records in memory,
where N is the highest part number in the whole file.

> I only know parts
> by reading through the file, which has 450,000 lines.

Lines or records? I created a sequence of 10,000,000 numbers, which
matches your ten million records, like this:

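def many_numbers():
    # 1,000,000 records, each yielding the part numbers 0 through 9
    for n in xrange(1000000):
        for part in xrange(10):
            yield part  # produced lazily, one value at a time
parts = many_numbers()  # a generator object; no list is built here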

and the code processed it in 13 seconds while consuming virtually no
memory. That is the advantage of iterators and generators: you can
process long sequences without allocating a lot of memory.
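
To make the memory point concrete, here is a minimal sketch. It is
written for Python 3, where range is lazy in the same way xrange is in
Python 2. The count_parts helper and the records/parts_per_record
parameters are my own names for illustration and are not part of the
code discussed in this thread.

```python
import sys

def many_numbers(records=1000000, parts_per_record=10):
    # Lazily yield part numbers 0..parts_per_record-1, once per record.
    for n in range(records):
        for part in range(parts_per_record):
            yield part

def count_parts(parts):
    # Consume the iterator one item at a time; only the counter is kept.
    total = 0
    for _ in parts:
        total += 1
    return total

# A small run so the comparison with a real list stays cheap.
gen = many_numbers(1000, 10)
as_list = list(many_numbers(1000, 10))

# The generator object stays small no matter how many items it will
# yield, while the list has to hold a reference to every item.
print(sys.getsizeof(gen) < sys.getsizeof(as_list))
print(count_parts(gen))
```

Any loop that looks at one item at a time and keeps only a bounded
amount of state behaves like count_parts here, so the length of the
input does not affect how much memory is needed.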



