[Tutor] sorting a 2 gb file

John Purser johnp at milwaukielumber.com
Tue Jan 25 16:21:46 CET 2005


I'll just "Me Too" on Alan's advice.  I had a similar-sized project, only it
was binary data in an ISAM file instead of flat ASCII.  I tried several
"pure" Python methods and all took forever.  Finally I used Python to read
the source data, modify it, and insert it into a MySQL database.  Then I
pulled the data out via Python and wrote it to a new ISAM file.  The whole
thing took longer to code that way, but boy, it sure scaled MUCH better and
was much quicker in the end.
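
For anyone wanting to try the load-then-extract approach, here is a minimal
sketch rather than the code I actually used.  It swaps in the standard-library
sqlite3 module for MySQL so it runs without a server, and it assumes a flat
text file with tab-separated fields where the first field is the sort key.
The function name, table name, and batch size are all made up for
illustration.

```python
import sqlite3


def sort_file_via_db(src_path, dst_path, db_path, batch_size=50000):
    """Load lines into SQLite, then write them back ordered by key.

    Assumption: each line is tab-separated and its first field is the
    sort key. latin-1 is used so any byte value round-trips unchanged.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS records")
        conn.execute("CREATE TABLE records (sort_key TEXT, line TEXT)")

        # Insert in batches so only batch_size lines are held in memory.
        with open(src_path, "r", encoding="latin-1") as src:
            batch = []
            for raw in src:
                line = raw.rstrip("\n")
                batch.append((line.split("\t", 1)[0], line))
                if len(batch) >= batch_size:
                    conn.executemany(
                        "INSERT INTO records VALUES (?, ?)", batch)
                    batch = []
            if batch:
                conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
        conn.commit()

        # Build the index after loading; this is usually cheaper than
        # maintaining it during the inserts.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_key ON records (sort_key)")

        # Let the database do the sorting and stream the rows back out.
        # rowid breaks ties so equal keys keep their original order.
        with open(dst_path, "w", encoding="latin-1") as dst:
            query = "SELECT line FROM records ORDER BY sort_key, rowid"
            for (line,) in conn.execute(query):
                dst.write(line + "\n")
    finally:
        conn.close()
```

The database does the sorting on disk, so memory use stays roughly flat no
matter how big the input is.  That is the scaling benefit I saw.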

John Purser

-----Original Message-----
From: tutor-bounces at python.org [mailto:tutor-bounces at python.org] On Behalf
Of Alan Gauld
Sent: Tuesday, January 25, 2005 05:09
To: Scott Melnyk; tutor at python.org
Subject: Re: [Tutor] sorting a 2 gb file

> My data set the below is taken from is over 2.4 gb so speed and memory
> considerations come into play.

To be honest, if this were my problem, I'd probably dump all the data
into a database and use SQL to extract what I needed. That's a much
more effective tool for this kind of thing.

You can do it with Python, but I think we need more understanding
of the problem. For example, what the various fields represent, how
much of a comparison (i.e. which fields, case sensitivity, etc.) leads
to "equality", and so on.
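
To make the "equality" question concrete, here is a rough sketch of a key
function. The field separator, the field positions and the function name
are all hypothetical, since we don't yet know the layout of the data. The
point is only that "which fields" and "case sensitivity" become explicit
parameters that have to be decided before any sorting or comparing.

```python
def make_key(line, key_fields=(0,), case_sensitive=False, sep="\t"):
    """Reduce a record to the tuple of fields that define equality.

    Assumptions: fields are separated by `sep`, and `key_fields` lists
    the zero-based positions of the fields that matter.
    """
    fields = line.rstrip("\n").split(sep)
    # Keep only the chosen fields, ignoring surrounding whitespace.
    parts = [fields[i].strip() for i in key_fields]
    if not case_sensitive:
        parts = [p.lower() for p in parts]
    return tuple(parts)
```

Two records are then "equal" when make_key() returns the same tuple for
both, and the same function can be passed as the key when sorting.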

Alan G.
