random writing access to a file in Python
Claudio Grondi
claudio.grondi at freenet.de
Sat Aug 26 18:19:14 EDT 2006
Paul Rubin wrote:
> Claudio Grondi <claudio.grondi at freenet.de> writes:
>
>>Is there a ready-to-use (free, ideally open-source) tool able to sort
>>the lines (each line approx. 20 bytes long) of an XXX-GByte text file
>>(i.e. in place), taking full advantage of the available memory to
>>speed up the process as much as possible?
>
> Try the standard Unix/Linux sort utility. Use the --buffer-size=SIZE
> option to tell it how much memory to use.
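For what it's worth, that suggestion can also be driven from a Python script. A minimal sketch, assuming GNU coreutils `sort` is on the PATH (the file names and the 64M buffer size are just illustrative):

```python
import os
import subprocess
import tempfile

# Five zero-padded 19-digit keys plus newline: 20-byte fixed-length
# records, a tiny stand-in for the real data set.
lines = ["%019d\n" % n for n in (5, 3, 9, 1, 7)]

with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.writelines(lines)
    src = f.name
dst = src + ".sorted"

# --buffer-size caps the main-memory use; -o names the output file;
# -T (not used here) could point the temporary files at a faster disk.
subprocess.run(["sort", "--buffer-size=64M", "-o", dst, src], check=True)

with open(dst) as f:
    result = f.read().splitlines()
print(result)  # the five keys in ascending order

os.remove(src)
os.remove(dst)
```

Since the keys are zero-padded, the lexicographic order produced by sort coincides with numeric order.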
I am on Windows, and it seems that the Windows XP SP2 'sort' can handle
the file, but not without a temporary file plus space for the resulting
file, so triple the size of the file being sorted must be available.
The Windows XP 'sort' constantly uses approx. 300 MByte of memory and
can't keep the CPU at 100% all the time, probably due to the I/O going
over USB (25 MByte/s is the top transfer speed I have seen).
I can't tell yet whether it succeeded, as the sorting of the approx. 80
GByte file with fixed-length records of 20 bytes is still in progress
(eleven hours of CPU time / 18 hours of wall-clock time so far).
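A back-of-envelope check using the figures above (80 GByte file, 20-byte records, 25 MByte/s top USB rate) gives a lower bound on the I/O time. An external sort has to read and write the data at least twice (run formation plus merge), so pure sequential I/O alone costs at least four passes:

```python
file_size = 80 * 10**9   # bytes: the ~80 GByte file
record_len = 20          # bytes per fixed-length record
usb_rate = 25 * 10**6    # bytes/s: observed top USB transfer speed

records = file_size // record_len
pass_hours = file_size / usb_rate / 3600  # one sequential pass

print(records)             # 4 billion records
print(pass_hours)          # ~0.9 hours per pass
print(4 * pass_hours)      # ~3.6 hours minimum I/O for a two-phase sort
```

So roughly 3.6 hours would go to I/O alone at that transfer rate; time far beyond that points at seek overhead, extra merge passes, or CPU-bound comparison work.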
I am not sure whether writing my own program would be much faster than
the system's own sort (I haven't yet tried setting the memory size in
the options to e.g. 1.5 GByte, as the sort help says it is better not
to specify it). My machine is a Pentium 4, 2.8 GHz, with 2.0 GByte of
RAM.
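If it came to own programming, the standard approach would be a two-phase external merge sort, which the standard library supports directly via heapq.merge. A sketch (the function name and the chunk size are mine; note that because of Python's per-string overhead, the in-memory chunk should be kept well below the physical RAM size):

```python
import heapq
import os
import tempfile

def external_sort(src, dst, chunk_bytes=500_000_000):
    """Sort a text file of short lines using bounded memory.

    Phase 1: read roughly chunk_bytes of lines at a time, sort each
    chunk in memory, and write it out as a sorted temporary "run".
    Phase 2: stream all runs through a k-way merge (heapq.merge),
    which keeps only one line per run in memory at any moment.
    """
    runs = []
    with open(src) as f:
        while True:
            chunk = f.readlines(chunk_bytes)  # sizehint: ~chunk_bytes
            if not chunk:
                break
            chunk.sort()
            tmp = tempfile.NamedTemporaryFile(
                "w", delete=False, suffix=".run")
            tmp.writelines(chunk)
            tmp.close()
            runs.append(tmp.name)
    files = [open(name) for name in runs]
    try:
        with open(dst, "w") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for fobj, name in zip(files, runs):
            fobj.close()
            os.remove(name)
```

Like the command-line sort, this needs temporary space for the runs plus space for the output, so it would not lift the triple-disk-space requirement; it mainly gives control over the chunk size and the comparison key.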
I would be glad to hear whether the sorting time I am currently seeing
is what should be expected for this kind of task, or whether there is
still much room for improvement.
Claudio Grondi
More information about the Python-list mailing list