random write access to a file in Python

Claudio Grondi claudio.grondi at freenet.de
Sat Aug 26 19:55:25 EDT 2006


Paul Rubin wrote:
> Claudio Grondi <claudio.grondi at freenet.de> writes:
> 
>>>Try the standard Unix/Linux sort utility.  Use its --buffer-size=SIZE
>>>option to tell it how much memory to use.
>>
>>I am on Windows, and it seems that Windows XP SP2 'sort' can work with
>>the file, but not without a temporary file and space for the resulting
>>file, so triple the space of the file to sort must be provided.
> 
> 
> Oh, sorry, I didn't see the earlier parts of the thread.  Anyway,
> depending on the application, it's probably not worth the hassle of
> coding something yourself rather than just throwing more disk space at
> the Unix utility.  But especially if the fields are fixed size, you
> could just mmap the file and then do quicksort on disk.  Simplest
> would be to just let the OS paging system take care of caching stuff;
> if you wanted to get fancy, you could sort in memory once the sorting
> regions got below a certain size.
> 
> A huge amount of stuff has been written (e.g. about half of Knuth vol
> 3) about how to sort.  Remember, too, that traditionally large-scale
> sorting was done on serial media like tape drives, so random access
> isn't that vital.
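
The buffer-size hint above refers to GNU coreutils sort; driven from
a Python script it might look like the following minimal sketch (the
file names, the 512M figure and the temp directory are only examples,
and a GNU sort must be on the PATH, which on Windows means installing
e.g. a coreutils port):

    import subprocess

    # Let GNU sort use up to 512 MiB of RAM for its in-memory runs;
    # -T picks the directory for temporary files, -o the output file.
    rc = subprocess.call(["sort", "--buffer-size=512M", "-T", "/tmp",
                          "-o", "sorted.txt", "input.txt"])
    if rc != 0:
        raise RuntimeError("sort failed with exit code %d" % rc)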
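
And if I understand the mmap idea correctly, for fixed-size records
it might look roughly like this sketch (RECLEN is an assumption,
records are compared as raw byte strings, and a production version
would want a smarter pivot choice than the plain quicksort below):

    import mmap
    import os

    RECLEN = 16  # assumed fixed record length in bytes

    def sort_records(path, reclen=RECLEN):
        # Memory-map the whole file and let the OS paging system
        # decide which parts stay cached in RAM.
        f = open(path, "r+b")
        size = os.path.getsize(path)
        m = mmap.mmap(f.fileno(), size)
        n = size // reclen

        def get(i):
            return m[i * reclen:(i + 1) * reclen]

        def put(i, rec):
            m[i * reclen:(i + 1) * reclen] = rec

        def qsort(lo, hi):
            # In-place quicksort over the mapped records
            # (Hoare partitioning, middle element as pivot).
            if lo >= hi:
                return
            pivot = get((lo + hi) // 2)
            i, j = lo, hi
            while i <= j:
                while get(i) < pivot:
                    i += 1
                while get(j) > pivot:
                    j -= 1
                if i <= j:
                    ri, rj = get(i), get(j)
                    put(i, rj)
                    put(j, ri)
                    i += 1
                    j -= 1
            qsort(lo, j)
            qsort(i, hi)

        qsort(0, n - 1)
        m.flush()
        m.close()
        f.close()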

Does that mean that, in the case of very large files:
   the amount of memory available for the sorting operation (which
makes it possible to work on larger chunks of data in memory) has
less impact on the actual sorting speed than
   the speed of data transfer from/to the storage device(s)
?
So the most effective measure for shortening the time needed to sort
very large files would be to use faster hard drives (e.g. 10,000 rpm
instead of 7,200 rpm) and faster data-transfer interfaces (e.g. E-IDE
or S-ATA instead of USB), right?
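
For what it's worth, a chunk-and-merge sort along those lines might
look like the rough sketch below; every pass reads and writes the
disk sequentially, so the raw transfer rate rather than seek time
should indeed be the limiting factor (chunk_lines is only an example
tuning knob):

    import heapq
    import itertools
    import tempfile

    def external_sort(infile, outfile, chunk_lines=1000000):
        # Phase 1: read the input sequentially in chunks that fit
        # into RAM, sort each chunk in memory, write it out again
        # sequentially as one sorted "run".
        runs = []
        f = open(infile)
        while True:
            chunk = list(itertools.islice(f, chunk_lines))
            if not chunk:
                break
            chunk.sort()
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
        f.close()
        # Phase 2: k-way merge of the sorted runs; only one line
        # per run is held in memory at any time.
        out = open(outfile, "w")
        heap = []
        for i, run in enumerate(runs):
            line = run.readline()
            if line:
                heapq.heappush(heap, (line, i))
        while heap:
            line, i = heapq.heappop(heap)
            out.write(line)
            nxt = runs[i].readline()
            if nxt:
                heapq.heappush(heap, (nxt, i))
        out.close()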

Claudio Grondi
