random writing access to a file in Python

Claudio Grondi claudio.grondi at freenet.de
Mon Aug 28 03:00:19 EDT 2006


Dennis Lee Bieber wrote:
> On 27 Aug 2006 15:06:07 -0700, Paul Rubin <http://phr.cx@NOSPAM.invalid>
> declaimed the following in comp.lang.python:
> 
> 
> 
>>I think that's not so bad, though probably still not optimal.  85 GB
>>divided by 18 hours is 1.3 MB/sec, which means if the program is
>>reading the file 8 times, it's getting 10 MB/sec through the Windows
>>file system, which is fairly reasonable throughput.
>>
> 
> 	Especially if, as implied, this was on a USB drive <G>

Don't underestimate external USB drives (I have heard there are great 
differences between them, depending on the controller used).

If the requested file is not fragmented (scattered over the drive) and 
an appropriate reading procedure is used, I have seen (e.g. just 
yesterday with the 80 GB file) a constant throughput of 28 
MBytes/second, which, compared to the maximum of 40 MBytes/second I 
have seen on E-IDE, is not as bad as your posting might suggest.
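
For anyone who wants to check such numbers on their own drive, here is a 
minimal sketch of one way to measure sequential read throughput. The 
function name measure_read_throughput and the 1 MiB block size are only 
assumptions for illustration, not the exact procedure behind the figures 
above. Note that a second run over the same file may be served from the 
operating system's cache and look much faster than the drive really is.

```python
import time


def measure_read_throughput(path, block_size=1024 * 1024):
    """Read a file sequentially in large blocks.

    Returns (total_bytes, megabytes_per_second).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 bypasses Python's own buffer; the OS file cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s
```

Called as measure_read_throughput("some_big_file.bin") (a hypothetical 
file name), it returns the number of bytes read and the rate in 
MBytes/second.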

Thanks to Paul Rubin for the hint on radix sorting (even if it came a bit 
too late). I had already used this kind of approach yesterday, in another 
context, on the file I was sorting, and can therefore estimate quite well 
the total time such a sort would take on my system: not more than 3 hours 
(which is a very big improvement compared to 18 hours). I suppose that the 
problem with the Windows XP 'sort' is that it can't take advantage of the 
constant record size, as there is no option available for passing this 
hint to it.
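
To make the idea concrete, here is a rough sketch of a single radix 
(bucket) pass over a file of fixed-size records. It is not the code I 
used; it assumes that records are compared as raw bytes, that the first 
byte of each record is the most significant part of the key, and that 
each bucket fits into memory. The name radix_sort_file and its 
parameters are made up for this example.

```python
import os
import tempfile


def radix_sort_file(src_path, dst_path, record_size, tmp_dir=None):
    """Sort fixed-size records in byte order using one radix pass.

    Pass 1 distributes the records into 256 bucket files by first byte.
    Pass 2 sorts each bucket in memory and appends it to dst_path.
    """
    with tempfile.TemporaryDirectory(dir=tmp_dir) as work:
        bucket_paths = [os.path.join(work, "bucket_%03d" % i)
                        for i in range(256)]
        buckets = [open(p, "wb") for p in bucket_paths]
        try:
            with open(src_path, "rb") as src:
                while True:
                    rec = src.read(record_size)
                    if not rec:
                        break
                    if len(rec) != record_size:
                        raise ValueError("trailing partial record")
                    # rec[0] is an int in 0..255 and selects the bucket.
                    buckets[rec[0]].write(rec)
        finally:
            for b in buckets:
                b.close()

        # Buckets are visited in key order, so concatenating the sorted
        # buckets yields a fully sorted output file.
        with open(dst_path, "wb") as dst:
            for p in bucket_paths:
                with open(p, "rb") as f:
                    data = f.read()
                recs = [data[i:i + record_size]
                        for i in range(0, len(data), record_size)]
                recs.sort()
                dst.write(b"".join(recs))
```

The constant record size is what makes this simple: every read returns 
exactly one record, with no searching for line ends. For a file as large 
as mine, a bucket may still be too big for memory, in which case the 
same pass could be applied again to that bucket using the second byte.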

"But if you only had to do it once and it's finished now, why do you
still care how long it took?"
Because one of my usual goals in doing things like that is to get some 
feeling for them, gaining experience that will let me, in similar future 
cases, do better by _intuitive_ selection of the best path to the 
solution known to me (learning by doing).
There is a big difference between _knowing_ that there are various 
sorting algorithms and that it is necessary to choose the right one 
to speed up sorting, and actually _experiencing_ that you have to wait 
18 hours for your results while the machine is so busy that it is hard 
to use it for other tasks at the same time. If the sorting had taken 
less than one hour, I would probably never have made the effort to give 
it some serious thought.

Claudio Grondi


