random writing access to a file in Python

Claudio Grondi claudio.grondi at freenet.de
Fri Aug 25 16:55:59 EDT 2006


Dennis Lee Bieber wrote:
> On Fri, 25 Aug 2006 16:39:14 +0200, Claudio Grondi
> <claudio.grondi at freenet.de> declaimed the following in comp.lang.python:
> 
> 
>>The core of my problem was ... trying to use 'wb' or 'w+b' ... (stupid 
>>me ...)
> 
> 
> 	Ouch... How many times did you have to restore that massive file
> from backup?
> 
I was smart enough to try it first on a very small file and wondered what 
was happening. The Python documentation and even a Google search for 
'random file access in Python' were not helpful, as I found neither an 
example nor a hint.

The only hint about random file access in Python I found with Google was 
the Table of Contents of the "Python Cookbook" from O'Reilly:
   http://www.oreilly.com/catalog/pythoncook2/toc.html
and some hints about random read access.

I was stupid enough to forget about 'r+' (I used it many times in C/C++ 
a decade ago, but not yet in Python), thinking just a bit too much in the 
Pythonic way:

  ===============================================================
  if I want to write, I don't open for reading (plus or not plus)
  ===============================================================
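For the record, a minimal sketch of the difference between the modes, run 
on a small throwaway file. The helper name overwrite_at and the demo file 
are made up for illustration only:

```python
import os
import tempfile

def overwrite_at(path, offset, data):
    """Overwrite len(data) bytes at offset, keeping the rest of the file."""
    # 'r+b' opens an existing file for reading and writing without
    # truncating it, so seek() + write() patches the bytes in place.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

# Create a small throwaway file to experiment on.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"0123456789")

overwrite_at(demo_path, 3, b"XYZ")
with open(demo_path, "rb") as f:
    patched = f.read()

# By contrast, merely opening with 'wb' (or 'w+b') truncates the file.
with open(demo_path, "wb"):
    pass
size_after_wb = os.path.getsize(demo_path)

os.remove(demo_path)
```

After the call, patched holds the original bytes with only positions 3 to 5 
replaced, while size_after_wb is 0 because 'wb' emptied the file on open.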

Actually my file was 'only' 42 GByte, but I wanted to phrase the question 
so that nobody could suggest using an intermediate file.

In the meantime I have chosen a totally new approach. Random writing to 
the hard disk seems to actually move the disk head on each seek, so 
apparently no cache sorts the pieces a bit before they are written to 
the disk. If there are many pieces, there is no other option than to 
put them together in memory first and then write them to the file. This 
complicates otherwise straightforward, intuitive programming, because 
working on large files means working in chunks and wasting some 
processing results when they don't fill the gaps. I suppose I am still 
not on the right path, so by the way:

Is there a ready-to-use (free, preferably Open Source) tool able to sort 
the lines (each approx. 20 bytes long) of an XXX GByte text file (i.e. 
sorting it in place), taking full advantage of available memory to speed 
up the process as much as possible?
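To illustrate the chunked writing mentioned above (collecting pieces in 
memory and writing them in order of file position), here is a rough 
sketch, not the actual code. The names batched_write and flush_sorted and 
the max_pending limit are hypothetical, and whether this really reduces 
head movement depends on the OS and the disk:

```python
import os
import tempfile

def flush_sorted(f, pending):
    """Write buffered (offset, data) pieces in ascending offset order."""
    # sorted() is stable, so pieces with equal offsets keep arrival order.
    for offset, data in sorted(pending, key=lambda piece: piece[0]):
        f.seek(offset)
        f.write(data)
    del pending[:]

def batched_write(path, pieces, max_pending=100000):
    """Buffer up to max_pending pieces in memory, then write them sorted."""
    pending = []
    with open(path, "r+b") as f:      # 'r+b' keeps the existing content
        for piece in pieces:
            pending.append(piece)
            if len(pending) >= max_pending:
                flush_sorted(f, pending)
        flush_sorted(f, pending)      # write whatever is left over

# Tiny demonstration on a throwaway 10-byte file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"." * 10)

batched_write(demo_path, [(7, b"c"), (0, b"a"), (3, b"b")], max_pending=2)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

With max_pending=2 the first two pieces are written as one sorted batch 
(offset 0, then 7) and the third in the final flush. For a real file the 
limit would be chosen according to the available memory.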

Claudio Grondi


