Reading a file and resuming reading.

Hendrik van Rooyen mail at microcorp.co.za
Sat May 26 03:13:09 EDT 2007


 "Karim Ali" <k,,z at h..l.com> wrote: 


> Hi,
> 
> Simple question. Is it possible in python to write code of the type:
> 
> -----------------------------
> while not eof  <- really want the EOF and not just an empty line!

readline() reads up to and including the next newline. An empty string
*is* EOF - a blank line still has at least a newline.

>     readline by line
> end while;
> -----------------------------
> 
> What I am using now is the implicit for loop after a readlines(). I don't 
> like this at all as a matter of opinion (my background is C++).
> 

Use readline() in a while True loop, or iterate over the file.
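
Both options can be sketched roughly like this, assuming Python 3 and a
plain text file. The function names are made up for illustration:

```python
def read_with_readline(path):
    # readline() returns "" only at end of file; a blank line is "\n".
    lines = []
    with open(path) as f:
        while True:
            line = f.readline()
            if line == "":      # empty string: real EOF
                break
            lines.append(line)  # blank lines arrive here as "\n"
    return lines


def read_by_iteration(path):
    # Same result, letting the file object do the EOF test.
    lines = []
    with open(path) as f:
        for line in f:
            lines.append(line)
    return lines
```

Neither version pulls the whole file into memory the way readlines() does;
the lists here are only so the two can be compared.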

> But also, in case for one reason or another the program crashes, I want to 
> be able to rexecute it and for it to resume reading from the same position 
> as it left. If a while loop like the one above can be implemented I can do 
> this simply by counting the lines!

The biggest problem is saving the count so that you can get at it again
after the crash.

You could write the count to a file, and flush and close it after every
line, but if there is a crash, you have no guarantee that you will get the
exact place back, and whatever you are doing with the lines could get a
duplicate or two if you recover automatically after the crash. It depends
on what your OS does to keep buffers in RAM and data on disk congruent.

Overwriting the file like this is also dangerous, as the crash could come
after the OS has deleted the directory entry but before it has made the
new file, so you could end up with nothing saved...
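
One common way to narrow that window is to write the new count to a
temporary file and then rename it over the old one. A minimal sketch,
assuming Python 3.3 or later for os.replace(); save_count and load_count
are hypothetical names:

```python
import os


def save_count(path, count):
    # Write to a temporary file first, so the old count survives
    # until the new one is completely written.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(count))
        f.flush()
        os.fsync(f.fileno())   # ask the OS to push the data to disk
    os.replace(tmp, path)      # rename over the old file in one step


def load_count(path):
    # Return the saved count, or 0 if nothing usable was saved.
    try:
        with open(path) as f:
            return int(f.read())
    except (FileNotFoundError, ValueError):
        return 0
```

On POSIX systems the rename is atomic, so a reader should see either the
old count or the new one. It does not remove the duplicate problem: a crash
between processing a line and saving the count still repeats that line.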

You could also keep a more detailed journal, with entries like:

I am going to read line 0
I have successfully read line 0
I have processed line 0
I am going to write the output of line 0
I have written the output of line 0
I am going to read line 1

etcetera...

You do this by opening and closing the journal file in append mode
for every entry in it.
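
A rough sketch of such a journal, assuming Python 3; the function names and
the idea of ignoring a half-written last line are illustrative assumptions:

```python
import os


def journal(path, entry):
    # Open in append mode, write one entry, and close again.
    with open(path, "a") as f:
        f.write(entry + "\n")
        f.flush()
        os.fsync(f.fileno())   # ask the OS to push the entry to disk


def last_entry(path):
    # Return the last complete entry, or None if there is none.
    try:
        with open(path) as f:
            # An entry without its newline was cut off by a crash.
            lines = [ln for ln in f if ln.endswith("\n")]
    except FileNotFoundError:
        return None
    return lines[-1].rstrip("\n") if lines else None
```

After a restart, last_entry() tells you which step was the last one known
to have been recorded, and you decide from there what to redo.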

Even with this, it can get tricky, because there is the same uncertainty
referred to earlier - but you could get close.

Have a look at databases and commit - they are made to solve this exact 
problem.
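
As an illustration only, here is a sketch using the sqlite3 module from the
standard library, assuming the output of each line can be stored as a row.
The table layout and the names process_file and handle are hypothetical:

```python
import sqlite3


def process_file(db_path, lines, handle):
    # Each line's result is committed as one row, so the number of
    # rows is also the number of lines that were completely processed.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS output "
            "(line_no INTEGER PRIMARY KEY, result TEXT)"
        )
        conn.commit()
        done = conn.execute("SELECT COUNT(*) FROM output").fetchone()[0]
        for line_no, line in enumerate(lines):
            if line_no < done:
                continue        # already committed before the crash
            with conn:          # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO output VALUES (?, ?)",
                    (line_no, handle(line)),
                )
    finally:
        conn.close()
```

Because the result and the position are the same row, they cannot get out
of step with each other: after a crash, the line is either fully recorded
or not recorded at all.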

Is there not another way to see how far you actually got, by examining
the output?

Is it not easier to simply start again and do the whole job over, after
turfing the output?

Why bother about it at all - if you can use readlines(), it means the
file fits into memory now - so it is not humongous, so it is probably not
"that" expensive to just do it all over.

I suppose it depends on what it is you are actually trying to achieve.

HTH 

- Hendrik



