Using "with open(filename, 'ab'):" and calling code only if the file is new?

Victor Hooi victorhooi at gmail.com
Tue Oct 29 23:22:43 EDT 2013


Hi,

In theory, it *should* just be our script writing to the output CSV file.

However, I wanted it to be robust - e.g. in case somebody spins up two copies of this script running concurrently.

I suppose the timing would have to be pretty unlucky to hit a race condition there, right?

As in, somebody would have to open the new file and write to it somewhere in between the check line (os.path.getsize) and the following line (writeheaders).

However, you're saying the only way to be completely safe is some kind of file locking?
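For reference, a minimal sketch of what the locking approach could look like, assuming a Unix system (fcntl is not available on Windows) and assuming every writer cooperates by taking the same advisory lock. The names exclusive_lock and append_with_header are hypothetical, made up for illustration:

```python
import contextlib
import fcntl  # Unix only; advisory locks, so every writer must use them
import os


@contextlib.contextmanager
def exclusive_lock(f):
    """Hold an exclusive advisory lock on the open file f for the block."""
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds the lock
    try:
        yield f
    finally:
        f.flush()  # push buffered data out before letting others in
        fcntl.flock(f, fcntl.LOCK_UN)


def append_with_header(path, header, line):
    """Append line to path, writing header first if the file is still empty."""
    with open(path, 'a') as f, exclusive_lock(f):
        # The size check and the write both happen under the lock,
        # so a second copy of the script cannot slip in between them.
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            f.write(header + '\n')
        f.write(line + '\n')
```

The point of the sketch is only that the emptiness check and the header write sit inside the same locked region; a process that ignores the lock can still interleave its writes.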

Another person (Zachary Ware) suggested using .tell() on the file as well - I suppose that's similar enough to using os.path.getsize(), right?

But basically, I can call .tell() or os.path.getsize() on the file to see if it's zero, and then just call writeheaders() on the following line.
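Roughly, the unlocked version of that idea might look like the sketch below. It assumes Python 3, where the csv module wants a text-mode file opened with newline='' (on Python 2 the mode would be 'ab' as in the subject line). The function name append_rows and its parameters are hypothetical, and the race window between the check and the write is still there:

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only if it is empty."""
    with open(path, 'a', newline='') as f:
        # Seek to the end explicitly so tell() reflects the current size
        # regardless of where the platform puts the initial position.
        f.seek(0, os.SEEK_END)
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:
            writer.writeheader()  # not atomic with the check above
        writer.writerows(rows)
```

Calling it twice on the same path, e.g. append_rows('out.csv', ['a', 'b'], [{'a': 1, 'b': 2}]), should leave a single header line followed by the appended rows, as long as no other writer touches the file in between.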

In the future - we may be moving to storing results in something like SQLite, or MongoDB and outputting a CSV directly from there.

Cheers,
Victor

On Wednesday, 30 October 2013 13:55:53 UTC+11, Joseph L. Casale  wrote:
> > Like Victor says, that opens him up to race conditions.
>
> Slim chance, it's no more possible than it happening in the time try/except
> takes to recover an alternative procedure.
>
> with open('in_file') as in_file, open('out_file', 'ab') as outfile_file:
>     if os.path.getsize('out_file'):
>         print('file not empty')
>     else:
>         #write header
>         print('file was empty')
>
> And if that's still not acceptable (you did say new) then open the out_file 'r+' and seek
> and read to check for a header.
>
> But if your file is not new and lacks a header, then what?
>
> jlc



