correct way to catch exception with Python 'with' statement

Ned Batchelder ned at nedbatchelder.com
Thu Dec 1 22:39:57 EST 2016


On Thursday, December 1, 2016 at 7:26:18 PM UTC-5, DFS wrote:
> On 12/01/2016 06:48 PM, Ned Batchelder wrote:
> > On Thursday, December 1, 2016 at 2:31:11 PM UTC-5, DFS wrote:
> >> After a simple test below, I submit that the above scenario would never
> >> occur.  Ever.  The time gap between checking for the file's existence
> >> and then trying to open it is far too short for another process to sneak
> >> in and delete the file.
> >
> > It doesn't matter how quickly the first operation is (usually) followed
> > by the second.  Your process could be swapped out between the two
> > operations. On a heavily loaded machine, there could be a very long
> > time between them.
> 
> 
> How is it possible that the 'if' portion runs, then 44/100,000ths of a 
> second later my process yields to another process which deletes the 
> file, then my process continues?

A modern computer is running dozens or hundreds (or thousands!) of
processes "all at once". How they are actually interleaved on the
small number of physical processors is completely unpredictable, and
an arbitrary amount of time can pass between any two processor
instructions.

I'm assuming you've measured this program on your own computer, which
was relatively idle at the moment.  That is hardly a good stress test
of how the program might behave under heavier load.
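To make that gap visible without needing a heavily loaded machine, here is
a minimal sketch that pauses deliberately between the check and the open.
The names check_then_read and during_gap are hypothetical. The during_gap
callback stands in for whatever the operating system happens to run while
your process is swapped out; here it simply deletes the file.

```python
import os
import tempfile

def check_then_read(path, during_gap):
    # Look-before-you-leap: check for the file, then open it.
    if os.path.isfile(path):
        # Hypothetical stand-in for "the OS ran something else here".
        # In real life this gap is unpredictable; the callback makes it
        # explicit and repeatable.
        during_gap(path)
        with open(path) as f:
            return f.read()
    return None

# Hypothetical setup: a throwaway file to race against.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # Simulate another program deleting the file inside the gap.
    check_then_read(path, os.remove)
except FileNotFoundError:
    outcome = "FileNotFoundError"
else:
    outcome = "read succeeded"

print(outcome)
```

The check passed, yet the open still failed: the isfile() result was
already stale by the time open() ran.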

> 
> Is that governed by the dreaded GIL?
> 
> "The mechanism used by the CPython interpreter to assure that only one 
> thread executes Python bytecode at a time."
> 
> But I see you posted a stack-overflow answer:
> 
> "In the case of CPython's GIL, the granularity is a bytecode 
> instruction, so execution can switch between threads at any bytecode."
> 
> Does that mean "chars=f.read().lower()" could get interrupted between 
> the read() and the lower()?

Yes.  But even more importantly, the Python interpreter is itself a
C program: the operating system can interrupt it between any two machine
instructions and run another program instead, and that other program
can fiddle with files on the disk.
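You can see that the line is several steps rather than one by
disassembling it with the standard dis module. This is a small sketch:
read_lower is a hypothetical wrapper around the line from your question,
and the exact opcode names vary between CPython versions (CALL_METHOD
before 3.11, CALL from 3.11 on).

```python
import dis

def read_lower(f):
    # The single source line from the question.
    chars = f.read().lower()
    return chars

# Collect the call instructions for that line. read() and lower() each
# get their own call, so the interpreter can hand the GIL to another
# thread after read() returns and before lower() runs.
calls = [ins.opname for ins in dis.get_instructions(read_lower)
         if ins.opname.startswith("CALL")]
print(calls)

# Full listing, for a closer look.
dis.dis(read_lower)
```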

> 
> I read something interesting last night:
> https://www.jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/
> 
> "In the new GIL, a hard timeout is used to instruct the current thread 
> to give up the lock. When a second thread requests the lock, the thread 
> currently holding it is compelled to release it after 5ms (that is, it 
> checks if it needs to release it every 5ms)."
> 
> With a 5ms window, it seems the following code would always protect the 
> file from being deleted between lines 4 and 5.
> 
> --------------------------------
> 1 import os,threading
> 2 f_lock=threading.Lock()
> 3 with f_lock:
> 4   if os.path.isfile(filename):
> 5     with open(filename,'w') as f:
> 6       process(f)
> --------------------------------
> 

You seem to be assuming that the program that might delete the file
is the same program trying to read the file.  I'm not assuming that.
My Python program might be trying to read the file at the same time
that a cron job is running a shell script that is trying to delete
the file.
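A sketch of that situation, assuming the cron job can be simulated by a
second Python process started with subprocess. The threading.Lock from
your example (f_lock) only excludes other threads in the same process, so
a separate process deleting the file is not blocked by it. The temporary
file and the deleter command are hypothetical stand-ins.

```python
import os
import subprocess
import sys
import tempfile
import threading

f_lock = threading.Lock()

# Hypothetical file standing in for the one both programs care about.
fd, filename = tempfile.mkstemp()
os.close(fd)

# A separate process (playing the cron job's shell script) that deletes it.
deleter = [sys.executable, "-c",
           "import os, sys; os.remove(sys.argv[1])", filename]

with f_lock:  # only excludes other threads in *this* process
    if os.path.isfile(filename):
        # The other process knows nothing about f_lock, so it is not blocked.
        subprocess.run(deleter, check=True)
        try:
            with open(filename) as f:
                data = f.read()
            outcome = "read succeeded"
        except FileNotFoundError:
            outcome = "file vanished despite the lock"

print(outcome)
```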

> Also, this is just theoretical (I hope).  It would be terrible system 
> design if all those dozens of processes were reading and writing and 
> deleting the same file.

If you can design your system so that you know for sure no one else
is interested in fiddling with your file, then you have an easier
problem.  So far, that has not been shown to be the case. I'm
talking more generally about a program that can't assume those
constraints.
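For that general case, the usual approach (and the one the subject line
is asking about) is to skip the existence check and just try the
operation, handling the exception if the file is gone. A minimal sketch,
with read_lower_if_present as a hypothetical helper name. Only
FileNotFoundError is treated as "no file", so other errors such as
PermissionError still propagate.

```python
import os
import tempfile

def read_lower_if_present(path):
    """Return the file's contents lowercased, or None if it does not exist.

    There is no separate existence check: open() itself is the check,
    so there is no gap between "is it there?" and "open it".
    """
    try:
        with open(path) as f:
            return f.read().lower()
    except FileNotFoundError:
        return None

# Usage with a throwaway directory (hypothetical paths).
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    missing = read_lower_if_present(path)   # nothing there yet
    with open(path, "w") as f:
        f.write("Hello")
    present = read_lower_if_present(path)

print(missing, present)
```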

--Ned.


