multiprocessing module and os.close(sys.stdin.fileno())

Joshua Judson Rosen rozzin at geekspace.com
Sat Feb 21 00:20:20 EST 2009


Jesse Noller <jnoller at gmail.com> writes:
>
> On Tue, Feb 17, 2009 at 10:34 PM, Graham Dumpleton
> <Graham.Dumpleton at gmail.com> wrote:
> > Why is the multiprocessing module, ie., multiprocessing/process.py, in
> > _bootstrap() doing:
> >
> >  os.close(sys.stdin.fileno())
> >
> > rather than:
> >
> >  sys.stdin.close()
> >
> > Technically it is feasible that stdin could have been replaced with
> > something other than a file object, where the replacement doesn't have
> > a fileno() method.
> >
> > In that sort of situation an AttributeError would be raised, which
> > isn't going to be caught as either OSError or ValueError, which is all
> > the code watches out for.
> 
> I don't know why it was implemented that way. File an issue on the
> tracker and assign it to me (jnoller) please.

My guess would be: because it's also possible for sys.stdin to be a
file that's open in read+*write* mode, and for that file to have
pending output buffered (for example, in the case of a socketfile).

There's a general guideline, inherited from C, that one should ensure
that the higher-level close() routine is invoked on a given
file-descriptor in at most *one* process after that descriptor has
passed through a fork(); in the other (probably child) processes, the
lower-level close() routine should be called to avoid a
double-flush--whereby buffered data is flushed out of one process, and
then the *same* buffered data is flushed out of the (other)
child-/parent-process' copy of the file-object.

So, if you call sys.stdin.close() in the child-process in
_bootstrap(), then it could lead to a double-flush corrupting output
somewhere in the application that uses the multiprocessing module.

You can expect similar issues with just about /any/ `file-like objects'
that might have `file-like semantics' of buffering data and flushing
it on close, also--because you end up with multiple copies of the same
object in `pre-flush' state, and each copy tries to flush at some point.

As such, I'd recommend against just using .close(); you might use
something like `if hasattr(sys.stdin, "fileno"): ...'; but, if your
`else' clause unconditionally calls sys.stdin.close(), then you still
have double-flush problems if someone's set sys.stdin to a file-like
object with output-buffering.

I guess you could try calling that an `edge-case' and seeing if anyone
screams. It'd be sort-of nice if there was finer granularity in the
file API--maybe if file.close() took a boolean `flush' argument....

-- 
Don't be afraid to ask (Lf.((Lx.xx) (Lr.f(rr)))).



More information about the Python-list mailing list