multiprocessing module and os.close(sys.stdin.fileno())

Graham Dumpleton Graham.Dumpleton at gmail.com
Sat Feb 21 18:58:45 EST 2009


On Feb 21, 4:20 pm, Joshua Judson Rosen <roz... at geekspace.com> wrote:
> Jesse Noller <jnol... at gmail.com> writes:
>
> > On Tue, Feb 17, 2009 at 10:34 PM, Graham Dumpleton
> > <Graham.Dumple... at gmail.com> wrote:
> > > Why is the multiprocessing module, ie., multiprocessing/process.py, in
> > > _bootstrap() doing:
>
> > >  os.close(sys.stdin.fileno())
>
> > > rather than:
>
> > >  sys.stdin.close()
>
> > > Technically it is feasible that stdin could have been replaced with
> > > something other than a file object, where the replacement doesn't have
> > > a fileno() method.
>
> > > In that sort of situation an AttributeError would be raised, which
> > > isn't going to be caught as either OSError or ValueError, which is all
> > > the code watches out for.
>
> > I don't know why it was implemented that way. File an issue on the
> > tracker and assign it to me (jnoller) please.
>
> My guess would be: because it's also possible for sys.stdin to be a
> file that's open in read+*write* mode, and for that file to have
> pending output buffered (for example, in the case of a socketfile).

If you are going to have a file that is writable as well as readable,
such as a socket, then likely that sys.stdout/sys.stderr are going to
be bound to it at the same time. If that is the case then one should
not be using close() at all as it will then also close the write side
of the pipe and cause errors when code subsequently attempts to write
to sys.stdout/sys.stderr.

In the case of socket you would actually want to use shutdown() to
close just the input side of the socket.

What this all means is that what is the appropriate thing to do is
going to depend on the environment in which the code is used. Thus,
having the behaviour hard wired a certain way is really bad. There
perhaps instead should be a way of a user providing a hook function to
be called to perform any case specific cleanup of stdin, stdout and
stderr, or otherwise reassign them.

That this is currently in the _bootstrap() function, which does other
important stuff, doesn't exactly make it look like it is easily
overridden to work for a specific execution environment which is
different to the norm.

> There's a general guideline, inherited from C, that one should ensure
> that the higher-level close() routine is invoked on a given
> file-descriptor in at most *one* process after that descriptor has
> passed through a fork(); in the other (probably child) processes, the
> lower-level close() routine should be called to avoid a
> double-flush--whereby buffered data is flushed out of one process, and
> then the *same* buffered data is flushed out of the (other)
> child-/parent-process' copy of the file-object.
>
> So, if you call sys.stdin.close() in the child-process in
> _bootstrap(), then it could lead to a double-flush corrupting output
> somewhere in the application that uses the multiprocessing module.
>
> You can expect similar issues with just about /any/ `file-like objects'
> that might have `file-like semantics' of buffering data and flushing
> it on close, also--because you end up with multiple copies of the same
> object in `pre-flush' state, and each copy tries to flush at some point.
>
> As such, I'd recommend against just using .close(); you might use
> something like `if hasattr(sys.stdin, "fileno"): ...'; but, if your
> `else' clause unconditionally calls sys.stdin.close(), then you still
> have double-flush problems if someone's set sys.stdin to a file-like
> object with output-buffering.
>
> I guess you could try calling that an `edge-case' and seeing if anyone
> screams. It'd be sort-of nice if there was finer granularity in the
> file API--maybe if file.close() took a boolean `flush' argument....

Graham



More information about the Python-list mailing list