[Python-3000] socket GC worries

Guido van Rossum guido at python.org
Mon Oct 29 19:45:55 CET 2007


2007/10/28, Bill Janssen <janssen at parc.com>:
> > Bill Janssen wrote:
> > > that whole mess of code is a good argument for *not* exposing the
> > > fileno in Python
> >
> > Seems to me that a socket should already *be* a file,
> > so it shouldn't need a makefile() method and you
> > shouldn't have to mess around with filenos.

That model fits TCP/IP streams just fine, but doesn't work so well for
UDP and other odd socket types. The assumption that "s.write(a);
s.write(b) is equivalent to s.write(a+b)", which is fundamental for
any "stream" abstraction, just doesn't work for UDP. Ditto for
reading: AFAIK recv() discards the rest of a UDP packet that doesn't
fit in the buffer.
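
To make the datagram point concrete, here is a minimal sketch. It assumes a POSIX-style stack and a working loopback interface; on Windows a too-small recv() buffer raises an error instead of silently truncating. The function name udp_datagram_demo is made up for illustration.

```python
import socket

def udp_datagram_demo():
    """Send two datagrams over loopback and read them back.

    Illustrative only: assumes POSIX truncation semantics and that
    loopback UDP delivers both packets in order.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # let the OS pick a free port
        receiver.settimeout(2.0)          # avoid hanging if a packet is lost
        addr = receiver.getsockname()

        # Two separate sends produce two separate datagrams; they are
        # never merged into one b"helloworld" byte stream.
        sender.sendto(b"hello", addr)
        sender.sendto(b"world", addr)

        # Ask for only 3 bytes: the rest of the first datagram is dropped.
        first = receiver.recv(3)
        # The next recv() starts at the second datagram, not at b"lo".
        second = receiver.recv(1024)
        return first, second
    finally:
        sender.close()
        receiver.close()
```

With a TCP stream the second read would have continued with the unread bytes of the first write; here they are simply gone.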

> I like that model, too.  I also wish the classes in io.py were sort of
> inverted; that is, I'd like to have an IOStream base class with read()
> and write() methods (and maybe close()), which things like Socket
> could inherit from.  FileIO would inherit from IOStream and from
> Seekable, and add a fileno() method and "name" property.  And so
> forth.  But apparently that's out; maybe in Python 4000.

Actually, I'm still up for tweaks to the I/O model if they solve a real
problem, as long as most of the high-level APIs stay the same (there
simply is too much code that expects those to behave a certain way).

I don't quite understand what you mean by inverted though.

> Right now the socket is very much like an OS socket; with "send" and
> "recv" being the star players, not "read" and "write".  socket.makefile
> wraps a buffered file-like interface around it.

I was going to say "we can just replace SocketIO with a non-seekable
_fileio.FileIO instance" until I realized that on Windows, socket fds
and filesystem fds live in different spaces and are managed using
different calls. That may also explain why the inversion you're
looking for doesn't quite work (IIUC what you meant).

The real issue seems to be file descriptor GC. Maybe we haven't
written down the rules clearly enough for when the fd is supposed to
be GC'ed, when there are both a socket and a SocketIO (or more)
referencing it; and whether a close() call means something beyond
dropping the last reference to the object. Or maybe we haven't
implemented the rules right? ISTM that the SocketCloser class is
*intended* to solve these issues. Back to your initial mail (which is
more relevant than Greg Ewing's snipe!):

> I think that the SocketCloser (new in Py3K) was developed to address
> another issue, which is that there's a lot of library code which
> assumes that the Python socket instance is just window dressing over
> an underlying system file descriptor, and isn't important.  In fact,
> that whole mess of code is a good argument for *not* exposing the
> fileno in Python (perhaps only for special cases, like "select").
> Take httplib and urllib, for instance.  HTTPConnection creates a
> "file" from the socket, by calling socket.makefile(), then in some
> cases *closes* the socket (thereby reasonably rendering the socket
> *dead*), *then* returns the "file" to the caller as part of the
> response.  urllib then takes the response, pulls the "file" out of it,
> and discards the rest, returning the "file" as part of an instance of
> addinfourl.  Somewhere along the way some code should call "close()"
> on that HTTPConnection socket, but not till the caller is finished
> using the bytes of the response (and those bytes are kept queued up in
> the real OS socket).  Ideally, GC of the response instance should call
> close() on the socket instance, which means that the instance should
> be passed along as part of the response, IMO.
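
The httplib pattern described above can be sketched in a few lines. This is an illustration of the pattern, not the httplib code itself; it uses socket.socketpair() in place of a real HTTP connection and relies on the behavior of current CPython, where close() on a socket object defers closing the fd while a makefile() object is still open. The function name is hypothetical.

```python
import socket

def read_after_socket_close():
    """Create a file from a socket, close the socket, then read the file."""
    server, client = socket.socketpair()
    response_file = client.makefile("rb")
    try:
        # Closing the socket object here does not close the underlying
        # descriptor, because response_file still refers to it.
        client.close()

        # The peer can still deliver bytes, and the "file" can still
        # read them after the socket object was closed.
        server.sendall(b"HTTP/1.0 200 OK\r\n")
        return response_file.readline()
    finally:
        # Closing the last file object is what finally releases the fd.
        response_file.close()
        server.close()
```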

Hm, I think you're right. The SocketCloser class wasn't written with
the SSL use case in mind. :-( I wonder if one key to solving the
problem isn't to make the socket *wrap* a low-level _socket instance
instead of *being* one (i.e. containment instead of subclassing). Then
the SSL code could be passed the low-level _socket instance and the
high(er)-level socket class could wrap either a _socket or an SSL
instance. The SocketCloser would then be responsible for closing
whatever the socket instance wraps, i.e. either the _socket or the SSL
instance. Then we could have any number of SocketIO instances *plus*
at most one socket instance, and the wrapped thing would be closed
when the last of the higher-level things was either GC'ed or
explicitly closed. If you wanted to reuse the _socket after closing
the SSL instance, you'd have to wrap it in a fresh socket instance.
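
A rough sketch of that containment idea, to make the counting explicit. All class names here (FakeLowLevel, Closer, Socket, SocketIO) are stand-ins invented for the example; this is not the code in socket.py, and the real SocketCloser may differ in its details.

```python
class FakeLowLevel:
    """Stand-in for a _socket or SSL instance; only tracks close()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Closer:
    """Hypothetical closer: closes the wrapped object when the last
    higher-level user has released it."""
    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._users = 0

    def acquire(self):
        self._users += 1

    def release(self):
        self._users -= 1
        if self._users == 0:
            self._wrapped.close()


class _Handle:
    """Shared logic: release exactly once, on close() or on GC."""
    def __init__(self, closer):
        self._closer = closer
        self._closed = False
        closer.acquire()

    def close(self):
        if not self._closed:
            self._closed = True
            self._closer.release()

    def __del__(self):
        self.close()


class SocketIO(_Handle):
    """One of any number of file-like views on the wrapped object."""


class Socket(_Handle):
    """The (at most one) high-level socket; it wraps, not subclasses."""
    def __init__(self, wrapped):
        super().__init__(Closer(wrapped))

    def makefile(self):
        return SocketIO(self._closer)
```

Usage under these assumptions: after s = Socket(raw) and f = s.makefile(), calling s.close() leaves raw open, and raw is closed only once f.close() is called (or f is collected).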

Does that make sense? (Please do note the difference throughout
between _socket and socket, the former being defined in socketmodule.c
and the latter in socket.py.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)