[ python-Bugs-780354 ] socket.makefile() isn't compatible with marshal/cPickle/etc

SourceForge.net noreply at sourceforge.net
Sun Mar 11 19:45:53 CET 2007


Bugs item #780354, was opened at 2003-07-30 12:02
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=780354&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David M. Grimes (dmgrime)
>Assigned to: Nobody/Anonymous (nobody)
Summary: socket.makefile() isn't compatible with marshal/cPickle/etc

Initial Comment:
Even on platforms where the underlying C libraries and associated 
code in
socketmodule.c support it, socket object's .makefile() method no 
longer
return "real" file objects.  This breaks support for cPickle, marshal, 
and
any other C modules which expect a file object as an argument.

The change initially appears in socket.py v1.36 - the changelog 
entry
states:

"""

The socket module now always uses the _socketobject wrapper 
class, even on
platforms which have dup(2).  The makefile() method is built directly 
on top
of the socket without duplicating the file descriptor, allowing timeouts 
to
work properly.  Includes a new test case (urllibnet) which requires 
the
network resource.

Closes bug 707074.
"""

I'm not sure of the implication WRT timeouts, but the patch is very 
small -
quite obviously removing the platform check which prevented the 
redefinition
of the symbol socket in the exports of socket.py.  It now is always 
the
_socketobject class, and not the underlying _socket.socket type 
(when it is
known the platform supports all functionality in _socket).

For now, I can workaround (since I don't need the timeout 
semantics) by
using _socket.socket() directly (I'm on a Linux platform), but I 
wondered if
this is something which can/should be addressed differently?



----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2007-03-11 13:45

Message:
Logged In: YES 
user_id=44345
Originator: NO

Crap.  Wrong socket.makefile bug report.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2007-03-11 13:45

Message:
Logged In: YES 
user_id=44345
Originator: NO

I'm not going to get to this.  It may well be obsolete now anyway. 
Facundo, I'm assigning to you simply because you are diving into the
timeout code right now.  Maybe you can make a more informed decision. 
--Skip


----------------------------------------------------------------------

Comment By: David M. Grimes (dmgrime)
Date: 2003-08-04 09:34

Message:
Logged In: YES 
user_id=699265

Thanks Skip.  I see that the ripple effect here is enormous 
(not surprising, given that socket is used by many standard 
modules).  As for #3 - I understand you can use 
marshal.dumps() and then sock.sendall() to achieve the write 
side of a socket-based marshalling transaction, but for a long-
lived connection, marshal.load() on the .makefile()'d file of the 
read-side provided a very nice message-boundary mechanism, 
i.e. - marshal was the "protocol" on the wire...

Obviously (I've already done this in my application) I can just 
use a thin length header on the wire, but it was just very nice 
that marshal accomplished this by itself.  I can also (since my 
platform scope is known, controlled, and limited) create a 
socket.py local to my application which is simply (I don't use 
getfqdn()):

from _socket import *

And leave everything else as-is.  Again, I appreciate the 
attention and effort - I think the ramifications of eliminating 
the "native C" implementation which previously 
backed .makefile() (where supported) are more than just the 
impact on marshal - I see potential performance loss for 
things which rely on it.  In any event, generic object support 
for marshal.load() and .dump() would be great!

Thanks again, Dave.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-08-04 08:53

Message:
Logged In: YES 
user_id=44345

The original socket.makefile dup()s the socket to create a new
file descriptor.  If you have a timeout set on the original socket
the new file descriptor loses this property.  This was a big
problem for the various modules which sit atop socket (httplib,
etc) because they generally all just call s.makefile() and then
treat the connected socket as a file.  These are precisely the
modules/classes which benefit most from socket timeouts.

Note also that the current Unix implementation is what Windows
(and any other platforms without a dup() system call) use to
emulate makefile().

In the long run, I see a few other possibilities.  Note that I
haven't considered the first two in any detail.  I'm in the midst
of working on the third.

1. Add timeout functionality to the various higher-level modules
such as httplib.  Once they call s.makefile() they could propagate
their current timeout to the new file object.  Of course, this
means file objects need to be able to timeout (not obvious that
this is generally possible).

2. Store timeout information in the socket, so that if it's dup()'d
it propagates the timeout to the new descriptor.  Again, it's not
obvious that this is possible.

3. Extend more general object support to marshal.  I am working
on this and hope to have something available for review later
this week.  As Martin indicated, it's not unreasonable to expect
people wanting to marshal data to non-file objects to first pass
them through marshal.dumps().  You can't, for example,
marshal data to StringIO objects directly.
  

----------------------------------------------------------------------

Comment By: David M. Grimes (dmgrime)
Date: 2003-08-04 07:37

Message:
Logged In: YES 
user_id=699265

I know the wrapper has been there for a long time - some 
platforms don't support the "native" underlying FILE 
implementation, but many (most) platforms did.

What was the problem that necessitated the change?  I'd be 
more than willing to spend some time looking into alternative 
solutions which keep the optimized implementation where 
possible.  Can you provide me with some history of the 
motivations for the current solution?

Thanks!

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 15:52

Message:
Logged In: YES 
user_id=21627

I don't know of any other approach to solve the problem.

Also notice that the wrapper object has existed for a long
time already, on some systems.

----------------------------------------------------------------------

Comment By: David M. Grimes (dmgrime)
Date: 2003-08-01 14:31

Message:
Logged In: YES 
user_id=699265

It is clear that something which used to work no longer 
works, and I'm just wondering if the breakage is necessary.  
I'm not sure what all was involved with respect to timeouts 
and why the makefile() implementation had to change.  I 
guess I see 3 problems:

1) Previously functional code breaks.  While marshal is 
not "general purpose" in the sense it can't deal with 
instances, etc. it is still very useful and for data which 
doesn't contain many redundant references (where cPickle 
reduces them) it is the most efficient way to serialize and 
unserialize "native" python types.

2) The performance of everything which uses the results of 
socket.makefile() now suffers the cost of a pure-python 
implementation where it had perviously been optimized as a C 
implementation on platforms which supported it (most).

3) Any other 3rd party C extensions which use the 
PyFile_Check (and would previously have worked) break.

Is there another way to solve the timeout problem?  Breaking 
something which has worked since 1.5 somehow feels wrong 
to me.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 14:09

Message:
Logged In: YES 
user_id=21627

I don't think anymore that marshal has a problem. There is
nothing wrong with it breaking when operating on a socket;
marshal is not for general application use, anyway. Anybody
who wants to transmit marshalled data over a socket can
still dump them into a string first.

So I propose to fix this aspect by documenting this intended
limitation.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:52

Message:
Logged In: YES 
user_id=44345

Agreed, marshal has problems.  It's problems are a bit
deeper than the technique in cPickle can solve, however.  It
seems to me that the WFILE struct is a bad hack.  It should
contain a union with three elements, one is the regular FILE *
stuff, one is the current write-to-string stuff, and a third is
a "generic object which has a write method".

I'll see if I can whip something up and pass along to Martin
for review.


----------------------------------------------------------------------

Comment By: David M. Grimes (dmgrime)
Date: 2003-08-01 09:27

Message:
Logged In: YES 
user_id=699265

Skip --

I stand corrected (for cPickle, anyway) - but I imagine there is a 
performance degredation over the "old" way of doing things - if
I get a 
chance, I'll test that.  

The problem does still exist at least in marshal (.load and .dump only 
accept a "real" file object).  I did a grep for PyFile_Check in
the source tree 
prior to posting the bug report, and was a little over zealous to include

cPickle in the subject.  There are other places in the source ( and
possibly 
in 3rd party extensions) which use PyFile_Check, and I haven't had a 
change to analyze those yet.

In any case, marshal doesn't work - in your test, if you change the
cPickle 
to marshal, you'll get:

Traceback (most recent call last):
  File "x.py", line 10, in ?
    marshal.dump("Hello, world", f)
TypeError: marshal.dump() 2nd arg must be file

Again, I'm not sure of the impact other places where PyFile_Check is 
used, and I don't know the performance loss in cPickle (as compared to 
the "real" file implementation in pre-2.3 socket.makefile())

Let me know if there is anything I can do to help debug/fix!



----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:18

Message:
Logged In: YES 
user_id=44345

I modified the Demo/sockets/unixclient.py script to the following:

# Echo client demo using Unix sockets
# Piet van Oostrum
from socket import *
FILE = 'blabla'
s = socket(AF_UNIX, SOCK_STREAM)
s.connect(FILE)
import cPickle
f = s.makefile("wb")
cPickle.dump("Hello, world", f)
f.close()
#s.send('Hello, world')
data = s.recv(1024)
s.close()
print 'Received', `data`

It seemed to work fine when connected to the corresponding
unixserver.py script.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:11

Message:
Logged In: YES 
user_id=44345

Looking at the code in cPickle.c it appears that as long as the
object being written to has a 'write' method it should do the
right thing.  Can you provide a simple test case which fails?


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 02:14

Message:
Logged In: YES 
user_id=21627

See the bug that this patch closes; I doubt there is an
alternative approach.

Instead, I think the C modules that expect file objects need
to be fixed.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=780354&group_id=5470


More information about the Python-bugs-list mailing list