Issue 780354: socket.makefile() isn't compatible with marshal/cPickle/etc (original) (raw)
Issue780354
Created on 2003-07-30 17:02 by dmgrime, last changed 2022-04-10 16:10 by admin. This issue is now closed.
Messages (15) | ||
---|---|---|
msg17461 - (view) | Author: David M. Grimes (dmgrime) | Date: 2003-07-30 17:02 |
Even on platforms where the underlying C libraries and associated code in socketmodule.c support it, socket object's .makefile() method no longer return "real" file objects. This breaks support for cPickle, marshal, and any other C modules which expect a file object as an argument. The change initially appears in socket.py v1.36 - the changelog entry states: """ The socket module now always uses the _socketobject wrapper class, even on platforms which have dup(2). The makefile() method is built directly on top of the socket without duplicating the file descriptor, allowing timeouts to work properly. Includes a new test case (urllibnet) which requires the network resource. Closes bug 707074. """ I'm not sure of the implication WRT timeouts, but the patch is very small - quite obviously removing the platform check which prevented the redefinition of the symbol socket in the exports of socket.py. It now is always the _socketobject class, and not the underlying _socket.socket type (when it is known the platform supports all functionality in _socket). For now, I can workaround (since I don't need the timeout semantics) by using _socket.socket() directly (I'm on a Linux platform), but I wondered if this is something which can/should be addressed differently? | ||
msg17462 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2003-08-01 07:14 |
Logged In: YES user_id=21627 See the bug that this patch closes; I doubt there is an alternative approach. Instead, I think the C modules that expect file objects need to be fixed. | ||
msg17463 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2003-08-01 14:11 |
Logged In: YES user_id=44345 Looking at the code in cPickle.c it appears that as long as the object being written to has a 'write' method it should do the right thing. Can you provide a simple test case which fails? | ||
msg17464 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2003-08-01 14:18 |
Logged In: YES user_id=44345 I modified the Demo/sockets/unixclient.py script to the following: # Echo client demo using Unix sockets # Piet van Oostrum from socket import * FILE = 'blabla' s = socket(AF_UNIX, SOCK_STREAM) s.connect(FILE) import cPickle f = s.makefile("wb") cPickle.dump("Hello, world", f) f.close() #s.send('Hello, world') data = s.recv(1024) s.close() print 'Received', `data` It seemed to work fine when connected to the corresponding unixserver.py script. | ||
msg17465 - (view) | Author: David M. Grimes (dmgrime) | Date: 2003-08-01 14:27 |
Logged In: YES user_id=699265 Skip -- I stand corrected (for cPickle, anyway) - but I imagine there is a performance degredation over the "old" way of doing things - if I get a chance, I'll test that. The problem does still exist at least in marshal (.load and .dump only accept a "real" file object). I did a grep for PyFile_Check in the source tree prior to posting the bug report, and was a little over zealous to include cPickle in the subject. There are other places in the source ( and possibly in 3rd party extensions) which use PyFile_Check, and I haven't had a change to analyze those yet. In any case, marshal doesn't work - in your test, if you change the cPickle to marshal, you'll get: Traceback (most recent call last): File "x.py", line 10, in ? marshal.dump("Hello, world", f) TypeError: marshal.dump() 2nd arg must be file Again, I'm not sure of the impact other places where PyFile_Check is used, and I don't know the performance loss in cPickle (as compared to the "real" file implementation in pre-2.3 socket.makefile()) Let me know if there is anything I can do to help debug/fix! | ||
msg17466 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2003-08-01 14:52 |
Logged In: YES user_id=44345 Agreed, marshal has problems. It's problems are a bit deeper than the technique in cPickle can solve, however. It seems to me that the WFILE struct is a bad hack. It should contain a union with three elements, one is the regular FILE * stuff, one is the current write-to-string stuff, and a third is a "generic object which has a write method". I'll see if I can whip something up and pass along to Martin for review. | ||
msg17467 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2003-08-01 19:09 |
Logged In: YES user_id=21627 I don't think anymore that marshal has a problem. There is nothing wrong with it breaking when operating on a socket; marshal is not for general application use, anyway. Anybody who wants to transmit marshalled data over a socket can still dump them into a string first. So I propose to fix this aspect by documenting this intended limitation. | ||
msg17468 - (view) | Author: David M. Grimes (dmgrime) | Date: 2003-08-01 19:31 |
Logged In: YES user_id=699265 It is clear that something which used to work no longer works, and I'm just wondering if the breakage is necessary. I'm not sure what all was involved with respect to timeouts and why the makefile() implementation had to change. I guess I see 3 problems: 1) Previously functional code breaks. While marshal is not "general purpose" in the sense it can't deal with instances, etc. it is still very useful and for data which doesn't contain many redundant references (where cPickle reduces them) it is the most efficient way to serialize and unserialize "native" python types. 2) The performance of everything which uses the results of socket.makefile() now suffers the cost of a pure-python implementation where it had perviously been optimized as a C implementation on platforms which supported it (most). 3) Any other 3rd party C extensions which use the PyFile_Check (and would previously have worked) break. Is there another way to solve the timeout problem? Breaking something which has worked since 1.5 somehow feels wrong to me. | ||
msg17469 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2003-08-01 20:52 |
Logged In: YES user_id=21627 I don't know of any other approach to solve the problem. Also notice that the wrapper object has existed for a long time already, on some systems. | ||
msg17470 - (view) | Author: David M. Grimes (dmgrime) | Date: 2003-08-04 12:37 |
Logged In: YES user_id=699265 I know the wrapper has been there for a long time - some platforms don't support the "native" underlying FILE implementation, but many (most) platforms did. What was the problem that necessitated the change? I'd be more than willing to spend some time looking into alternative solutions which keep the optimized implementation where possible. Can you provide me with some history of the motivations for the current solution? Thanks! | ||
msg17471 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2003-08-04 13:53 |
Logged In: YES user_id=44345 The original socket.makefile dup()s the socket to create a new file descriptor. If you have a timeout set on the original socket the new file descriptor loses this property. This was a big problem for the various modules which sit atop socket (httplib, etc) because they generally all just call s.makefile() and then treat the connected socket as a file. These are precisely the modules/classes which benefit most from socket timeouts. Note also that the current Unix implementation is what Windows (and any other platforms without a dup() system call) use to emulate makefile(). In the long run, I see a few other possibilities. Note that I haven't considered the first two in any detail. I'm in the midst of working on the third. 1. Add timeout functionality to the various higher-level modules such as httplib. Once they call s.makefile() they could propagate their current timeout to the new file object. Of course, this means file objects need to be able to timeout (not obvious that this is generally possible). 2. Store timeout information in the socket, so that if it's dup()'d it propagates the timeout to the new descriptor. Again, it's not obvious that this is possible. 3. Extend more general object support to marshal. I am working on this and hope to have something available for review later this week. As Martin indicated, it's not unreasonable to expect people wanting to marshal data to non-file objects to first pass them through marshal.dumps(). You can't, for example, marshal data to StringIO objects directly. | ||
msg17472 - (view) | Author: David M. Grimes (dmgrime) | Date: 2003-08-04 14:34 |
Logged In: YES user_id=699265 Thanks Skip. I see that the ripple effect here is enormous (not surprising, given that socket is used by many standard modules). As for #3 - I understand you can use marshal.dumps() and then sock.sendall() to achieve the write side of a socket-based marshalling transaction, but for a long- lived connection, marshal.load() on the .makefile()'d file of the read-side provided a very nice message-boundary mechanism, i.e. - marshal was the "protocol" on the wire... Obviously (I've already done this in my application) I can just use a thin length header on the wire, but it was just very nice that marshal accomplished this by itself. I can also (since my platform scope is known, controlled, and limited) create a socket.py local to my application which is simply (I don't use getfqdn()): from _socket import * And leave everything else as-is. Again, I appreciate the attention and effort - I think the ramifications of eliminating the "native C" implementation which previously backed .makefile() (where supported) are more than just the impact on marshal - I see potential performance loss for things which rely on it. In any event, generic object support for marshal.load() and .dump() would be great! Thanks again, Dave. | ||
msg17473 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2007-03-11 18:45 |
I'm not going to get to this. It may well be obsolete now anyway. Facundo, I'm assigning to you simply because you are diving into the timeout code right now. Maybe you can make a more informed decision. --Skip | ||
msg17474 - (view) | Author: Skip Montanaro (skip.montanaro) * ![]() |
Date: 2007-03-11 18:45 |
Crap. Wrong socket.makefile bug report. | ||
msg81860 - (view) | Author: Daniel Diniz (ajaksu2) * ![]() |
Date: 2009-02-13 03:23 |
IIUC this one can be closed. Objections? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:10:21 | admin | set | github: 38974 |
2009-02-20 01:53:44 | ajaksu2 | set | status: pending -> closedresolution: not a bugstage: resolved |
2009-02-18 01:52:11 | ajaksu2 | set | status: open -> pendingpriority: normal -> low |
2009-02-13 03:23:21 | ajaksu2 | set | nosy: + ajaksu2type: behaviormessages: + |
2003-07-30 17:02:40 | dmgrime | create |