Sending binary pickled data through TCP

Steve Holden steve at holdenweb.com
Sat Oct 14 04:05:34 EDT 2006


David Hirschfield wrote:
> Thanks for the great response.
> 
> Yeah, by "safe" I mean that it's all happening on an intranet with no 
> chance of malicious individuals getting access to the stream of data.
> 
> The chunks are arbitrary collections of python objects. I'm wrapping 
> them up a little, but I don't know much about the actual formal makeup 
> of the data, other than it pickles successfully.
> 
> Are there any existing python modules that do the equivalent of pickling 
> on arbitrary python data, but do it a lot faster? I wasn't aware of any 
> that are as easy to use as pickle, or don't require implementing them 
> myself, which is not something I have time for.
> 
The marshal module may achieve what you want, but it handles a more 
limited range of datatypes than pickle.
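
A minimal sketch of that trade-off, assuming both ends run the same Python version (the marshal format is not guaranteed to be stable across versions). The Point class and the can_marshal helper are hypothetical names used only for illustration:

```python
import marshal

# Built-in containers and scalars round-trip through marshal.
payload = {"id": 7, "values": [1.5, 2.5], "name": "chunk"}
encoded = marshal.dumps(payload)
decoded = marshal.loads(encoded)


class Point(object):  # hypothetical user-defined class
    def __init__(self, x, y):
        self.x, self.y = x, y


def can_marshal(obj):
    """Return True if marshal accepts obj, False otherwise."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:  # raised for unsupported ("unmarshallable") objects
        return False
```

So marshal is only a drop-in replacement if the chunks contain nothing but built-in types; instances of your own classes would still need pickle.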

regards
  Steve


> Thanks again,
> -Dave
> 
> Steve Holden wrote:
> 
>>David Hirschfield wrote:
>>  
>>
>>>I have a pair of programs which trade python data back and forth by 
>>>pickling up lists of objects on one side (using 
>>>pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket 
>>>connection to the receiver, who unpickles the data and uses it.
>>>
>>>So far this has been working fine, but I now need a way of separating 
>>>multiple chunks of pickled binary data in the stream being sent back and 
>>>forth.
>>>
>>>Questions:
>>>
>>>Is it safe to do what I'm doing? I didn't think there was anything 
>>>fundamentally wrong with sending binary pickled data, especially in the 
>>>closed, safe environment these programs operate under...but maybe I'm 
>>>making a poor assumption?
>>>
>>>    
>>>
>>If there's no chance of malevolent attackers modifying the data stream 
>>then you can safely ignore the otherwise dire consequences of unpickling 
>>arbitrary chunks of data.
>>
>>  
>>
>>>I was going to separate the chunks of pickled data with some well-formed 
>>>string, but couldn't that string potentially randomly appear in the 
>>>pickled data? Do I just pick an extremely 
>>>unlikely-to-be-randomly-generated string as the separator? Is there some 
>>>string that will definitely NEVER show up in pickled binary data?
>>>
>>>    
>>>
>>I presumed each chunk was of a known structure. Couldn't you just lead off 
>>with a pickled integer saying how many chunks follow?
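
The pickled-count idea can be sketched roughly as follows. It relies on the fact that pickle.load reads exactly one pickle from a file object, so no separator string is needed. The function names are hypothetical, and an in-memory buffer stands in for the file object you would get from the socket's makefile() method:

```python
import io
import pickle


def write_chunks(stream, chunks):
    """Write a pickled count, then each chunk as its own pickle."""
    pickle.dump(len(chunks), stream, pickle.HIGHEST_PROTOCOL)
    for chunk in chunks:
        pickle.dump(chunk, stream, pickle.HIGHEST_PROTOCOL)


def read_chunks(stream):
    """Read the count, then exactly that many pickles."""
    count = pickle.load(stream)
    return [pickle.load(stream) for _ in range(count)]


# An in-memory buffer stands in for a socket-backed file object here.
buf = io.BytesIO()
sent = [[1, 2, 3], {"a": "b"}, ("x", 4.5)]
write_chunks(buf, sent)
buf.seek(0)
received = read_chunks(buf)
```

Over a real connection the writer would also need to flush the file object after each batch so the data actually leaves the buffer.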
>>
>>  
>>
>>>I thought about base64 encoding the data, and then decoding on the 
>>>opposite side (like what xmlrpclib does), but that turns out to be a 
>>>very expensive operation, which I want to avoid; speed is of the essence 
>>>in this situation.
>>>
>>>    
>>>
>>Yes, base64 stuffs three bytes into four (six bits per byte) giving you 
>>a 33% overhead. Having said that, pickle isn't all that efficient a 
>>representation because it's designed to be portable. If you are using 
>>machines of the same type there are almost certainly faster binary 
>>encodings.
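
The size overhead is easy to check. This is only a small illustration with an arbitrary 3000-byte payload, and it measures size rather than encoding time:

```python
import base64
import os

raw = os.urandom(3000)           # arbitrary binary payload
encoded = base64.b64encode(raw)  # 4 output characters per 3 input bytes

# Fraction by which the encoded form is larger than the raw bytes.
overhead = (len(encoded) - len(raw)) / len(raw)
roundtrip = base64.b64decode(encoded)
```

Here the 3000 raw bytes become 4000 encoded characters, which is the one-third growth described above.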
>>
>>  
>>
>>>Is there a reliable way to determine the byte count of some pickled 
>>>binary data? Can I rely on len(<pickled data>) == bytes?
>>>
>>>    
>>>
>>Yes, since pickle returns a string of bytes, not a Unicode object.
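
Because the length is a reliable byte count, one common way to separate chunks is to send that length ahead of each pickle. The following is a sketch under assumptions, not the only way to do it: the 4-byte header and the function names are my own choices, and it is written for Python 3, where pickle.dumps returns a bytes object:

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_chunk(sock, obj):
    """Pickle obj and send it preceded by its byte length."""
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(data)) + data)


def recv_exactly(sock, n):
    """Keep calling recv until n bytes arrive; TCP may deliver fewer per call."""
    parts = []
    while n > 0:
        part = sock.recv(n)
        if not part:
            raise EOFError("connection closed mid-message")
        parts.append(part)
        n -= len(part)
    return b"".join(parts)


def recv_chunk(sock):
    """Read one length-prefixed pickle and return the unpickled object."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, length))


# Demonstration over a connected pair of local sockets.
a, b = socket.socketpair()
send_chunk(a, ["first", 1])
send_chunk(a, {"second": 2.0})
first, second = recv_chunk(b), recv_chunk(b)
a.close()
b.close()
```

With this framing the receiver never has to scan the data for a separator, so it does not matter which byte sequences appear inside the pickle. A 4-byte header limits each chunk to just under 4 GiB.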
>>
>>If bandwidth really is becoming a limitation you might want to consider 
>>using the struct module to represent things more compactly (but this 
>>may be too difficult if the objects being exchanged are at all complex).
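
To give a feel for the difference, here is a small comparison for a hypothetical fixed-layout record of three floats. It assumes both sides agree on the layout in advance, which is exactly what becomes difficult for complex objects:

```python
import pickle
import struct

point = (1.0, 2.0, 3.0)  # hypothetical record: three floats

pickled = pickle.dumps(point, pickle.HIGHEST_PROTOCOL)
packed = struct.pack("!3d", *point)  # three 8-byte doubles, no type tags

unpacked = struct.unpack("!3d", packed)
```

The struct form is exactly 24 bytes because it carries no type information; the pickle is larger since it has to describe the tuple and each float so the receiver can rebuild them without prior knowledge.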
>>
>>regards
>>  Steve
>>  
>>
> 
> -- 
> Presenting:
> mediocre nebula.
> 


-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb       http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden


