Using "pickle" for interprocess communication - some notes and things that ought to be documented.

Fri Jan 18 14:59:56 EST 2008

On Jan 17, 2:28 pm, John Nagle <na... at animats.com> wrote:
> It's possible to use "pickle" for interprocess communication over
> pipes, but it's not straightforward.
>
> First, "pickle" output is self-delimiting.
> Each dump ends with ".", and, importantly, "load" doesn't read
> any characters after the "."  So "pickle" can be used repeatedly
> on the same pipe, and one can do repeated message-passing this way.  This
> is a useful, but undocumented, feature.
>
> It almost works.
>
> Pickle's "dump" function doesn't flush output after dumping, so
> there's still some data left to be written.  The sender has to
> flush the underlying output stream after each call to "dump",
> or the receiver will stall. The "dump" function probably ought to flush
> its output file.

But... you can also write multiple pickles to the same file.

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cPickle
>>> f = open('xxx.pkl','wb')
>>> cPickle.dump(1,f)
>>> cPickle.dump('hello, world',f)
>>> cPickle.dump([1,2,3,4],f)
>>> f.close()
>>> f = open('xxx.pkl','rb')
>>> cPickle.load(f)
1
>>> cPickle.load(f)
'hello, world'
>>> cPickle.load(f)
[1, 2, 3, 4]

An automatic flush would be very undesirable there.  It is best to let
those who are worried about IPC flush the output file themselves, which
they ought to be doing regardless (either by flushing explicitly or by
using an unbuffered stream).
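
A minimal sketch of the explicit-flush approach, assuming Python 3's
"pickle" module rather than the cPickle of the session above.  The helper
name "send" is hypothetical, and an os.pipe() within one process stands in
for a real pipe between two processes:

```python
import os
import pickle

def send(stream, obj):
    """Pickle obj onto stream, then push the bytes out right away."""
    pickle.dump(obj, stream)
    stream.flush()  # without this, the bytes may sit in the writer's buffer

read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")
writer = os.fdopen(write_fd, "wb")  # buffered by default

send(writer, {"cmd": "ping"})
first = pickle.load(reader)    # can return because the sender flushed
send(writer, [1, 2, 3, 4])
second = pickle.load(reader)   # the same pipe carries the next message

writer.close()
reader.close()
```

Keeping the flush in the caller's hands leaves plain file output, as in
the session above, free to buffer normally.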

> It's also necessary to call Pickle's "clear_memo" before each "dump"
> call, since objects might change between successive "dump" calls.
> "Unpickle" doesn't have a "clear_memo" function.  It should, because
> if you keep reusing the "Unpickle" object, the memo dictionary
> fills up with old objects which can't be garbage collected.
> This creates a memory leak in long-running programs.

This is all good to know.  I agree that this is a good use case for a
clear_memo method on the unpickler.
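
The sender-side half of the quoted point can be seen at the byte level.
This is a sketch assuming Python 3's pickle.Pickler; "dump_bytes" is a
hypothetical helper that returns whatever one dump call wrote:

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf)

def dump_bytes(obj):
    """Dump obj with the shared pickler and return only the new bytes."""
    start = buf.tell()
    pickler.dump(obj)
    return buf.getvalue()[start:]

state = {"count": 1}
first = dump_bytes(state)

state["count"] = 2
stale = dump_bytes(state)   # memo hit: only a back-reference to the first dump

pickler.clear_memo()
fresh = dump_bytes(state)   # memo cleared: the current contents are written

current = pickle.loads(fresh)
```

Here "stale" does not carry the updated dictionary at all, which is why
the sender clears the memo before each dump when objects may have changed.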

> Then, on Windows, there's a CR LF problem. This can be fixed by
> launching the subprocess with
>
>     proc = subprocess.Popen(launchargs,
>         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
>         universal_newlines=True)
>
> Failure to do this produces the useful error message "Insecure string pickle".
> Binary "pickle" protocol modes won't work at all in this situation; "universal
> newline" translation is compatible, not transparent.  On Unix/Linux, this
> just works, but the code isn't portable.

I would think a better solution would be to use the -u switch to
launch the subprocess, or to set the PYTHONUNBUFFERED environment
variable if you want to invoke the Python script directly.  Either one
opens stdin and stdout in binary, unbuffered mode.

Using "universal newlines" with a non-text format does not seem like a
good idea.

For text-format pickles it'd be the right thing, of course.
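
For comparison, here is a sketch of the binary-pipe arrangement under
Python 3, where the pipes created by subprocess.Popen carry bytes unless
text mode is requested, so universal_newlines is simply left off.  The
child script in CHILD is a made-up echo loop, not anything from the
original post:

```python
import pickle
import subprocess
import sys

# Hypothetical child: read pickles from stdin, echo each one back on stdout.
CHILD = r"""
import pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    try:
        msg = pickle.load(inp)
    except EOFError:
        break
    pickle.dump(("echo", msg), out)
    out.flush()
"""

proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

replies = []
for msg in (1, "hello, world", [1, 2, 3, 4]):
    pickle.dump(msg, proc.stdin)
    proc.stdin.flush()                      # the sender flushes each message
    replies.append(pickle.load(proc.stdout))

proc.stdin.close()   # the child sees EOF and leaves its loop
proc.wait()
proc.stdout.close()
```

Both ends flush after every dump, so neither side stalls waiting for
bytes that are still sitting in the other side's buffer.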

> Incidentally, in the subprocess, it's useful to do
>
>         sys.stdout = sys.stderr
>
> after setting up the Pickle objects.  This prevents any stray print statements
> from interfering with the structured Pickle output.

Nice idea.
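
A sketch of that trick, assuming Python 3, where the byte stream behind
stdout is reached through sys.stdout.buffer.  The function name is
hypothetical.  The handle has to be taken before sys.stdout is rebound:

```python
import pickle
import sys

def reserve_stdout_for_pickles():
    """Return the binary stream behind stdout, then send prints to stderr."""
    channel = sys.stdout.buffer   # assumes stdout is a text stream with .buffer
    sys.stdout = sys.stderr       # stray print() calls now land on stderr
    return channel
```

In the subprocess one would call this once at startup and pass the
returned stream to pickle.dump, flushing after each message.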

> Then there's end of file detection.  When "load" reaches an end of
> file, it properly raises EOFError.  So it's OK to do "load" after
> "load" until EOFError is raised.
>
> "pickle" and "cPickle" seem to be interchangeable in this application,
> so that works.
>
> It's a useful way to talk to a subprocess, but you need to know all the
> issues above to make it work.
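
The load-until-EOFError loop described above can be sketched as a small
generator.  This assumes Python 3's "pickle" module; "iter_pickles" is a
hypothetical name, and an in-memory buffer stands in for the pipe:

```python
import io
import pickle

def iter_pickles(stream):
    """Yield successive pickled objects until the stream reaches end of file."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return

buf = io.BytesIO()
for obj in (1, "hello, world", [1, 2, 3, 4]):
    pickle.dump(obj, buf)
buf.seek(0)

messages = list(iter_pickles(buf))
```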

Thanks: this was an informative post.

Carl Banks