The Art of Pickling: Binary vs Ascii difficulties

Scott David Daniels Scott.Daniels at Acm.Org
Thu Oct 14 17:23:30 EDT 2004


Bix wrote:
> As this is my very first post, I'd like to give thanks to all who
> support this with their help.  Hopefully, this question hasn't been
> answered (too many times) before...
> If anyone could explain this behavior, I'd greatly appreciate it.

You clearly spent some effort on this, but you could have boiled this
down to a smaller, more direct question.

The short answer is, "when reading and/or writing binary data,
the files must be opened in binary mode."  Pickles in "ascii"
(protocol 0) are not in a binary format, but the others are.
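
To see the difference concretely, here is a small sketch.  It assumes a
current Python 3 rather than the 2.4 discussed in this thread, and it uses
pickle.dumps so that no file is involved.  The sample value is made up for
illustration.

```python
import pickle

sample = [1, "two", 3.0]  # made-up value, only for illustration

text_pickle = pickle.dumps(sample, protocol=0)    # the "ascii" format
binary_pickle = pickle.dumps(sample, protocol=2)  # one of the binary formats

# For this sample, protocol 0 emits only 7-bit ASCII bytes ...
print(all(byte < 128 for byte in text_pickle))
# ... while protocol 2 starts with the non-ASCII PROTO opcode, 0x80.
print(binary_pickle[:1] == b"\x80")
```

Both formats load back to the same object; only the bytes on the way
differ, and that is what makes the file mode matter.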

The longer answer includes:
You should handle files a bit more carefully.  Don't presume they
automatically get closed.
I'd change:
 >    fn = 'out.bin'
 >    pickle.Pickler(open(fn,'w'),fmt).dump(w)
 >    obj = pickle.Unpickler(open(fn,'r')).load()
to:
     fn = 'out.bin'
     dest = open(fn, 'w')
     try:
         pickle.Pickler(dest, fmt).dump(w)
     finally:
         dest.close()
     source = open(fn, 'r')
     try:
         return pickle.Unpickler(source).load()
     finally:
         source.close()

Then the problem (the mode in which you open the file) shows up to a
practiced eye.
     dest = open(fn, 'w') ... source = open(fn, 'r')
should either be:
     dest = open(fn, 'wb') ... source = open(fn, 'rb')
(which works "OK" for ascii too, although the resulting file is then not
in machine-native text format), or:
     if fmt:
         readmode, writemode = 'rb', 'wb'
     else:
         readmode, writemode = 'r', 'w'
     ...
     dest = open(fn, writemode) ... source = open(fn, readmode)
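
For readers on a newer Python than the 2.4 in this thread, the first option
(binary mode for every format) might be sketched as below.  The helper name
roundtrip, the temporary file, and the sample data are all made up for
illustration; the with blocks play the role of the try/finally pairs above.

```python
import os
import pickle
import tempfile


def roundtrip(obj, protocol):
    """Pickle obj to a temporary file and load it back (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)  # only the name is needed; the file is reopened below
    try:
        # 'wb'/'rb' regardless of protocol, so no newline translation occurs.
        with open(path, "wb") as dest:
            pickle.Pickler(dest, protocol).dump(obj)
        with open(path, "rb") as source:
            return pickle.Unpickler(source).load()
    finally:
        os.remove(path)


data = {"name": None, "children": [[], []]}  # made-up sample data
print(roundtrip(data, 0) == data)
print(roundtrip(data, pickle.HIGHEST_PROTOCOL) == data)
```

Using one pair of modes for every format keeps the code simple, at the cost
that a format 0 file is not in the platform's native text format.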

By the way, the reason that binary pickles sometimes work in text mode
(which is, I suspect, what is troubling you) is that only some bytes are
changed when written in text mode.  On Windows and MS-DOS systems,
a byte with value 10 is written as a pair of bytes, 13 followed by 10.
On Apple systems, another translation happens.  On unix (and hence
linux) text mode and binary mode are the same: the C representation
of a line break, '\n', is written as-is.  This means nobody on linux
who ran your example saw a problem, I suspect.
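
A rough way to see the "sometimes" is to mimic the write-side translation by
hand.  This sketch assumes Python 3 and models only the 10 -> 13, 10
replacement described above, not everything a real text-mode file does; the
function name is made up for illustration.

```python
import pickle


def as_written_in_windows_text_mode(data):
    """Mimic only the write-side translation: byte 10 becomes 13, 10."""
    return data.replace(b"\n", b"\r\n")


lucky = pickle.dumps(7, protocol=2)     # contains no byte with value 10
unlucky = pickle.dumps(10, protocol=2)  # the integer 10 is stored as byte 10

# The first pickle would reach the disk unchanged; the second would not.
print(as_written_in_windows_text_mode(lucky) == lucky)
print(as_written_in_windows_text_mode(unlucky) == unlucky)
```

Whether a given binary pickle happens to contain such a byte depends on the
data, which is why the failure can look random.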

This C convention is a violation of the ASCII code as it was then
defined, in order to save a byte per line (treating '\n' as end-of-line,
not line-feed).  An ASCII-conforming printer when fed 'a\nb\nc\r\n.\r\n'
should print:
a
 b
  c
.
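
The staircase can be reproduced with a toy model of such a printer.  This is
only a sketch under the reading given above: '\n' moves down one row and
keeps the column, '\r' returns to column 0.  The function name is made up.

```python
def render_strict_ascii(text):
    """Toy printer: '\\n' is a pure line feed, '\\r' a pure carriage return."""
    rows = [[]]
    row, col = 0, 0
    for ch in text:
        if ch == "\n":          # line feed: next row, same column
            row += 1
            if row == len(rows):
                rows.append([])
        elif ch == "\r":        # carriage return: back to column 0
            col = 0
        else:
            line = rows[row]
            while len(line) < col:   # pad with blanks up to the print head
                line.append(" ")
            if col < len(line):
                line[col] = ch       # overprint an existing position
            else:
                line.append(ch)
            col += 1
    return ["".join(line) for line in rows]


for printed_line in render_strict_ascii("a\nb\nc\r\n.\r\n"):
    print(printed_line)
```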

My idea of the right question would be, roughly:

Why does test(0) succeed (pickle format 0 = ascii),
but test(-1) fail (pickle format -1 = pickle.HIGHEST_PROTOCOL)?
I am using Python 2.4 on Windows 2000.

import pickle
class node (object):
     def __init__ (self, *args, **kwds):
         self.args = args
         self.kwds = kwds
         self.reset()

     def reset(self):
         self.name = None
         self.node = 'node'
         self.attributes = {}
         self.children = []
         self.update(*self.args,**self.kwds)

     def update(self, *args, **kwds):
         for k,v in kwds.items():
             if k in self.__dict__.keys():
                 self.__dict__[k] = v

def test(fmt=-1):
     w = node()
     x = node()
     y = node()
     z = node()
     w.children.append(x)
     x.children.append(y)
     y.children.append(z)
     fn = 'out.bin'
     pickle.Pickler(open(fn,'w'),fmt).dump(w)
     obj = pickle.Unpickler(open(fn,'r')).load()
     return obj

The error message is:
Traceback (most recent call last):
File "<pyshell#24>", line 1, in -toplevel-
    test()
File "<pyshell#22>", line 11, in test
    obj = pickle.Unpickler(open(fn,'r')).load()
File "C:\Python24\lib\pickle.py", line 872, in load
    dispatch[key](self)
File "C:\Python24\lib\pickle.py", line 1189, in load_binput
    i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found


-Scott David Daniels
Scott.Daniels at Acm.Org


