The Art of Pickling: Binary vs Ascii difficulties
Scott David Daniels
Scott.Daniels at Acm.Org
Thu Oct 14 17:23:30 EDT 2004
Bix wrote:
> As this is my very first post, I'd like to give thanks to all who
> support this with their help. Hopefully, this question hasn't been
> answered (too many times) before...
> If anyone could explain this behavior, I'd greatly appreciate it.
You clearly spent some effort on this, but you could have boiled this
down to a smaller, more direct question.
The short answer is, "when reading and/or writing binary data,
the files must be opened in binary." Pickles in "ascii" are not
in a binary format, but the others are.
The longer answer includes:
You should handle files a bit more carefully. Don't presume they
automatically get closed.
I'd change:
> fn = 'out.bin'
> pickle.Pickler(open(fn,'w'),fmt).dump(w)
> obj = pickle.Unpickler(open(fn,'r')).load()
to:
    fn = 'out.bin'
    dest = open(fn, 'w')
    try:
        pickle.Pickler(dest, fmt).dump(w)
    finally:
        dest.close()
    source = open(fn, 'r')
    try:
        return pickle.Unpickler(source).load()
    finally:
        source.close()
Then the problem (the mode in which you open the file) shows up to a
practiced eye.
dest = open(fn, 'w') ... source = open(fn, 'r')
should either be:
dest = open(fn, 'wb') ... source = open(fn, 'rb')
which works "OK" for ascii, but is not in machine-native text format.
or:
    if fmt:
        readmode, writemode = 'rb', 'wb'
    else:
        readmode, writemode = 'r', 'w'
    ...
    dest = open(fn, writemode) ... source = open(fn, readmode)
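Putting the first option together, here is a minimal sketch that always opens the file in binary mode, whatever the pickle format. The names save_pickle and load_pickle, the sample data, and the temporary directory are mine, for illustration only; they are not from the original post.

```python
import os
import pickle
import tempfile


def save_pickle(obj, filename, protocol=0):
    """Write obj to filename, always in binary mode."""
    dest = open(filename, 'wb')
    try:
        pickle.Pickler(dest, protocol).dump(obj)
    finally:
        dest.close()


def load_pickle(filename):
    """Read one pickled object back, always in binary mode."""
    source = open(filename, 'rb')
    try:
        return pickle.Unpickler(source).load()
    finally:
        source.close()


# Round-trip the same data through several pickle formats.
data = {'name': None, 'children': [1, 10, 13, 26]}
path = os.path.join(tempfile.mkdtemp(), 'out.bin')
results = []
for protocol in (0, 1, 2):
    save_pickle(data, path, protocol)
    results.append(load_pickle(path) == data)
```

Because both ends use binary mode, the bytes on disk are exactly the bytes the pickler produced, so every entry in results should come out true on any platform.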
By the way, the reason that a binary pickle sometimes works through a
text-mode file (which is, I suspect, what is troubling you), is that
only certain bytes are altered in text mode. On Windows and MS-DOS systems,
a byte with value 10 is written as a pair of bytes, 13 followed by 10.
On Apple systems, another translation happens. On unix (and hence
linux) there is no distinction between data written as text and the
C representation of '\n' for line breaks. This means nobody on linux
who ran your example saw a problem, I suspect.
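You can see the effect on any platform by imitating the translation by hand. This is only a rough sketch: simulate_windows_text_write is a made-up helper that mimics the write-side change described above (10 becomes 13, 10) and nothing else, and it assumes a Python where pickle.dumps returns a byte string and the b"" literal is available.

```python
import pickle


def simulate_windows_text_write(data):
    # Hypothetical helper: imitate a Windows text-mode write, in which
    # every byte with value 10 is written as the pair 13, 10.
    return data.replace(b"\n", b"\r\n")


# Protocol 2 stores a small integer as a single byte with that value.
no_newline = pickle.dumps(7, 2)     # contains no byte with value 10
has_newline = pickle.dumps(10, 2)   # contains a byte with value 10

# The first pickle passes through untouched; the second one grows by a byte.
unchanged = simulate_windows_text_write(no_newline) == no_newline
damaged = simulate_windows_text_write(has_newline) != has_newline
```

Whether a given binary pickle is altered depends on whether its bytes happen to include a value that text mode translates, which is why the failure looks intermittent.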
This C convention is a violation of the ASCII code as it was then
defined, in order to save a byte per line (treating '\n' as end-of-line,
not line-feed). An ASCII-conforming printer when fed 'a\nb\nc\r\n.\r\n'
should print:
    a
     b
      c
    .
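For the curious, that printer behavior can be sketched in a few lines. This is a toy model of my own, not any real device driver: line-feed moves down one row and keeps the column, carriage-return goes back to column 0, and anything else prints and advances. The function name ascii_print is invented for the illustration.

```python
def ascii_print(text):
    """Return the rows a strict ASCII printer would produce for text."""
    rows = [[]]
    row = col = 0
    for ch in text:
        if ch == "\n":          # line-feed: down one row, same column
            row += 1
            if row == len(rows):
                rows.append([])
        elif ch == "\r":        # carriage-return: back to column 0
            col = 0
        else:                   # printable: place it at (row, col)
            line = rows[row]
            while len(line) < col:
                line.append(" ")
            if col < len(line):
                line[col] = ch
            else:
                line.append(ch)
            col += 1
    return ["".join(line) for line in rows]


page = ascii_print('a\nb\nc\r\n.\r\n')
```

Here page holds one string per printed row, with the leading spaces that a bare line-feed leaves behind.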
My idea of the right question would be, roughly:
Why does test(0) succeed (pickle format 0 = ascii),
but test(-1) fail (pickle format -1 = pickle.HIGHEST_PROTOCOL)?
I am using Python 2.4 on Windows 2000.
    import pickle

    class node (object):
        def __init__ (self, *args, **kwds):
            self.args = args
            self.kwds = kwds
            self.reset()
        def reset(self):
            self.name = None
            self.node = 'node'
            self.attributes = {}
            self.children = []
            self.update(*self.args,**self.kwds)
        def update(self, *args, **kwds):
            for k,v in kwds.items():
                if k in self.__dict__.keys():
                    self.__dict__[k] = v

    def test(fmt=-1):
        w = node()
        x = node()
        y = node()
        z = node()
        w.children.append(x)
        x.children.append(y)
        y.children.append(z)
        fn = 'out.bin'
        pickle.Pickler(open(fn,'w'),fmt).dump(w)
        obj = pickle.Unpickler(open(fn,'r')).load()
        return obj
The error message is:
    Traceback (most recent call last):
      File "<pyshell#24>", line 1, in -toplevel-
        test()
      File "<pyshell#22>", line 11, in test
        obj = pickle.Unpickler(open(fn,'r')).load()
      File "C:\Python24\lib\pickle.py", line 872, in load
        dispatch[key](self)
      File "C:\Python24\lib\pickle.py", line 1189, in load_binput
        i = ord(self.read(1))
    TypeError: ord() expected a character, but string of length 0 found
-Scott David Daniels
Scott.Daniels at Acm.Org