Python 3.1.2 and marshal

Thomas Jollans thomas at jollans.com
Sat Jul 17 13:11:33 EDT 2010


On 07/17/2010 06:21 PM, raj wrote:
> Hi,
> 
> I am using 64 bit Python on an x86_64 platform (Fedora 13).  I have
> some code that uses the python marshal module to serialize some
> objects to files. However, in moving the code to python 3 I have come
> across a situation where, if more than one object has been serialized
> to a file, then while trying to de-serialize only the first object is
> de-serialized. Trying to de-serialize the second object raises an
> EOFError. De-serialization of multiple objects works fine in Python
> 2.x. I tried going through the Python 3 documentation to see if
> marshal functionality has been changed, but haven't found anything to
> that effect.  Does anyone else see this problem?  Here  is some
> example code:

Interesting. I modified your script a bit:

0:pts/2:/tmp% cat marshtest.py
from __future__ import print_function
import marshal
import sys
if sys.version_info[0] == 3:
    bytehex = lambda i: '%02X ' % i
else:
    bytehex = lambda c: '%02X ' % ord(c)

numlines = 1
numwords = 25

stream = open('fails.mar','wb')
marshal.dump(numlines, stream)
marshal.dump(numwords, stream)
stream.close()

tmpstream = open('fails.mar', 'rb')

for byte in tmpstream.read():
    sys.stdout.write(bytehex(byte))

sys.stdout.write('\n')
tmpstream.seek(0)

print('pos:', tmpstream.tell())
value1 = marshal.load(tmpstream)
print('val:', value1)
print('pos:', tmpstream.tell())
value2 = marshal.load(tmpstream)
print('val:', value2)
print('pos:', tmpstream.tell())

print(value1 == numlines)
print(value2 == numwords)
0:pts/2:/tmp% python2.6 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 5
val: 25
pos: 10
True
True
0:pts/2:/tmp% python3.1 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 10
Traceback (most recent call last):
  File "marshtest.py", line 29, in <module>
    value2 = marshal.load(tmpstream)
EOFError: EOF read where object expected
1:pts/2:/tmp%

So the contents of the file are identical, but after the first load
Python 3 is at position 10 where Python 2 is at position 5: Python 3
reads the whole file, whereas Python 2 reads only the data it uses.

This looks like a simple optimisation: read the whole file at once,
instead of byte-by-byte, to improve performance when reading large
objects (such as Python modules...).
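
To make the positions in the output above easier to follow, here is a
rough sketch that decodes the hex dump by hand. It assumes each record
is a one-byte type code (0x69, i.e. 'i') followed by a 32-bit
little-endian integer, which is what the dump shows for these two small
ints; the marshal format is version-dependent, so take this as an
illustration only. The names data, record_size and records are mine.

```python
import struct

# The ten bytes printed by both interpreters in the transcript above.
data = bytes.fromhex('69 01 00 00 00 69 19 00 00 00')

# Assumed layout: 1-byte type code + 4-byte little-endian signed int.
record_size = struct.calcsize('<ci')

records = []
for offset in range(0, len(data), record_size):
    type_code, value = struct.unpack_from('<ci', data, offset)
    # Keep the offset just past this record, i.e. where a reader that
    # consumes only what it needs would stop.
    records.append((type_code, value, offset + record_size))

for type_code, value, end in records:
    print(type_code, value, 'ends at', end)
```

The first record ends at offset 5, which is where Python 2 stops after
the first load; Python 3 is already at offset 10, the end of the file.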

The question is: was storing multiple objects in sequence an intended
use of the marshal module? I doubt it. You can always wrap your data in
tuples or use pickle.
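
A minimal sketch of both suggestions, for what it's worth. The file
names works.mar and works.pkl and the temporary directory are just
placeholders I picked for the example; the values are the ones from
your script.

```python
import marshal
import os
import pickle
import tempfile

numlines = 1
numwords = 25

tmpdir = tempfile.mkdtemp()

# Option 1: marshal one tuple, so a single load() gets everything back
# and it no longer matters how much of the file load() consumes.
mar_path = os.path.join(tmpdir, 'works.mar')
with open(mar_path, 'wb') as stream:
    marshal.dump((numlines, numwords), stream)
with open(mar_path, 'rb') as stream:
    mar_values = marshal.load(stream)

# Option 2: pickle, which supports several dump() calls on one file
# followed by the same number of load() calls.
pkl_path = os.path.join(tmpdir, 'works.pkl')
with open(pkl_path, 'wb') as stream:
    pickle.dump(numlines, stream)
    pickle.dump(numwords, stream)
with open(pkl_path, 'rb') as stream:
    pkl_values = (pickle.load(stream), pickle.load(stream))

print(mar_values == (numlines, numwords))
print(pkl_values == (numlines, numwords))
```

The tuple approach keeps the marshal format but changes what is in the
file, so files written the old way would need converting; pickle keeps
the one-object-per-call pattern of the original script.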

> 
> bash-4.1$ cat marshaltest.py
> import marshal
> 
> numlines = 1
> numwords = 25
> 
> stream = open('fails.mar','wb')
> marshal.dump(numlines, stream)
> marshal.dump(numwords, stream)
> stream.close()
> 
> tmpstream = open('fails.mar', 'rb')
> value1 = marshal.load(tmpstream)
> value2 = marshal.load(tmpstream)
> 
> print(value1 == numlines)
> print(value2 == numwords)
> 
> 
> Here are the results of running this code
> 
> bash-4.1$ python2.7 marshaltest.py
> True
> True
> 
> bash-4.1$ python3.1 marshaltest.py
> Traceback (most recent call last):
>   File "marshaltest.py", line 13, in <module>
>     value2 = marshal.load(tmpstream)
> EOFError: EOF read where object expected
> 
> Interestingly the file created by using Python 3.1 is readable by both
> Python 2.7 as well as Python 2.6 and both objects are successfully
> read.
> 
> Cheers,
> raj



