Python 3.1.2 and marshal
Thomas Jollans
thomas at jollans.com
Sat Jul 17 13:11:33 EDT 2010
On 07/17/2010 06:21 PM, raj wrote:
> Hi,
>
> I am using 64 bit Python on an x86_64 platform (Fedora 13). I have
> some code that uses the python marshal module to serialize some
> objects to files. However, in moving the code to python 3 I have come
> across a situation where, if more than one object has been serialized
> to a file, then while trying to de-serialize only the first object is
> de-serialized. Trying to de-serialize the second object raises an
> EOFError. De-serialization of multiple objects works fine in Python
> 2.x. I tried going through the Python 3 documentation to see if
> marshal functionality has been changed, but haven't found anything to
> that effect. Does anyone else see this problem? Here is some
> example code:
Interesting. I modified your script a bit:
0:pts/2:/tmp% cat marshtest.py
from __future__ import print_function
import marshal
import sys
if sys.version_info[0] == 3:
    bytehex = lambda i: '%02X ' % i
else:
    bytehex = lambda c: '%02X ' % ord(c)
numlines = 1
numwords = 25
stream = open('fails.mar','wb')
marshal.dump(numlines, stream)
marshal.dump(numwords, stream)
stream.close()
tmpstream = open('fails.mar', 'rb')
for byte in tmpstream.read():
    sys.stdout.write(bytehex(byte))
sys.stdout.write('\n')
tmpstream.seek(0)
print('pos:', tmpstream.tell())
value1 = marshal.load(tmpstream)
print('val:', value1)
print('pos:', tmpstream.tell())
value2 = marshal.load(tmpstream)
print('val:', value2)
print('pos:', tmpstream.tell())
print(value1 == numlines)
print(value2 == numwords)
0:pts/2:/tmp% python2.6 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 5
val: 25
pos: 10
True
True
0:pts/2:/tmp% python3.1 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 10
Traceback (most recent call last):
  File "marshtest.py", line 29, in <module>
    value2 = marshal.load(tmpstream)
EOFError: EOF read where object expected
1:pts/2:/tmp%
So the contents of the file are identical, but Python 3 reads the whole
file on the first load() (the position jumps straight to 10), while
Python 2 reads only the bytes it actually uses.
This looks like a simple optimisation: read the whole file at once
instead of byte by byte, to improve performance when reading large
objects (such as Python modules).
The question is: was storing multiple objects in sequence an intended
use of the marshal module? I doubt it. You can always wrap your data in
tuples or use pickle.
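Roughly what I mean by those two options, as an untested-on-3.1 sketch.
It uses in-memory io.BytesIO streams instead of your fails.mar file, and
the names packed, pstream, p1 and p2 are just made up for illustration:

```python
import io
import marshal
import pickle

numlines = 1
numwords = 25

# Option 1: put both values in one tuple, so there is only a single
# marshalled object and a single loads() call to read it back.
packed = marshal.dumps((numlines, numwords))
value1, value2 = marshal.loads(packed)
print(value1 == numlines, value2 == numwords)

# Option 2: pickle is meant to handle several dump() calls on the same
# stream; each load() reads back exactly one pickled object.
pstream = io.BytesIO()
pickle.dump(numlines, pstream)
pickle.dump(numwords, pstream)
pstream.seek(0)
p1 = pickle.load(pstream)
p2 = pickle.load(pstream)
print(p1 == numlines, p2 == numwords)
```

With a real file you would write the bytes from marshal.dumps() to a
file opened in 'wb' mode, or pass the file object to pickle.dump() and
pickle.load() directly.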
>
> bash-4.1$ cat marshaltest.py
> import marshal
>
> numlines = 1
> numwords = 25
>
> stream = open('fails.mar','wb')
> marshal.dump(numlines, stream)
> marshal.dump(numwords, stream)
> stream.close()
>
> tmpstream = open('fails.mar', 'rb')
> value1 = marshal.load(tmpstream)
> value2 = marshal.load(tmpstream)
>
> print(value1 == numlines)
> print(value2 == numwords)
>
>
> Here are the results of running this code
>
> bash-4.1$ python2.7 marshaltest.py
> True
> True
>
> bash-4.1$ python3.1 marshaltest.py
> Traceback (most recent call last):
>   File "marshaltest.py", line 13, in <module>
>     value2 = marshal.load(tmpstream)
> EOFError: EOF read where object expected
>
> Interestingly the file created by using Python 3.1 is readable by both
> Python 2.7 as well as Python 2.6 and both objects are successfully
> read.
>
> Cheers,
> raj