[Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

Victor Stinner victor.stinner at gmail.com
Mon Jan 27 16:35:24 CET 2014


Hi,

I'm surprised: marshal.dumps() doesn't raise an error if you pass an
invalid version. In fact, Python 3.3 only supports versions 0, 1 and
2. If you pass 3, it uses version 2. (The same applies to version
99.)

Python 3.4 has two new versions: 3 and 4. Version 3 "shares common
object references"; version 4 adds short tuples and short strings
(which produces smaller files).
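
To make the difference between the formats concrete, here is a minimal
sketch. It is not part of the benchmark; the sample data and variable
names are made up for illustration, and it assumes Python 3.4 or newer,
where versions 3 and 4 actually exist:

```python
import marshal

# Hypothetical sample data: the same tuple object referenced 100 times.
# Version 3 can write the tuple once and emit references afterwards.
shared = (1, 2.5, "a fairly long string that is expensive to repeat")
repeated = [shared] * 100

# Hypothetical sample data: many distinct small tuples and short strings.
# Version 4 can use its compact encodings for these.
distinct = [(i, "item %d" % i) for i in range(1000)]

sizes_repeated = {v: len(marshal.dumps(repeated, v)) for v in (2, 3, 4)}
sizes_distinct = {v: len(marshal.dumps(distinct, v)) for v in (2, 3, 4)}

print("repeated object: ", sizes_repeated)
print("distinct objects:", sizes_distinct)

# Whatever the version, the data must load back unchanged.
for v in (2, 3, 4):
    assert marshal.loads(marshal.dumps(repeated, v)) == repeated
    assert marshal.loads(marshal.dumps(distinct, v)) == distinct
```

On Python 3.4+ the first dictionary should show version 3 well below
version 2, and the second should show version 4 below version 2. The
exact byte counts depend on the interpreter.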

It would be nice to document the differences between marshal versions.

And what do you think of raising an error if the version is unknown in
marshal.dumps()?
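
As a rough illustration of that proposal, the check could look like the
wrapper below. checked_dumps() is a hypothetical helper, not an existing
API; it assumes that marshal.version is the highest version the running
interpreter supports:

```python
import marshal

def checked_dumps(value, version=marshal.version):
    """Hypothetical wrapper: like marshal.dumps(), but rejects unknown versions."""
    # Assumption: every version from 0 up to marshal.version is valid.
    if not 0 <= version <= marshal.version:
        raise ValueError(
            "unknown marshal version %r (supported: 0..%d)"
            % (version, marshal.version)
        )
    return marshal.dumps(value, version)

data = [(1, 2, (3, 4)), "text"]

# A supported version behaves exactly like marshal.dumps().
assert marshal.loads(checked_dumps(data, 2)) == data

# An unsupported version is rejected instead of being silently accepted.
try:
    checked_dumps(data, 99)
except ValueError as exc:
    print("rejected:", exc)
```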

I modified your benchmark to also test loads() and to run the benchmark
10 times. Results:
---
Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux

dumps v0: 391.9 ms
data size v0: 45582.9 kB
loads v0: 616.2 ms

dumps v1: 384.3 ms
data size v1: 45582.9 kB
loads v1: 594.0 ms

dumps v2: 153.1 ms
data size v2: 41395.4 kB
loads v2: 549.6 ms

dumps v3: 152.1 ms
data size v3: 41395.4 kB
loads v3: 535.9 ms

dumps v4: 152.3 ms
data size v4: 41395.4 kB
loads v4: 549.7 ms
---

And:
---
Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux

dumps v0: 389.4 ms
data size v0: 45582.9 kB
loads v0: 564.8 ms

dumps v1: 390.2 ms
data size v1: 45582.9 kB
loads v1: 545.6 ms

dumps v2: 165.5 ms
data size v2: 41395.4 kB
loads v2: 470.9 ms

dumps v3: 425.6 ms
data size v3: 41395.4 kB
loads v3: 528.2 ms

dumps v4: 369.2 ms
data size v4: 37000.9 kB
loads v4: 550.2 ms
---

Version 2 is the fastest for dumps() in both Python 3.3 and 3.4 (the v3
and v4 rows under 3.3 are really version 2, hence the identical sizes),
but version 4 with Python 3.4 produces the smallest file.
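
The attached bench.py is not reproduced in this message. The following
is only a rough sketch of a benchmark of this kind, not the attached
script: the helper names, the smaller data size and the choice of
reporting the best of 10 runs are all assumptions.

```python
import marshal
from time import perf_counter

def gen_data(amount=20000):
    # Same record shape as in Wolfgang's test, with a smaller default size.
    for i in range(amount):
        yield (i, i + 2, i * 2, (i + 1, i + 4, i, 4),
               "my string template %s" % i, 1.01 * i, True)

def best_of(func, repeat=10):
    # Call func() `repeat` times and keep the fastest wall-clock time.
    best = float("inf")
    for _ in range(repeat):
        t0 = perf_counter()
        func()
        best = min(best, perf_counter() - t0)
    return best

def bench(data, versions=range(5), repeat=10):
    # Map each version to (dumps seconds, size in bytes, loads seconds).
    results = {}
    for version in versions:
        blob = marshal.dumps(data, version)
        dump_time = best_of(lambda: marshal.dumps(data, version), repeat)
        load_time = best_of(lambda: marshal.loads(blob), repeat)
        results[version] = (dump_time, len(blob), load_time)
    return results

results = bench(list(gen_data()))
for version, (dump_time, size, load_time) in sorted(results.items()):
    print("dumps v%d: %.1f ms" % (version, dump_time * 1e3))
    print("data size v%d: %.1f kB" % (version, size / 1024.0))
    print("loads v%d: %.1f ms" % (version, load_time * 1e3))
    print()
```

Timings from such a script vary with the machine and the build, so only
the relative ordering between versions is meaningful.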

Victor

2014-01-27 Wolfgang <tds333 at gmail.com>:
> Hi,
>
> I tested the latest 3.4 beta (b3) and noticed there is a new marshal
> protocol version 3.
> The documentation says little about the new features and does not go into
> detail.
>
> I've run a performance test with the new protocol version and noticed the
> new version is two times slower in serialization than version 2. I tested it
> with a list of simple value tuples (500000 elements).
> Nothing special. (It happens only if the tuple also contains a tuple.)
>
> Copy of the test code:
>
>
> from time import time
> from marshal import dumps
>
> def genData(amount=500000):
>   # Each record is a flat tuple that also contains a nested tuple.
>   for i in range(amount):
>     yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i,
>            1.01*i, True)
>
> data = list(genData())
> print(len(data))
> # Time serialization with protocol version 2.
> t0 = time()
> result = dumps(data, 2)
> t1 = time()
> print("duration p2: %f" % (t1-t0))
> # Time serialization with protocol version 3.
> t0 = time()
> result = dumps(data, 3)
> t1 = time()
> print("duration p3: %f" % (t1-t0))
>
>
>
> Is the overhead for the recursion detection so high?
>
> Note that this happens only if the tuples in the data list contain a
> nested tuple.
>
>
> Regards,
>
> Wolfgang
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bench.py
Type: text/x-python
Size: 752 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/bdfac9c6/attachment.py>

