[Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

Kristján Valur Jónsson kristjan at ccpgames.com
Tue Jan 28 06:14:52 CET 2014


Hi there.
I think you should modify your program to marshal (and load) a compiled module.
This is where the optimizations in versions 3 and 4 become important.
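
A minimal sketch of what I mean, assuming a small made-up module source (SOURCE, marshal_module and load_module are illustrative names, not part of the benchmark in this thread): compile the source to a code object, marshal it with several versions, and load it back.

```python
import marshal

# Hypothetical module source, used only for illustration.
SOURCE = '''
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)
'''


def marshal_module(source, version):
    """Compile module source and marshal the resulting code object."""
    code = compile(source, "<example>", "exec")
    return marshal.dumps(code, version)


def load_module(payload):
    """Unmarshal a code object and execute it in a fresh namespace."""
    namespace = {}
    exec(marshal.loads(payload), namespace)
    return namespace


if __name__ == "__main__":
    # Compare the payload size per version; no particular ordering is assumed.
    for version in (2, 3, 4):
        print("v%d: %d bytes" % (version, len(marshal_module(SOURCE, version))))
```

Timing dumps() and loads() on such a payload, rather than on a list of tuples, should be closer to what the .pyc machinery actually does.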
K

> -----Original Message-----
> From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Victor Stinner
> Sent: Monday, January 27, 2014 23:35
> To: Wolfgang
> Cc: Python-Dev
> Subject: Re: [Python-Dev] Python 3.4, marshal dumps slower (version 3
> protocol)
> 
> Hi,
> 
> I'm surprised: marshal.dumps() doesn't raise an error if you pass an invalid
> version. In fact, Python 3.3 only supports versions 0, 1 and 2. If you pass 3,
> it falls back to version 2. (The same applies to version 99.)
> 
> Python 3.4 has two new versions: 3 and 4. Version 3 "shares common
> object references"; version 4 adds short tuples and short strings
> (which produce smaller files).
> 
> It would be nice to document the differences between marshal versions.
> 
> And what do you think of raising an error if the version is unknown in
> marshal.dumps()?
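
Until something like that exists in marshal itself, the check can be sketched in user code. This is only an illustration: checked_dumps is a hypothetical wrapper, and it assumes that every version from 0 up to marshal.version is valid for the running interpreter.

```python
import marshal


def checked_dumps(value, version=marshal.version):
    """Like marshal.dumps(), but reject versions this interpreter does not know.

    Hypothetical helper: assumes valid versions are 0..marshal.version.
    """
    if not isinstance(version, int) or not 0 <= version <= marshal.version:
        raise ValueError(
            "unknown marshal version %r (supported: 0..%d)"
            % (version, marshal.version)
        )
    return marshal.dumps(value, version)


if __name__ == "__main__":
    print(len(checked_dumps([1, 2, 3], 2)))
    try:
        checked_dumps([1, 2, 3], 99)
    except ValueError as exc:
        print("rejected:", exc)
```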
> 
> I modified your benchmark to also test loads() and to run the benchmark
> 10 times. Results:
> ---
> Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26) [GCC 4.8.2 20131212
> (Red Hat 4.8.2-7)] on linux
> 
> dumps v0: 391.9 ms
> data size v0: 45582.9 kB
> loads v0: 616.2 ms
> 
> dumps v1: 384.3 ms
> data size v1: 45582.9 kB
> loads v1: 594.0 ms
> 
> dumps v2: 153.1 ms
> data size v2: 41395.4 kB
> loads v2: 549.6 ms
> 
> dumps v3: 152.1 ms
> data size v3: 41395.4 kB
> loads v3: 535.9 ms
> 
> dumps v4: 152.3 ms
> data size v4: 41395.4 kB
> loads v4: 549.7 ms
> ---
> 
> And:
> ---
> Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40) [GCC 4.8.2
> 20131212 (Red Hat 4.8.2-7)] on linux
> 
> dumps v0: 389.4 ms
> data size v0: 45582.9 kB
> loads v0: 564.8 ms
> 
> dumps v1: 390.2 ms
> data size v1: 45582.9 kB
> loads v1: 545.6 ms
> 
> dumps v2: 165.5 ms
> data size v2: 41395.4 kB
> loads v2: 470.9 ms
> 
> dumps v3: 425.6 ms
> data size v3: 41395.4 kB
> loads v3: 528.2 ms
> 
> dumps v4: 369.2 ms
> data size v4: 37000.9 kB
> loads v4: 550.2 ms
> ---
> 
> Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with Python 3.4
> produces the smallest file.
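
For anyone who wants to reproduce numbers of this shape, here is a rough sketch of such a benchmark. The modified script itself was not posted in this thread, so everything here is an assumption: the helper names (gen_data, best_of, bench), taking the best of the repeated runs, and counting 1 kB as 1024 bytes.

```python
import marshal
from time import perf_counter


def gen_data(amount=500000):
    """Same shape of data as in the original test: tuples with a nested tuple."""
    for i in range(amount):
        yield (i, i + 2, i * 2, (i + 1, i + 4, i, 4),
               "my string template %s" % i, 1.01 * i, True)


def best_of(func, repeat):
    """Return the shortest wall-clock time of `repeat` calls, in milliseconds."""
    best = float("inf")
    for _ in range(repeat):
        t0 = perf_counter()
        func()
        best = min(best, perf_counter() - t0)
    return best * 1000.0


def bench(data, versions=range(5), repeat=10):
    """Measure dumps time, payload size and loads time per marshal version."""
    results = {}
    for version in versions:
        payload = marshal.dumps(data, version)
        results[version] = {
            "dumps_ms": best_of(lambda: marshal.dumps(data, version), repeat),
            "size_kb": len(payload) / 1024.0,  # assumption: 1 kB = 1024 bytes
            "loads_ms": best_of(lambda: marshal.loads(payload), repeat),
        }
    return results


if __name__ == "__main__":
    # Smaller than the 500000 elements used in the thread, to keep this quick.
    for version, r in bench(list(gen_data(20000)), repeat=3).items():
        print("dumps v%d: %.1f ms" % (version, r["dumps_ms"]))
        print("data size v%d: %.1f kB" % (version, r["size_kb"]))
        print("loads v%d: %.1f ms" % (version, r["loads_ms"]))
```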
> 
> Victor
> 
> 2014-01-27 Wolfgang <tds333 at gmail.com>:
> > Hi,
> >
> > I tested the latest 3.4 beta (b3) and noticed there is a new
> > marshal protocol version 3.
> > The documentation says little about the new features and does not go
> > into detail.
> >
> > I ran a performance test with the new protocol version and noticed
> > that it is about two times slower at serialization than version 2. I
> > tested it with a list of simple value tuples (500000 elements).
> > Nothing special. (It happens only if the tuple also contains a tuple.)
> >
> > Copy of the test code:
> >
> >
> > from time import time
> > from marshal import dumps
> >
> > def genData(amount=500000):
> >   for i in range(amount):
> >     yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i,
> >            1.01*i, True)
> >
> > data = list(genData())
> > print(len(data))
> > t0 = time()
> > result = dumps(data, 2)
> > t1 = time()
> > print("duration p2: %f" % (t1-t0))
> > t0 = time()
> > result = dumps(data, 3)
> > t1 = time()
> > print("duration p3: %f" % (t1-t0))
> >
> >
> >
> > Is the overhead of the recursion detection really that high?
> >
> > Note that this happens only if there is a tuple inside the tuples in the data list.
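
To see what version 3 is tracking for objects such as tuples, here is a small sketch. The names shared and data are made up for illustration, and the behaviour described is that of CPython 3.4 and later: when the same tuple object appears many times, version 3 writes it once and then emits back-references, so the payload shrinks and object identity survives a round trip.

```python
import marshal

# One tuple object referenced 1000 times from the same list.
shared = (1.0, 2.0, 3.0)
data = [shared] * 1000

v2 = marshal.dumps(data, 2)  # no object sharing: the tuple is written 1000 times
v3 = marshal.dumps(data, 3)  # shared references: written once, then referenced

print("v2: %d bytes" % len(v2))
print("v3: %d bytes" % len(v3))

# After loading the version 3 payload, all list items are the same object again.
loaded = marshal.loads(v3)
print(loaded[0] is loaded[1])
```

In the benchmark data every tuple is a distinct object, so there is nothing to share there, and the time spent on the bookkeeping is not paid back by a smaller payload.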
> >
> >
> > Regards,
> >
> > Wolfgang