[docs] [issue12775] immense performance problems related to the garbage collector

Daniel Svensson report at bugs.python.org
Thu Aug 18 18:23:23 CEST 2011


Daniel Svensson <dsvensson at gmail.com> added the comment:

Using the following script (except in the Python 2.5 case, where simplejson is used, which ought to be the same thing, right?):
import time, gc, json, sys

def read_json_blob():
	t0 = time.time()
	fd = open("datatest1.json")
	data = fd.read()
	fd.close()
	t1 = time.time()
	parsed = json.loads(data)
	t2 = time.time()
	print("read file in %.2fs, parsed json in %.2fs, total of %.2fs" % (t1-t0, t2-t1, t2-t0))

if len(sys.argv) > 1 and sys.argv[1] == "nogc":
	gc.disable()

read_json_blob()
print(gc.collect())

daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python3.2 gc.py nogc
read file in 1.34s, parsed json in 2.74s, total of 4.07s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python3.2 gc.py
read file in 1.33s, parsed json in 2.71s, total of 4.05s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py
read file in 0.89s, parsed json in 56.03s, total of 56.92s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py nogc
read file in 0.89s, parsed json in 56.38s, total of 57.27s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.7 gc.py
read file in 0.89s, parsed json in 3.87s, total of 4.75s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.7 gc.py nogc
read file in 0.89s, parsed json in 3.91s, total of 4.80s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py
read file in 0.11s, parsed json in 53.00s, total of 53.11s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py nogc
read file in 0.14s, parsed json in 53.13s, total of 53.28s
0

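With the json module, GC and no GC are equally slow within each version. No weird things there, except that Python 3.2 seems to take more time to load the file (about 1.3s versus 0.89s on the same machine). Nice performance improvement of the json module in 3.2 compared to older Python versions (under 3s to parse, versus over 50s in 2.5 and 2.6).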
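
The comparison above can also be run inside a single process. What follows is only a minimal sketch under stated assumptions: the helper name time_parse and the synthetic blob are hypothetical and stand in for datatest1.json, which is not part of this report, so the timings it prints are not comparable to the numbers above.

```python
import gc
import json
import time


def time_parse(blob, enable_gc):
    """Time json.loads(blob) with the cyclic GC switched on or off.

    Returns (elapsed_seconds, number_of_top_level_items).
    """
    was_enabled = gc.isenabled()
    gc.collect()  # start both measurements from a similar heap state
    if enable_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        t0 = time.perf_counter()
        parsed = json.loads(blob)
        elapsed = time.perf_counter() - t0
    finally:
        # Restore whatever GC state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, len(parsed)


# Synthetic stand-in for datatest1.json (an assumption, not the real data).
blob = json.dumps(
    [{"id": i, "name": "item%d" % i, "tags": ["a", "b"]} for i in range(50000)]
)

for label, flag in (("gc", True), ("nogc", False)):
    elapsed, count = time_parse(blob, flag)
    print("%s: parsed %d records in %.3fs" % (label, count, elapsed))
```

Running both modes in one process avoids differences in file caching between runs, but it does not reproduce the cold-cache setup used in the transcripts above.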


Next up: trying with cjson, which decodes via a binary module:

import time, gc, cjson, sys

def read_json_blob():
	t0 = time.time()
	fd = open("datatest1.json")
	data = fd.read()
	fd.close()
	t1 = time.time()
	parsed = cjson.decode(data)
	t2 = time.time()
	print("read file in %.2fs, parsed json in %.2fs, total of %.2fs" % (t1-t0, t2-t1, t2-t0))

if len(sys.argv) > 1 and sys.argv[1] == "nogc":
	gc.disable()

read_json_blob()
print(gc.collect())

daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py
read file in 0.89s, parsed json in 2.58s, total of 3.46s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py nogc
read file in 0.89s, parsed json in 1.44s, total of 2.33s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.7 gc.py nogc
read file in 0.89s, parsed json in 1.53s, total of 2.42s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.7 gc.py
read file in 0.89s, parsed json in 1.54s, total of 2.43s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py nogc
read file in 0.89s, parsed json in 1.44s, total of 2.33s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py
read file in 0.89s, parsed json in 2.58s, total of 3.47s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py
read file in 0.89s, parsed json in 2.58s, total of 3.47s
0
daniel at neutronstar:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.6 gc.py nogc
read file in 0.89s, parsed json in 1.43s, total of 2.32s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py
read file in 0.14s, parsed json in 1.58s, total of 1.73s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py nogc
read file in 0.16s, parsed json in 1.07s, total of 1.23s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py
read file in 0.14s, parsed json in 1.58s, total of 1.72s
0
daniel at aether:~$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"; python2.5 gc.py nogc
read file in 0.14s, parsed json in 1.06s, total of 1.20s

The file is actually a bit too small for good measurements when using cjson, but the interesting point here is obviously the large difference between GC and no GC in Python 2.5 (about 1.58s versus 1.07s), and quite a big win in 2.6 too (about 2.58s versus 1.44s), which becomes a lot more apparent with larger files.
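
One way to see why the gap would grow with larger inputs is to count how many automatic collections run while a decoder-like workload allocates container objects. This is a rough sketch, assuming Python 3.4 or later for gc.get_stats(); the helper name count_collections is hypothetical, and the nested lists only imitate what a JSON decoder allocates.

```python
import gc


def count_collections(n_objects, enable_gc):
    """Count automatic GC runs that happen while n_objects lists are allocated."""
    was_enabled = gc.isenabled()
    gc.collect()  # reset the allocation counters before measuring
    if enable_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        before = sum(s["collections"] for s in gc.get_stats())
        # Each inner list is a GC-tracked container that stays alive,
        # similar to the objects a decoder builds and keeps.
        data = [[i] for i in range(n_objects)]
        after = sum(s["collections"] for s in gc.get_stats())
        del data
    finally:
        # Restore the caller's GC state.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return after - before


for n in (10000, 100000, 1000000):
    print("%8d objects: %d collections with gc, %d without"
          % (n, count_collections(n, True), count_collections(n, False)))
```

The collection count rises with the number of live containers, and every collection has to walk objects that are all still reachable, which is consistent with the overhead being more visible on larger files.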

Another interesting thing is that Python 2.6 with the GC disabled is consistently faster than 2.7, whether the GC is enabled or disabled in 2.7. cjson isn't compatible with Python 3.2, so I cannot verify how things work there.

So overall it looks like it's less of a problem in newer versions of Python. We are phasing out the software that is deployed on Debian Lenny, so it's a problem that will go away. I don't think I have any objections to closing this ticket again.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12775>
_______________________________________

