[issue26415] Out of memory, trying to parse a 35MB dict
A. Skrobov
report at bugs.python.org
Tue Mar 8 03:54:52 EST 2016
A. Skrobov added the comment:
OK, I've now looked into it with a fresh build of 3.6 trunk on Linux x64.
Peak memory usage is about 3 KB per node (roughly 3 GB resident for a one-million-element tuple):
$ /usr/bin/time -v ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
Command being timed: "./python -c import ast; ast.parse("0,"*1000000, mode="eval")"
...
Maximum resident set size (kbytes): 3015552
...
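A similar measurement can be taken from inside the interpreter. The following is only a minimal sketch, assuming a Unix system where the standard `resource` module is available; the helper name `peak_rss_after_parse` is made up for illustration. The figure it returns includes the interpreter's own baseline footprint, so it is only meaningful for large inputs, and other Python versions may use far less memory than the 3.6 trunk build measured above.

```python
import ast
import resource  # Unix-only; ru_maxrss is reported in KB on Linux (bytes on macOS)


def peak_rss_after_parse(n_items):
    """Parse a tuple literal of n_items zeros; return (element count, peak RSS).

    The peak RSS is that of the whole process so far, as reported by
    getrusage(), not just the memory used by the parser.
    """
    source = "0," * n_items
    tree = ast.parse(source, mode="eval")
    n_parsed = len(tree.body.elts)  # tree.body is an ast.Tuple for this input
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return n_parsed, peak


if __name__ == "__main__":
    n_parsed, peak = peak_rss_after_parse(10000)
    print("parsed", n_parsed, "elements; peak RSS so far:", peak)
```

Dividing the peak by the element count gives a per-node estimate comparable to the one above, provided the input is large enough to dwarf the interpreter's baseline.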
Out of the 2945 MB total peak memory usage, only about 330 MB is attributable to heap use:
$ valgrind ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
==21232== ...
==21232== HEAP SUMMARY:
==21232== in use at exit: 3,480,447 bytes in 266 blocks
==21232== total heap usage: 1,010,171 allocs, 1,009,905 frees, 348,600,304 bytes allocated
==21232== ...
So, apparently, it's not the nodes themselves that take up a disproportionate amount of memory; rather, the heap gets so badly fragmented that 89% of the memory allocated to it is wasted.
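For reference, the arithmetic behind these figures can be reproduced as below. This is just a sketch: the constants are copied from the two outputs above, the node count of 1,000,000 is taken from the test input, and, as in the reasoning above, valgrind's "bytes allocated" total is treated as the heap's share of the peak.

```python
MB = 1024 * 1024

max_rss_kb = 3015552      # "Maximum resident set size (kbytes)" from /usr/bin/time -v
heap_bytes = 348600304    # "bytes allocated" from the valgrind heap summary
n_nodes = 1000000         # number of "0," items in the test input


def memory_summary(max_rss_kb, heap_bytes, n_nodes):
    """Derive the per-node and wasted-memory figures from the raw measurements."""
    rss_bytes = max_rss_kb * 1024
    return {
        "rss_mb": rss_bytes / MB,
        "heap_mb": heap_bytes / MB,
        "kb_per_node": rss_bytes / n_nodes / 1024,
        "wasted_pct": 100.0 * (1 - heap_bytes / rss_bytes),
    }


summary = memory_summary(max_rss_kb, heap_bytes, n_nodes)
for name, value in summary.items():
    print("%-12s %.2f" % (name, value))
```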
gprof confirms that there are lots of mallocs/reallocs going on, up to 21 per node:
$ gprof python
Flat profile:
Each sample counts as 0.01 seconds.
     % cumulative     self              self    total
  time    seconds  seconds    calls   s/call   s/call  name
 17.82       0.31     0.31  2000020     0.00     0.00  PyParser_AddToken
 13.79       0.55     0.24        2     0.12     0.16  freechildren
 12.64       0.77     0.22 21039125     0.00     0.00  _PyMem_RawMalloc
  6.32       0.88     0.11 17000101     0.00     0.00  PyNode_AddChild
  5.75       0.98     0.10 28379846     0.00     0.00  visit_decref
  5.75       1.08     0.10  1000004     0.00     0.00  ast_for_expr
  4.60       1.16     0.08     2867     0.00     0.00  collect
  4.02       1.23     0.07 20023405     0.00     0.00  _PyObject_Free
  2.30       1.27     0.04  3031305     0.00     0.00  _PyType_Lookup
  2.30       1.31     0.04  3002234     0.00     0.00  _PyObject_GenericSetAttrWithDict
  2.30       1.35     0.04        1     0.04     0.05  ast2obj_expr
  1.72       1.38     0.03 28366858     0.00     0.00  visit_reachable
  1.72       1.41     0.03 12000510     0.00     0.00  subtype_traverse
  1.72       1.44     0.03     3644     0.00     0.00  list_traverse
  1.44       1.47     0.03  3002161     0.00     0.00  _PyObjectDict_SetItem
  1.15       1.49     0.02 20022785     0.00     0.00  PyObject_Free
  1.15       1.51     0.02 15000763     0.00     0.00  _PyObject_Realloc
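The per-node figures follow from dividing the call counts in the profile by the number of items in the input. A small sketch of that calculation, with the counts copied from the profile above and the node count of 1,000,000 assumed from the test input:

```python
n_nodes = 1000000  # assumed: one node per "0," item in the test input

# Call counts copied from the gprof flat profile.
profile_calls = {
    "PyParser_AddToken": 2000020,
    "_PyMem_RawMalloc": 21039125,
    "PyNode_AddChild": 17000101,
    "_PyObject_Free": 20023405,
    "_PyObject_Realloc": 15000763,
}


def calls_per_node(calls, n_nodes):
    """Return the average number of calls to each function per input node."""
    return {name: count / n_nodes for name, count in calls.items()}


per_node = calls_per_node(profile_calls, n_nodes)
for name, ratio in sorted(per_node.items(), key=lambda item: -item[1]):
    print("%-20s %6.2f calls per node" % (name, ratio))
```

By this count, each node costs about 21 calls to _PyMem_RawMalloc, 17 to PyNode_AddChild and 15 to _PyObject_Realloc.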
So, I suppose what needs to be done is to try reducing the number of reallocs involved in handling an AST node; the representation of the nodes themselves doesn't need to change.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26415>
_______________________________________