[Python-Dev] Reduce memory footprint of Python

martin at v.loewis.de martin at v.loewis.de
Sun Oct 6 22:45:09 CEST 2013


Quoting Victor Stinner <victor.stinner at gmail.com>:

> Slowly, I'm trying to see if it would be possible to reduce the memory
> footprint of Python using the tracemalloc module.
[...]
> Should I open a separate issue for each idea to track them in the bug
> tracker, or a global issue?

There is a third alternative, which I would recommend: do not open tracker
issues at all, unless you can also offer a patch. The things you find
are not bugs per se, not even "issues". It is fine and laudable that
you look into this, but other people may have other priorities (like
reimplementing the hash function of string objects).

So if you remember that there is a potential for optimization, that
may be enough for the moment. Or share it on python-dev (as you do
below); people may be intrigued to look into this further, or ignore
it. It's easy to ignore a posting to python-dev, but more difficult to
ignore an issue on the tracker (*something* should be done about it,
e.g. close with no action).

> First, I noticed that linecache can allocate more than 2 MB. What do
> you think of adding a registry of "clear cache" functions? For
> example, re.purge() and linecache.clearcache(). gc.collect() clears
> free lists. I don't know if gc.collect() should be related to this new
> registry (clear all caches) or not.

I'm -1 on this idea. There are some "canonical" events that could trigger
clearance of caches, namely
- out-of-memory situations
- OS signals indicating memory pressure
While these sound interesting in theory, they fail in practice. For
example, they are very difficult to test.

> The dictionary of interned Unicode strings can be large: up to 1.5 MB
> (with more than 30,000 strings). Just the dictionary, excluding size of
> strings. Is the size normal or not? Using tracemalloc, this dictionary
> is usually the largest memory block.

I'd check the contents of the dictionary. How many strings are in there;
how many of these are identifiers; how many have more than one outside
reference; how many are immortal?

If there are a lot of strings that are not identifiers, some code possibly
abuses interning, and should use its own dictionary instead. For the
refcount-1 mortal identifiers, I'd trace back where they are stored,
and check if many of them originate from the same module.
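
A rough sketch of the first part of that check follows. The interned
dictionary is not exposed to Python code, so this assumes the strings
have already been extracted by some other means (a debugger, or a
patched interpreter); summarize_strings() is a made-up helper name.

```python
def summarize_strings(strings):
    """Count how many strings look like identifiers and how many do not.

    `strings` is any iterable of str; obtaining the contents of the
    interned dictionary is assumed to happen elsewhere.
    """
    total = 0
    identifiers = 0
    non_identifiers = []
    for s in strings:
        total += 1
        if s.isidentifier():
            identifiers += 1
        else:
            non_identifiers.append(s)
    return {
        "total": total,
        "identifiers": identifiers,
        "non_identifiers": len(non_identifiers),
        # A few examples help to spot which code interns them.
        "sample": non_identifiers[:10],
    }
```

A large "non_identifiers" count would point at code that interns
arbitrary data. The refcount and immortality questions need access to
interpreter internals and are not covered by this sketch.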

Regards,
Martin



