[issue1943] improved allocation of PyUnicode objects

Marc-Andre Lemburg report at bugs.python.org
Sun Jan 27 17:03:56 CET 2008


Marc-Andre Lemburg added the comment:

I don't really see the connection with #1629305. 

An optimization that would be worth checking is hooking up the
Py_UNICODE pointer of a new object to the buffer of an interned
Unicode object whose contents match (e.g. do a quick check based on
the hash value and then a memcmp). That would save memory and the call
to the pymalloc allocator.
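A very rough sketch of what that check could look like, assuming the
current PyUnicodeObject layout with its length/str/hash fields and a
pymalloc'ed str buffer; the helper name and the ownership bookkeeping
are made up for illustration, not existing CPython API:

#include <Python.h>
#include <string.h>

/* Illustrative helper: if the interned object holds the same characters
   as the freshly created one, drop the private buffer and share the
   interned buffer instead. */
static int
try_share_interned_buffer(PyUnicodeObject *self, PyUnicodeObject *interned)
{
    /* Cheap rejects first: length, then cached hash values (if both set). */
    if (self->length != interned->length)
        return 0;
    if (self->hash != -1 && interned->hash != -1
        && self->hash != interned->hash)
        return 0;
    if (memcmp(self->str, interned->str,
               self->length * sizeof(Py_UNICODE)) != 0)
        return 0;

    /* Contents match: free the private buffer (assuming it came from
       pymalloc) and point at the interned copy. */
    PyObject_FREE(self->str);
    Py_INCREF(interned);          /* the interned object must stay alive */
    self->str = interned->str;
    /* The sharing object would also have to remember `interned` so its
       deallocator can Py_DECREF it instead of freeing ->str. */
    return 1;
}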

Another strategy could involve a priority-queue-style cache with the aim
of identifying often-used Unicode strings and then reusing them.
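As a sketch of what such a cache might look like (all names and the
fixed table size are invented, and the eviction policy here is a simple
least-frequently-used approximation rather than a real priority queue):

#include <Python.h>

#define UCACHE_SIZE 128

typedef struct {
    PyObject *obj;         /* cached unicode object (owned reference) */
    unsigned long hits;    /* crude usage counter driving eviction */
} ucache_entry;

static ucache_entry ucache[UCACHE_SIZE];

/* Return a new reference to a cached object equal to `candidate`, or
   cache `candidate` itself (evicting the least used slot) and return it. */
static PyObject *
ucache_lookup(PyObject *candidate)
{
    Py_ssize_t i, victim = 0;
    unsigned long min_hits = (unsigned long)-1;

    for (i = 0; i < UCACHE_SIZE; i++) {
        if (ucache[i].obj == NULL) {
            victim = i;                /* empty slot: preferred victim */
            min_hits = 0;
            continue;
        }
        /* A real version would compare the cached hash first, as above;
           PyUnicode_Compare does the full character comparison. */
        if (PyUnicode_Compare(ucache[i].obj, candidate) == 0) {
            ucache[i].hits++;
            Py_INCREF(ucache[i].obj);
            return ucache[i].obj;      /* reuse the already allocated object */
        }
        if (ucache[i].hits < min_hits) {
            min_hits = ucache[i].hits;
            victim = i;
        }
    }

    /* Miss: remember the new string in the least frequently used slot. */
    Py_XDECREF(ucache[victim].obj);
    Py_INCREF(candidate);              /* reference held by the cache */
    ucache[victim].obj = candidate;
    ucache[victim].hits = 1;
    Py_INCREF(candidate);              /* reference returned to the caller */
    return candidate;
}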

This could also be enhanced using an offline approach: you first run an
application with an instrumented Python interpreter to find the most
often used strings and then pre-fill the cache or interned dictionary in
the production Python interpreter at startup time.
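The pre-fill step could be as simple as the following sketch; the
profile file name and its one-string-per-line format are invented, and
PyUnicode_InternFromString is the py3k API and assumes UTF-8 input:

#include <Python.h>
#include <stdio.h>
#include <string.h>

static PyObject *prefill_keepalive;    /* keeps the pre-filled strings alive */

/* Read one string per line from a profile written by an instrumented run
   and intern each of them, so later allocations can reuse them. */
static void
prefill_interned(const char *path)
{
    char line[512];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return;                        /* no profile data: nothing to do */
    if (prefill_keepalive == NULL)
        prefill_keepalive = PyList_New(0);

    while (fgets(line, sizeof(line), fp) != NULL) {
        PyObject *s;
        line[strcspn(line, "\n")] = '\0';      /* strip the newline */
        s = PyUnicode_InternFromString(line);  /* assumes UTF-8 input */
        if (s == NULL) {
            PyErr_Clear();
            continue;
        }
        /* Hold a reference so the interned entry survives until shutdown. */
        if (prefill_keepalive != NULL)
            PyList_Append(prefill_keepalive, s);
        Py_DECREF(s);
    }
    fclose(fp);
}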

Coming from a completely different angle, you could also use the
Py_UNICODE pointer to share slices of a larger data buffer. A Unicode
sub-type could handle this case, keeping a PyObject* reference to the
larger buffer, so that it doesn't get garbage collected before the
Unicode slice.
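A sub-type along these lines might look roughly as follows; the type
name, the extra field and the init helper are purely illustrative:

#include <Python.h>

/* Illustrative sub-type: its ->str points into the buffer of another
   unicode object instead of a privately allocated buffer. */
typedef struct {
    PyUnicodeObject base;   /* base.str aims into the owner's buffer */
    PyObject *owner;        /* larger buffer; the reference keeps it alive */
} PyUnicodeSliceObject;

/* Make `self` view `length` characters of `owner` starting at `start`,
   without copying. The owner's buffer must not be resized while shared. */
static void
unicode_slice_init(PyUnicodeSliceObject *self, PyUnicodeObject *owner,
                   Py_ssize_t start, Py_ssize_t length)
{
    Py_INCREF(owner);                      /* owner outlives the slice */
    self->owner = (PyObject *)owner;
    self->base.str = owner->str + start;   /* share, don't copy */
    self->base.length = length;
    self->base.hash = -1;                  /* hash has to be recomputed */
}

/* The sub-type's tp_dealloc would Py_DECREF(self->owner) instead of
   freeing ->str, since the slice does not own the buffer. */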

Regarding memory-constrained environments: these should simply switch
off all free lists and pymalloc. OTOH, even mobile phones come with
gigabytes of RAM nowadays, so it's not really worth the trouble, IMHO.

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1943>
__________________________________

