ESR "Waning of Python" post

dieter dieter at handshake.de
Sat Oct 13 02:01:53 EDT 2018


Marko Rauhamaa <marko at pacujo.net> writes:
> dieter <dieter at handshake.de>:
> ...
>> I work in the domain of web applications, and there I had a nasty
>> experience with garbage collection: occasionally, the web application
>> stopped responding for about a minute. A (quite difficult) analysis
>> revealed that a (stupid) component created, in some situations (a
>> search), hundreds of thousands of temporary objects and thereby
>> triggered a full garbage collection. The garbage collector started
>> its mark and sweep phase to detect unreachable objects, traversing a
>> graph of millions of objects.
>>
>> As garbage collection becomes drastically more complex if the object
>> graph can change during this phase (and this was Python), a global
>> lock prevented any other activity, leading to the observed
>> latencies.
>
> Yes. The occasional global freeze is unavoidable in any
> garbage-collected runtime environment regardless of the programming
> language.
>
> However, I challenge the notion that creating hundreds of thousands of
> temporary objects is stupid. I suspect that the root cause of the
> lengthy pauses is that the program maintains millions of *nongarbage*
> objects in RAM (a cache, maybe?).

Definitely. The application concerned was a long-running web application;
caching was an important feature for speeding up its typical use cases.
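Marko's point, that the length of a pause depends on how many *live* objects the collector must traverse rather than on the garbage itself, can be checked roughly with the sketch below. It assumes CPython and uses only the standard `gc` and `time` modules; the cache is modelled (hypothetically) as a list of one-element lists, because lists are containers that the cycle collector tracks. The sizes are arbitrary, and the measured times will vary with machine and Python version.

```python
import gc
import time


def full_collection_ms(n_live):
    """Time one explicit full collection while n_live tracked objects are alive."""
    # Stand-in for a long-lived cache: n_live one-element lists. Lists are
    # container objects, so CPython's cycle collector tracks and traverses them.
    cache = [[i] for i in range(n_live)]
    start = time.perf_counter()
    gc.collect()  # collect all generations, i.e. a full collection
    elapsed_ms = (time.perf_counter() - start) * 1000
    del cache
    return elapsed_ms


if __name__ == "__main__":
    # All cache objects are still reachable, so none of them is garbage;
    # the collector must nevertheless visit each one.
    for n in (10_000, 100_000, 500_000):
        print(f"{n:>9,} live objects: {full_collection_ms(n):8.1f} ms")
```

One would expect the reported time to grow roughly with the number of live objects. That is consistent with the view that the large cache, not the temporary objects alone, made the pauses long.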

I am not saying that creating hundreds of thousands of temporary objects
is always stupid. But in this case, those temporary objects were used to
eagerly wrap the document ids found in an index entry, just to get a
convenient interface for accessing the corresponding documents.
The index authors were aware that they were dealing with mass data and
therefore stored it compactly as C-level objects, with efficient
filtering operations implemented in C. The search author, however,
neglected this aspect and wrapped every document id in a Python object.
A search is essentially a filtering operation: typically, you need to
access far fewer documents (at most those in a prefiltered result set)
than there are document ids (the input to the filtering). In such a case,
it is stupid to create temporary objects for all document ids just to
access a much smaller number of documents conveniently later.
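The difference between wrapping first and filtering first can be illustrated with a small, hypothetical sketch. `Document`, `search_eager` and `search_lazy` are made-up names; an `array('q')` stands in for the index's compact C-level storage, and the built-in `set.intersection` stands in for its C-level filtering (the real index code is not shown in the post). It also assumes the ids in an index entry are kept in ascending order, so both functions return the same documents in the same order.

```python
from array import array


class Document:
    """Hypothetical convenience wrapper around a document id."""

    created = 0  # counts how many wrapper objects have been built

    def __init__(self, doc_id):
        Document.created += 1
        self.doc_id = doc_id


def search_eager(index_entry, allowed):
    # Wrap every id up front, then filter: one Python object per input id.
    wrapped = [Document(d) for d in index_entry]
    return [doc for doc in wrapped if doc.doc_id in allowed]


def search_lazy(index_entry, allowed):
    # Filter the raw ids with a C-level set operation first, then wrap only
    # the survivors (sorted, assuming the index entry is in ascending order).
    return [Document(d) for d in sorted(allowed.intersection(index_entry))]


index_entry = array("q", range(100_000))  # compact storage for the ids
allowed = set(range(0, 100_000, 1000))    # hypothetical prefiltered result set

Document.created = 0
eager = search_eager(index_entry, allowed)
print(Document.created)  # 100000

Document.created = 0
lazy = search_lazy(index_entry, allowed)
print(Document.created)  # 100
```

The plain integers produced while filtering are not containers, so CPython's cycle collector does not track them. The `Document` instances are tracked, and building one per id is what adds to the collector's workload.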
