[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

M.-A. Lemburg mal at egenix.com
Tue Dec 23 13:47:15 CET 2008


On 2008-12-22 22:45, Steven D'Aprano wrote:
> On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
>> On 2008-12-20 23:16, Martin v. Löwis wrote:
>>>>> I will try next week to see if I can come up with a smaller,
>>>>> submittable example.  Thanks.
>>>> These long exit times are usually caused by the garbage collection
>>>> of objects. This can be a very time-consuming task.
>>> I doubt that. The long exit times are usually caused by a bad
>>> malloc implementation.
>> With "garbage collection" I meant the process of Py_DECREF'ing the
>> objects in large containers or deeply nested structures, not the GC
>> mechanism for breaking circular references in Python.
>>
>> This will usually also involve free() calls, so the malloc
>> implementation affects this as well. However, I've seen such long
>> exit times on Linux and Windows, which both have rather good
>> malloc implementations.
>>
>> I don't think there's anything much we can do about it at the
>> interpreter level. Deleting millions of objects takes time and that's
>> not really surprising at all. It takes even longer if you have
>> instances with .__del__() methods written in Python.
> 
> 
> This behaviour appears to be specific to deleting dicts, not deleting 
> random objects. I haven't yet confirmed that the problem still exists 
> in trunk (I hope to have time tonight or tomorrow), but in my previous 
> tests deleting millions of items stored in a list of tuples completed 
> in a minute or two, while deleting the same items stored as key:item 
> pairs in a dict took 30+ minutes. I say plus because I never had the 
> patience to let it run to completion; it could have been hours for all 
> I know.

That's interesting. The dictionary dealloc routine doesn't give
any hint as to why this should take longer than deallocating
a list of tuples.
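
For reference, here's a rough sketch of the kind of comparison being
discussed (not the original test script; the names and the size N are
made up, so scale to taste):

    import gc
    import time

    N = 5 * 10**6   # the reports above used far more items

    def time_dealloc(build):
        container = build()
        gc.disable()             # isolate plain refcount-driven dealloc
        start = time.time()
        del container            # drop the only reference: every item
                                 # gets Py_DECREF'd right here
        elapsed = time.time() - start
        gc.enable()
        return elapsed

    print "dict:           %.2fs" % time_dealloc(
        lambda: dict((i, str(i)) for i in xrange(N)))
    print "list of tuples: %.2fs" % time_dealloc(
        lambda: [(i, str(i)) for i in xrange(N)])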

However, due to the way dictionary tables are allocated, it is
possible to end up with a table that is nearly twice the size
actually needed for the number of items in the dictionary. At
dictionary sizes like these, that can mean a lot of extra memory
being allocated, certainly more than the corresponding list of
tuples would use.
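
You can watch this growth pattern directly with sys.getsizeof()
(available from Python 2.6 onwards). Note that it reports only the
dict object and its table, not the keys and values it references,
and the exact resize thresholds vary between versions and builds:

    import sys

    d = {}
    last = sys.getsizeof(d)
    print "%8d items -> %10d bytes" % (len(d), last)
    for i in xrange(200000):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last:
            # the table grows in large jumps at each resize
            print "%8d items -> %10d bytes" % (len(d), size)
            last = size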

>> Applications can choose other mechanisms for speeding up the
>> exit process in various (less clean) ways, if they have a need for
>> this.
>>
>> BTW: Rather than using a huge in-memory dict, I'd suggest either
>> using an on-disk dictionary such as the ones found in mxBeeBase or
>> a database.
> 
> The original poster's application uses 45GB of data. In my earlier 
> tests, I've experienced the problem with ~ 300 *megabytes* of data: 
> hardly what I would call "huge".

Times have changed, that's true :-)
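
To illustrate both suggestions quoted above using just the standard
library (mxBeeBase itself is not shown here, and the filename and
sizes are made up): shelve gives you a dict-like mapping backed by
an on-disk database, and os._exit() is one example of the "less
clean" ways to skip the final deallocation sweep at exit:

    import os
    import shelve

    db = shelve.open('bigmap.db')    # hypothetical filename
    for i in xrange(100000):
        db[str(i)] = i * i           # shelve keys must be strings
    db.close()

    # Fast but unclean exit: terminate immediately, skipping the
    # interpreter's cleanup (only safe once all buffers and files
    # have been flushed and closed).
    os._exit(0)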

-- 
Marc-Andre Lemburg
eGenix.com
