os.listdir

Duncan Booth duncan at NOSPAMrcp.co.uk
Thu Sep 11 11:21:18 EDT 2003


"Michael Peuser" <mpeuser at web.de> wrote in news:bjq102$m21$03$1 at news.t-
online.com:

> It is generally very difficult to
> compute the mean access time to a hash. It totally depends on the size of
> the internal hash table allocated as to the number of elements you store
> there. In Perl there is trick to manipulate that (i.e. to increase it
> appropriately when you expect e.g. some 100.000 entries). I don't know
> whether this is possible in (C)Python, according to which algorithm the
> initial size is computed, and when or if the table is reconfigured (which
> would be a major undertaking btw). Any info someone?

In C Python, all dictionaries have an initial size of 8 elements. There is 
no way to override this (except recompiling!)

Maximum dictionary load is 2/3, i.e. when an insertion would result in more 
than 2/3 of the cells being filled[*], the dictionary is resized. Usually 
the resize involves doubling the size, but if a lot of dictionary slots 
have been deleted it can instead reduce the size (or leave it unchanged).
This may be surprising: deleting elements from a Python dictionary never 
decreases the memory allocated to the dictionary, but inserting elements 
can.

The first probe into a dictionary uses only the least significant bits of 
the hash value, but if there is a collision the other hash bits are 
gradually incorporated, so eventually all 32 bits of the hash value are 
used. This means that most collisions diverge quickly.

[*] Terminology: Cells may be free, filled, or used. Initially all cells 
are free, inserting an element changes a cell from free or filled to used, 
deleting changes a cell from used to filled. Lookup probes cells until a 
free cell is seen.

> 
> Apart from that, you will generally find the value to your key with the
> first access if the table is decently dimensioned.


-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?




More information about the Python-list mailing list