Another Python rookie trying to port to C/C++ question.

Fri Sep 26 04:53:10 EDT 2003

Stephen Horne <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote in 
news:79t6nvceo80m6cumskpqdmipa921da054o at 4ax.com:

> On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <brian at sweetapp.com>
> wrote:
> 
>>> Well, since I'm not worried about "flexible", have my doubts about the
>>> "much slower" part,
>>
>>Your implementation is O(n) with the number of entries while the Python
>>implementation is O(log(n)). If n is small then your implementation
>>might be acceptably fast. Can't you just use the STL hash_map template
>>class?
> 
> Are you sure Python dictionaries are O(log n)? - Remember, they are
> based on hashing.
> 
> Simple hashing is O(1). Handling of collisions and stuff will affect
> that (and I'm no expert on hashing techniques) but I can't help
> wondering if you're confusing Pythons hashing with tree-based
> techniques.
> 

Python dictionaries are effectively O(1). Dictionaries are never more than 
2/3rds full, so on average a dictionary lookup only needs a couple of 
probes. Python's collision resolution is designed such that in most cases 
lookups that collide on the first attempt will diverge for the next 
attempt.

As some of the other posters have said dictionaries in Python have been 
tuned over many years, and since the whole of Python is based on fast 
dictionary lookups it is very unlikely that the OP will get close to the 
same efficiency. That may not matter though as Python dictionaries are 
tuned for general use, so a specific application might be able to do better 
by losing some of the generality.

Some of the tuning extends beyond dictionaries, for example strings are 
commonly used as dictionary keys, and Python avoids recalculating hash 
values more than once for each string. It is unlikely that a C or C++ 
programmer would go to the effort of using a string type that cached hash 
values throughout their application, so for many uses, string lookups in 
dictionaries will almost certainly be slower (although there could be gains 
elsewhere).

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?