Performance: sets vs dicts.

Tue Aug 31 19:24:19 EDT 2010

On 8/31/2010 10:09 AM, Aahz wrote:
> In article<mailman.239.1283257496.29448.python-list at python.org>,
> Jerry Hill<malaclypse2 at gmail.com>  wrote:
>> On Mon, Aug 30, 2010 at 7:42 PM, Aahz<aahz at pythoncraft.com>  wrote:
>>>
>>> Possibly; IMO, people should not need to run timeit to determine basic
>>> algorithmic speed for standard Python datatypes.
>>
>> http://wiki.python.org/moin/TimeComplexity takes a stab at it.  IIRC,
>> last time this came up, there was some resistance to making promises
>> about time complexity in the official docs, since that would make
>> those numbers part of the language, and thus binding on other
>> implementations.
>
> I'm thoroughly aware of that page and updated it yesterday to make it
> easier to find.  ;-)
>
> However, I think there are some rock-bottom basic guarantees we can make
> regardless of implementation.  Does anyone seriously think that an
> implementation would be accepted that had anything other than O(1) for
> index access into tuples and lists?

Does anyone seriously think that an implementation should be rejected as 
an implementation if it intellegently did seq[n] lookups in log2(n)/31 
time units for all n (as humans would do), instead of stupidly taking 1 
time unit for all n < 2**31 and rejecting all larger values (as 32-bit 
CPython does)?

 >  Dicts that were not O(1) for access with non-pathological hashing?

You would actually be unhappy if small dicts ran even faster than they 
do now (while large dicts ran the same)? Not me!

> I suggest that we should agree on these guarantees and document them in
> the core.

I disagree for several reasons.

1. It would a category mistake. Python is an abstract algorithm 
language. The concept of speed is not applicable. I happen to think it 
one of the best, which is why I once dubbed it 'executable pseudocode', 
to compare it to similar but inferior, non-executable algorithm 
languages too often used still today. To human readers, as readers, 
speed of CPython or anything else is not relevant.

2. A Python interpreter is an abstract algorithm. Algorithms can be 
characterized by operation counts, but not by physical universe time.

3. Algorithms, including Python interpreters, can be embodied in 
anything that can compute, including humans, VonNeuman machines of 
various types, and possible future non-VonNeuman machines. To limit 
'python interpreter' to current vonNeuman machines would be short 
sighted. Human interpretation of Python code (and other specifications) 
is part of development, testing, and debugging.

4. Guarantee? Who will pay what to who if what is not performed ;-?. The 
Python license intentionally disclaims any guarantee of performance. If 
you are using 'guarantee' metaphorically, perhaps try another word.

5. Accepted? If you want to claim that an interpreter that is useless 
for your purposes is not really an interpreter, you are free to, I 
suppose. Asking the PSF to impose your view on me would, to me, violate 
its diversity policy.

6. O-notation was developed for characterizing the asymptotic (large-N) 
behavior of abstract algorithms in terms of abstract operation counts. 
Left out are differences between operations, lower order terms, 
multiplicative constants, how large N must be to get large-N behavior, 
and what happens for smaller N. These and other factors make the 
translation of O(f(n)) into real time extremely mushy or even wrong.

I already suggested that O(1) lookup is a symption of simple-minded 
arithmetic stupidity, of doing unnecessary work when adding small 
numbers. When it is a result doing most additions slower than necessary, 
it is a bad thing, not a good thing, and a bad criteria for an 
acceptable algorithm and acceptable interpreter. Similarly, books say 
that O(n*logn) sorting is  'better' that O(n**2) sorting. However, Tim 
Peters verified that the opposite is true for real times for small n, 
and hence the current list.sort smartly uses a fast O(n**2) algorithm 
for small lists (len < 64) and small runs in larger lists. Rejecting 
list.sort for this reason would be stupid.

-- 
Terry Jan Reedy