Performance: sets vs dicts.

Wed Sep 1 09:30:21 EDT 2010

Lie Ryan, 01.09.2010 15:46:
> On 09/01/10 00:09, Aahz wrote:
>> However, I think there are some rock-bottom basic guarantees we can make
>> regardless of implementation.  Does anyone seriously think that an
>> implementation would be accepted that had anything other than O(1) for
>> index access into tuples and lists?  Dicts that were not O(1) for access
>> with non-pathological hashing?  That we would accept sets having O()
>> performance worse than dicts?
>>
>> I suggest that we should agree on these guarantees and document them in
>> the core.
>
> While I think documenting them would be great for all programmers that
> care about practical and theoretical execution speed; I think including
> these implementation details in core documentation as a "guarantee"
> would be a bad idea for the reasons Terry outlined.
>
> One way of resolving that is by having two documentations (or two
> separate sections in the documentation) for:
> - Python -- the language -- documenting Python as an abstract language,
> this is the documentation which can be shared across all Python
> implementations. This will also be the specification for Python Language
> which other implementations will be measured to.
> - CPython -- the Python interpreter -- documents implementation details
> and performance metrics. It should be properly noted that these are not
> part of the language per se. This will be the playground for CPython
> experts that need to fine tune their applications to the last drop of
> blood and don't mind their application going nuts with the next release
> of CPython.

I disagree. I think putting the "obvious" guarantees right into the normal 
documentation will actually make programmers aware that there *are* 
different implementations (and differences between implementations), simply 
because it wouldn't just say "O(1)" but "the CPython implementation of this 
method has an algorithmic complexity of O(1), other Python implementations 
are known to perform alike at the time of this writing". The second half of 
that sentence could be dropped where we really don't know how other 
implementations behave, or where we expect they may have good reason to 
behave differently, but in most cases it shouldn't be hard to make the 
complete statement.
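To make the "O(1)" wording concrete without relying on wall-clock timing, here is a minimal sketch (not from the thread) that counts equality comparisons during a failed membership test. `CountingKey`, `count_eq_calls` and `lookup_costs` are hypothetical names for illustration only. The counts describe CPython's list scan and hash-table lookup; another implementation could in principle differ, which is exactly the kind of caveat such documentation would carry.

```python
class CountingKey:
    """Hypothetical key type that counts calls to __eq__ (illustration only)."""

    comparisons = 0  # shared counter, reset before each lookup

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Delegate to the wrapped int so distinct values get distinct hashes.
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        if not isinstance(other, CountingKey):
            return NotImplemented
        return self.value == other.value


def count_eq_calls(container, probe):
    """Return how many __eq__ calls one `probe in container` test makes."""
    CountingKey.comparisons = 0
    _ = probe in container  # the result is discarded; only the count matters
    return CountingKey.comparisons


def lookup_costs(n):
    """Compare a failed membership test in a list, a set and a dict of n keys."""
    keys = [CountingKey(i) for i in range(n)]
    missing = CountingKey(n)  # equal to no stored key: worst case for a list
    return (
        count_eq_calls(keys, missing),                 # linear scan
        count_eq_calls(set(keys), missing),            # hash lookup
        count_eq_calls(dict.fromkeys(keys), missing),  # hash lookup
    )


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, lookup_costs(n))
```

On CPython the list count grows linearly with n, while the set and dict counts stay at (or near) zero because equality is only checked when stored and probed hashes match.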

After all, we basically know what other implementations there are, and we 
also know that they tend to match the algorithmic complexities at least for 
the major builtin types. It seems quite clear to me as a developer that the 
set of builtin types and "collections" types was chosen in order to cover a 
certain set of algorithmic complexities and not just arbitrary interfaces.
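As one illustration of a type chosen for its complexity rather than its interface: the `collections.deque` documentation describes appends and pops at either end as approximately O(1), while noting that `list.insert(0, v)` and `list.pop(0)` incur O(n) memory movement. The sketch below (hypothetical helper names `fill_front_list` and `fill_front_deque`) builds the same sequence both ways and times them with `timeit`; absolute numbers will vary by machine and implementation, so treat them as indicative only.

```python
from collections import deque
import timeit


def fill_front_list(n):
    """Build [n-1, ..., 1, 0] by inserting at the front of a list."""
    items = []
    for i in range(n):
        items.insert(0, i)  # shifts every existing element: O(len(items))
    return items


def fill_front_deque(n):
    """Build the same sequence with a deque, documented as O(1) at either end."""
    items = deque()
    for i in range(n):
        items.appendleft(i)
    return items


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        t_list = timeit.timeit(lambda: fill_front_list(n), number=1)
        t_deque = timeit.timeit(lambda: fill_front_deque(n), number=1)
        print(f"n={n:>6}  list.insert(0, x): {t_list:.4f}s  "
              f"deque.appendleft(x): {t_deque:.4f}s")
```

Both functions return the same contents; only the growth rate of the cost differs, with the list version slowing roughly quadratically in total as n grows.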

Stefan