Performance: sets vs dicts.
Stefan Behnel
stefan_ml at behnel.de
Wed Sep 1 09:30:21 EDT 2010
Lie Ryan, 01.09.2010 15:46:
> On 09/01/10 00:09, Aahz wrote:
>> However, I think there are some rock-bottom basic guarantees we can make
>> regardless of implementation. Does anyone seriously think that an
>> implementation would be accepted that had anything other than O(1) for
>> index access into tuples and lists? Dicts that were not O(1) for access
>> with non-pathological hashing? That we would accept sets having O()
>> performance worse than dicts?
>>
>> I suggest that we should agree on these guarantees and document them in
>> the core.
>
> While I think documenting them would be great for all programmers that
> care about practical and theoretical execution speed; I think including
> these implementation details in core documentation as a "guarantee"
> would be a bad idea for the reasons Terry outlined.
>
> One way of resolving that is by having two documentations (or two
> separate sections in the documentation) for:
> - Python -- the language -- documenting Python as an abstract language,
> this is the documentation which can be shared across all Python
> implementations. This will also be the specification for Python Language
> which other implementations will be measured to.
> - CPython -- the Python interpreter -- documents implementation details
> and performance metrics. It should be properly noted that these are not
> part of the language per se. This will be the playground for CPython
> experts that need to fine tune their applications to the last drop of
> blood and don't mind their application going nuts with the next release
> of CPython.
I disagree. I think putting the "obvious" guarantees right into the normal
documentation will actually make programmers aware that there *are*
different implementations (and differences between implementations), simply
because it wouldn't just say "O(1)" but "the CPython implementation of this
method has an algorithmic complexity of O(1), other Python implementations
are known to perform alike at the time of this writing". Maybe without the
last half of the sentence if we really don't know how other implementations
work here, or if we expect that there may well be a reason they may choose
to behave different, but in most cases, it shouldn't be hard to make that
complete statement.
After all, we basically know what other implementations there are, and we
also know that they tend to match the algorithmic complexities at least for
the major builtin types. It seems quite clear to me as a developer that the
set of builtin types and "collections" types was chosen in order to cover a
certain set of algorithmic complexities and not just arbitrary interfaces.
Stefan
More information about the Python-list
mailing list