Comparing lists

Tue Oct 11 08:02:30 EDT 2005

Christian Stapfer wrote:
> "Steve Holden" <steve at holdenweb.com> wrote in message 
> news:mailman.1843.1128959845.509.python-list at python.org...
> 
>>Christian Stapfer wrote:
>>
>>>"George Sakkis" <gsakkis at rutgers.edu> wrote in message 
>>>news:1128952417.0544cc73190bbbd4e683448c137969c8 at teranews...
>>>
>>>
>>>>"Christian Stapfer" <nil at dev.nul> wrote:
>>>>
>>>>><ajikoe at gmail.com> wrote:
>>>>>>try to use set.
A reasonable suggestion.  A set must have the trade-offs involved.
The abstraction itself brings to mind the issues, and the performance
can, at least in theory, be handled there.  If that is true (that
the "set" abstraction "sees" the problem), then you can rely on the
Python implementation of "set" to either now, or eventually, have
a "good" implementation -- one not too far off efficient.  The Python
gang is good at this stuff; a "better" set implementation will win if
it can show better performance without related down-sides.

As to the "either now, or eventually;"  if you _must_ have performance
now, not in some abstract future, then it behooves you to _test_,
_test_, _test_!

>>> If the documentation stated the order-of-magnitude
>>>behavior of those basic operations up front, then
>>>I (and *anyone* else who ever wanted to use those
>>>operations on large lists / large sets) could do
>>>a quick order-of-magnitude estimation of how
>>>a certain program design will behave, performance
>>>wise.
And, if the proper documentation is in place, and it
says "dictionary lookup is O(N)" (and you avoid using
it for exactly that reason), how surprised will you be
to discover that the O(N) is only reached if the hash
values of the keys are all equal?

Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
Then, if you are a dweeb like me, you will respond that
"This is not possible, a dictionary of size N must take at
least 'O(lg N)' to read the key, never mind processing it."
But, it turns out, that given a bound on the size of a
process, processing an address is "O(1)", not "O(lg N)".
Is that too practical for you, or not enough?

>>>  *Experimenting* is not necessarily as easy to
>>>do as you seem to believe. 

>>>How do you, for example, hit upon the worst-case 
 >>>behavior with your test data?
Are you saying the documentation should characterize the
cases that achieve worst-case behavior?  This is a stiff
burden indeed, not one I am used to in even the most rigorous
classes I've seen.  If there is such a characterization,
how burned will you feel if a case is overlooked and that
case is the one that you sold to your customer?  Are you
willing to provide the same guaranteed performance and
documentation of performance to your customer that you
you expect of the Python system?  Would you mind if the
quality is proportional to the price you paid?

>>You are, of course, either assuming that there's a
>>single implementation of Python,
> Of course not!
> 
>>or that all implementations have the same behaviour.
> Of course not!

> But it is reasonable, anyway, to ask for information
> about a specific implementation (that one is *forced*
> to use). 
You are not _forced_ to use any implementation of Python.
You are free to implement your own Python system.

>   And if the implementer wants to complain that
> giving such information would break his wonderful
> abstraction then I can only answer: It is *reality*
> that *will* break your abstraction as regards
> performance! It is, therefore, absolutely *no*
> use to *pretend* you *can* avoid this form of "breaking
> the abstraction" by simply avoiding to mention it
> in the documentation...
I simply repeat: I have never seen a paper characterizing
the performance of an algorithm as "O(expr(N))" that described
in details _all_ cases that reached that limit.  At most such
papers describe _one_ such cases.  Nor have I ever seen papers
describing the performance of an algorithm as "\Theta(expr(N))"
that characterized the cases that broke the "\Theta" performance.
If you provide me the papers, provide me a C compiler with
equivalent docs on all C expressions, and provide me the funding
to update the Python docs, I will be happy to do so for a single
version of Python and a since version of CPython.  I expect I
will have an easier time of it than the IronPython people will
have.

> I consider it the job of the implementer to know
> about the trade-offs that he has been making in
> choosing one particular implementation, and to
> know what computational complexity therefore
> attaches to the various operations exposed in
> its interface.
Am I to take it you provide this to all of your customers?

> How reasonable is it to ask me, or anyone else
> for that matter, to extract, experiment-wise
> (if it can be done at all with reasonable effort)
> information that properly belongs to the implementer
> and really should have been exposed in the
> documentation in the first place?
Not at all reasonable.  How reasonable is it to ask
me to provide you support information for free?

--Scott David Daniels
scott.daniels at acm.org