Comparing lists

Scott David Daniels scott.daniels at acm.org
Fri Oct 14 16:03:50 EDT 2005


Let me begin by apologizing to Christian as I was too snippy in
my reply, and sounded even snippier than I meant to.

Christian Stapfer wrote:
> "Scott David Daniels" <scott.daniels at acm.org> wrote in message
> news:434ba977 at nntp0.pdx.net...
>>a "better" set implementation will win if
>>it can show better performance without
>>related down-sides.
> 
> Is there ever such a thing? I always
> thought that, most of the time, there
> is no such thing as a free lunch in ...
If you look at the history of Python's sort, it has steadily gotten
better.  The list implementation has been tweaked to give better
performance for appending and popping.  There are a number of such
cases.  In fact, as Python rolls along, code keeps getting improved.
Usually the requirement is that the change not degrade current
benchmarks and that it provide a substantial improvement in at least
some practical cases.

>>As to the "either now, or eventually;"  if you _must_ have performance
>>now, not in some abstract future, then it behooves you to _test_,
>>_test_, _test_!
> 
> Well, I might want to test: but *first* I want
> to design, preferably in "armchair style" ... [using] 
> rough complexity measures to ....

>>>>>If the documentation stated the order-of-magnitude
>>>>>behavior of those basic operations up front, then
>>>>>I (and *anyone* else who ever wanted to use those
>>>>>operations on large lists / large sets) could do
>>>>>a quick order-of-magnitude estimation of how
>>>>>a certain program design will behave, performance
>>>>>wise.

I think this is where I started over-reacting.  Over time there have
been a number of requests here from people who state that things would
be so much better for everyone if only someone (never themselves) did
some more work that they might not otherwise want to do.  The people
who implement the code often do so on their own time.  I find the
Python docs surprisingly good even by the standards of commercial
documentation.  For work that is done gratis, they are phenomenal.  I
hesitate to ask any more of them (although I have pointed out bugs in
the docs as I've found them).

>>And, if the proper documentation is in place, and it
>>says "dictionary lookup is O(N)" (and you avoid using
>>it for exactly that reason), how surprised will you be
>>to discover that the O(N) is only reached if the hash
>>values of the keys are all equal?
> 
> It's not that difficult to distinguish
> *average* case and *worst* case scenarios.
> It might even be a good idea to state,
> if that can easily be done, what the
> *best* case happens to be...
> 
>>Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
>>Then, if you are a dweeb like me, you will respond that
>>"This is not possible, a dictionary of size N must take at
>>least 'O(lg N)' to read the key, never mind processing it."
>>But, it turns out, that given a bound on the size of a
>>process, processing an address is "O(1)", not "O(lg N)".
>>Is that too practical for you, or not enough?
> 
> I do not expect a developer to expend *inordinate*
> amounts of work to figure out the computational
> complexity of what he has implemented. But, as I
> wrote, he *must* have thought about the matter, and
> is thus, by definition, in a rather good position
> to provide what he can frequently simply pull from
> memory when it comes to documenting the interface
> of his module.

I brought up Big-O (an upper bound) versus Big-Theta (a tight bound)
just to point out that no simple characterization like "O(N)" tells
enough of the story to be practically useful.  Once you decide that
isn't good enough, the burden of creating the documentation becomes
substantial, especially given that you've already spent the effort to
write the code and tests for it.  In fact, I'd hesitate to raise the
documentation bar any higher -- I'd hate to think someone thought,
"Oh, forget it, I'll just use this code myself."
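
To make the dictionary example above concrete, here is a small sketch
of my own (nothing from the docs or the thread) showing how all-equal
hash values turn the usual O(1) lookup into a linear scan.  The
BadHash class and the sizes are just for illustration:

    import timeit

    class BadHash(object):
        """A key type whose instances all hash alike."""
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 42              # every key lands in the same bucket
        def __eq__(self, other):
            return self.n == other.n

    def lookup_time(key_type, size):
        # Build a dict of `size` keys and time repeated lookups of one key.
        d = dict((key_type(i), i) for i in range(size))
        probe = key_type(size - 1)
        return timeit.timeit(lambda: d[probe], number=1000)

    for size in (10, 100, 1000):
        print("size %4d: int keys %.5fs  colliding keys %.5fs"
              % (size, lookup_time(int, size), lookup_time(BadHash, size)))

The colliding column should grow roughly linearly with size while the
int column stays flat -- exactly the distinction a bare "O(...)" entry
in the docs would have to explain.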

>>>>> *Experimenting* is not necessarily as easy to
>>>>>do as you seem to believe.
No, "experimenting" is not always easy (though often it is
easy enough).  However, "experimenting" puts the cost on the
person who derives the benefit, and is thus likely to not be
done in a slipshod way.
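
For what it's worth, the easy kind of experiment is often just a few
lines with the timeit module.  A sketch (the size and the list-versus-
set question are only stand-ins for whatever you actually care about):

    import timeit

    # Time membership tests against the same data held two ways.
    setup = "data = list(range(10000)); s = set(data)"
    t_list = timeit.timeit("9999 in data", setup=setup, number=1000)
    t_set = timeit.timeit("9999 in s", setup=setup, number=1000)
    print("list membership: %.4fs   set membership: %.4fs"
          % (t_list, t_set))

If the faster one is fast enough on your own representative data, the
experiment has done its job.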

> I am not at all a lawyer type, as you seem to
> imagine. I just want to suggest that some
> (to the implementer - but not the average user)
> *easily* available information about computational
> complexity (especially for the most basic data types)
> would be a good thing to have. Nothing more.
My point is that a simple performance characterization is not good
enough to be useful, and a fully accurate characterization is an
onerous documentation burden.  Something in between will be fraught
with complaints about "surprising" worst cases.  However you approach
middle-ground documentation, whether in the spirit of "this is enough
to go on" or not, rest assured that a number of people will complain
in a "language-lawyer-like" way about any perceived imperfections.

>>Would you mind if the quality is proportional to
>>the price you paid?
Here I was too snippy.  Sorry.

>>>>You are, of course, either assuming that there's a
>>>>single implementation of Python,
>>>>or that all implementations have the same behaviour.
(Each line denied with "Of course not.")
But realistically, most users will learn the performance
characteristics once, and will keep relying on the trade-offs that
were characteristic of that version of Python going forward.  Now
maybe you behave differently, but most programmers I know (and I
explicitly include myself) have that tendency.  Lots of Python code
moves between operating systems and implementations (and the number
of implementations is growing).

>>>But it is reasonable, anyway, to ask for information
>>>about a specific implementation (that one is *forced*
>>>to use).
>>You are not _forced_ to use any implementation of Python.
> What I wanted to say is ... this: *whenever* a Python
> program is run, it will run on some specific
> implementation of Python. That much should be obvious.
If I were still in a snippy mood, I'd point out that you might look
at your original statement and see how it might be read by someone
who read what you actually wrote, rather than what you wanted to
write.

>>You are free to implement your own Python system.
> Don't be ridiculous.
If PyPy gets going, this may not be as unlikely as you think.  You
will be able to (relatively easily) replace implementations of parts
of Python and regenerate a system.

> It is you (and only you) here, who is arguing that if
> perfection is not attainable (with reasonable effort),
> nothing should be done at all.
I am trying to say that the request you make is substantially larger
than you realize (or than it would appear at first blush).  Further,
it is a request that asks for work from others for your (and others')
benefit, not work that you offer to pitch in and help with.

> I am as willing to live with imperfect module interface
> specification as I am willing with imperfect tools more
> generally.  But that will not stop me from making
> *suggestions* for improvement (not arrogant demands, mind you).
Try writing something that would fit your standards for all of
the basic types, punting where you don't know details.  If it is
enough to be useful, others will chip in and help fix it where it
is broken.
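
To be concrete about "punting where you don't know details": even
something as rough as the following would be a start.  The figures
below are the usual CPython expectations as I understand them, not
guarantees, and are exactly the sort of thing others could correct:

    # Rough cost notes for CPython built-in types (typical case,
    # not a contract; other implementations may differ):
    #
    #   list.append(x)        amortized O(1)
    #   list.pop()            O(1)
    #   list.insert(0, x)     O(n)
    #   x in some_list        O(n)
    #   d[key], key in d      O(1) average; O(n) if hashes collide badly
    #   x in some_set         O(1) average; O(n) worst case
    #   list.sort()           O(n log n) comparisons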

> I have seen *many* books on algorithms that specify
> computational complexity measures for the algorithms
> described. Any really good library of algorithms will
> at least try to reach that standard, too.
How many "really good libraries of algorithms" do you know of?
Could you name a couple that have these characterizations?
Practical code is rife with issues of "finite address space,"
low-order effects swamping the higher-order terms for most
practical sizes and such.  It is this collection of nasty little
details that makes characterizing libraries frustratingly hard.
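
Those low-order effects are easy to demonstrate and hard to document
honestly.  A sketch of my own (the crossover point, if any, depends on
machine, version, and data) comparing an O(n) scan against an O(1)
hash lookup at small sizes:

    import timeit

    # At small sizes, constant factors can swamp the asymptotics:
    # the "worse" linear scan may rival the hash lookup.
    for n in (1, 2, 4, 8, 16, 32, 64):
        setup = "data = list(range(%d)); s = set(data); x = %d" % (n, n - 1)
        t_list = timeit.timeit("x in data", setup=setup, number=100000)
        t_set = timeit.timeit("x in s", setup=setup, number=100000)
        print("n=%3d  list %.4fs  set %.4fs" % (n, t_list, t_set))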

>>>I consider it the job of the implementer to ...
>>Am I to take it you provide this to all of your customers?
> ...I did not want to pose as the customer who expects nothing
> but perfection from *others* (but not himself, of course).
> I am not at all like that: you are attacking
> a figment of your imagination.
Actually, I was not attacking at all.  I was trying to suggest
that it is a harder task than you imagine.  I am assuming you won't
take me up on the suggestion of starting a flawed document, but I'd
be happy to be proved wrong.  If not, try writing a characterization
of the performance of a dozen or so of your own programs.  Does the
result seem useful in proportion to the work it took you to write it?

>>>How reasonable is it to ask me, or anyone else
>>>for that matter, to extract, experiment-wise
>>>(if it can be done at all with reasonable effort)
>>>information that properly belongs to the implementer
>>>and really should have been exposed in the
>>>documentation in the first place?

>>Not at all reasonable.  How reasonable is it to ask
>>me to provide you support information for free?

OK, here I was _very_ snippy.  Sorry again.

I will assert that often the experiment results in a single bit of
information: "it is fast enough."  Experimenting will get you reliable
answers like that, and only when run-time is an issue will you need to
go into it much further.

--Scott David Daniels
scott.daniels at acm.org


