Comparing lists

Tue Oct 11 13:08:23 EDT 2005

"Scott David Daniels" <scott.daniels at acm.org> wrote in message
news:434ba977 at nntp0.pdx.net...
> Christian Stapfer wrote:
>> "Steve Holden" <steve at holdenweb.com> wrote in message
>> news:mailman.1843.1128959845.509.python-list at python.org...
>>
>>>Christian Stapfer wrote:
>>>
>>>>"George Sakkis" <gsakkis at rutgers.edu> wrote in message
>>>>news:1128952417.0544cc73190bbbd4e683448c137969c8 at teranews...
>>>>
>>>>
>>>>>"Christian Stapfer" <nil at dev.nul> wrote:
>>>>>
>>>>>><ajikoe at gmail.com> wrote:
>>>>>>>try to use set.
> A reasonable suggestion.  A set must have the trade-offs involved.
> The abstraction itself brings to mind the issues, and the performance
> can, at least in theory, be handled there.  If that is true (that
> the "set" abstraction "sees" the problem), then you can rely on the
> Python implementation of "set" to either now, or eventually, have
> a "good" implementation -- one not too far off efficient.

It is, unfortunately, not infrequently the case
that there are different implementations for
a given data type that are efficient in different
ways - but that there is no one single implementation
that is the most efficient in *all* regards.
Thus the implementer must make a trade-off,
and that trade-off has consequences, performance
wise, that he might want to tell the users of his
module about...

>  The Python gang is good at this stuff;

I'm not denying it. The question is about
telling users upfront what the consequences
are (roughly) of the trade-offs that have
been made so that they can use the modules
that have been provided more effectively.

> a "better" set implementation will win if
> it can show better performance without
> related down-sides.

Is there ever such a thing? I always
thought that, most of the time, there
is no such thing as a free lunch in
this field.

> As to the "either now, or eventually;"  if you _must_ have performance
> now, not in some abstract future, then it behooves you to _test_,
> _test_, _test_!

Well, I might want to test: but *first* I want
to design, preferably in "armchair style"
- at least as regards the basic approach
that I want to take. This "armchair stage"
involves not infrequently the use of rough
complexity measures to exclude the clearly
unusable and chose, instead, an approach
that is very likely efficient enough for
what I need.

>>>> If the documentation stated the order-of-magnitude
>>>>behavior of those basic operations up front, then
>>>>I (and *anyone* else who ever wanted to use those
>>>>operations on large lists / large sets) could do
>>>>a quick order-of-magnitude estimation of how
>>>>a certain program design will behave, performance
>>>>wise.
> And, if the proper documentation is in place, and it
> says "dictionary lookup is O(N)" (and you avoid using
> it for exactly that reason), how surprised will you be
> to discover that the O(N) is only reached if the hash
> values of the keys are all equal?

It's not that difficult to distinguish
*average* case and *worst* case scenarios.
It might even be a good idea to state,
if that can easily be done, what the
*best* case happens do be...

> Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
> Then, if you are a dweeb like me, you will respond that
> "This is not possible, a dictionary of size N must take at
> least 'O(lg N)' to read the key, never mind processing it."
> But, it turns out, that given a bound on the size of a
> process, processing an address is "O(1)", not "O(lg N)".
> Is that too practical for you, or not enough?

I do not expect a developer to expend *inordinate*
amounts of work to figure out the computational
complexity of what he has implemented. But, as I
wrote, he *must* have thought about the matter, and
is thus, by definition, in a rather good position
to provide what he can frequently simply pull from
memory when it comes to documenting the interface
of his module.

>>>>  *Experimenting* is not necessarily as easy to
>>>>do as you seem to believe.
>
>
>>>>How do you, for example, hit upon the worst-case behavior with your test
>>>>data?
> Are you saying the documentation should characterize the
> cases that achieve worst-case behavior?  This is a stiff
> burden indeed, not one I am used to in even the most rigorous
> classes I've seen.

If it happens to be particularly difficult, for a
given implementation (of whatever abstract data type),
then, of course, state this upfront and leave it at
that.

>  If there is such a characterization,
> how burned will you feel if a case is overlooked and that
> case is the one that you sold to your customer?

I am not at all a lawyer type, as you seem to
imagine. I just want to suggest that some
(to the implementer - but not the average user)
*easily* available information about computational
complexity (especially for the most basic data types)
would be a good thing to have. Nothing more.

>  Are you
> willing to provide the same guaranteed performance and
> documentation of performance to your customer that you
> you expect of the Python system?

I'm using python mainly as a very handy language to
write whatever (usually relatively small) utility
programs I happen to require for my main job. I am
not (currently) developing any "serious" programs
that are targeted for sale. (Thought the proper
functioning and adequate efficiency of my personal
utility programs really does matter, but it mainly
matters for *me* - and for my "customers", if I
can be said to have any, which seems doubtful,
it only matters indirectly at best.)
  Even so, I frequently need to have a reasonable
idea of what computational complexity attaches
to certain operations on certain data types. But
I hardly ever have the time to go into an extended
period of first comparing several different approaches
by way of experiment.

> Would you mind if the quality is proportional to
> the price you paid?

There is really not need to indulge in polemics like
this. I am just making a suggestion as to what information
might *help* improve the usability of Python modules.
If making such suggestions regularly only led to negative
exchanges of this sort, there would be very little room
for learning in this community indeed...

>>>You are, of course, either assuming that there's a
>>>single implementation of Python,
>> Of course not!
>>
>>>or that all implementations have the same behaviour.
>> Of course not!
>
>> But it is reasonable, anyway, to ask for information
>> about a specific implementation (that one is *forced*
>> to use).
> You are not _forced_ to use any implementation of Python.

What I wanted to say is *not*, that it appears to be
a terrible burden to my inflated consumer-ego to
have to use something not *absolutely* perfect in
*all* respects, such as Python. What I wanted to
say, here, is just this: *whenever* a Python program
is run, it will run on some specific implementation
of Python. That much should be obvious.

> You are free to implement your own Python system.

Don't be ridiculous.

>>   And if the implementer wants to complain that
>> giving such information would break his wonderful
>> abstraction then I can only answer: It is *reality*
>> that *will* break your abstraction as regards
>> performance! It is, therefore, absolutely *no*
>> use to *pretend* you *can* avoid this form of "breaking
>> the abstraction" by simply avoiding to mention it
>> in the documentation...
> I simply repeat: I have never seen a paper characterizing
> the performance of an algorithm as "O(expr(N))" that described
> in details _all_ cases that reached that limit.

That's what's usually called "throwing out the
baby with the bathwater". It is you (and only
you) here, who is arguing that if perfection is
not attainable (with reasonable effort), nothing
should be done at all. I as am willing to live
with imperfect module inferface specification as
I am willing with imperfect tools more generally.
But that will not stop me from making *suggestions*
for improvement (not arrogant demands, mind you).

>  At most such
> papers describe _one_ such cases.  Nor have I ever seen papers
> describing the performance of an algorithm as "\Theta(expr(N))"
> that characterized the cases that broke the "\Theta" performance.

To repeat: I was not asking for the impossible, not
even asking for the difficult, but simply asking for the
dumping of the (to the implementer, but not the average
user) obvious and reasonably certain information
about computational complexity of what operations
are exposed in the interface of a module.
I have seen *many* books on algorithms that specify
computational complexity measures for the algorithms
described. Any really good library of algorithms will
at least try to reach that standard, too.

> If you provide me the papers, provide me a C compiler with
> equivalent docs on all C expressions, and provide me the funding
> to update the Python docs, I will be happy to do so for a single
> version of Python and a since version of CPython.

I am not asking you, nor anyone else, to do
any amount of boring or otherwise hard work
here.

>  I expect I
> will have an easier time of it than the IronPython people will
> have.
>> I consider it the job of the implementer to know
>> about the trade-offs that he has been making in
>> choosing one particular implementation, and to
>> know what computational complexity therefore
>> attaches to the various operations exposed in
>> its interface.
> Am I to take it you provide this to all of your customers?

I have no customers (lucky me). And I did not
want to pose as the all too frequently encountered
species of the arrogant (and equally ignorant)
customer who expects nothing but perfection
from *others* (but not himself, of course).
I am not at all like that: you are attacking
a figment of your imagination.

>> How reasonable is it to ask me, or anyone else
>> for that matter, to extract, experiment-wise
>> (if it can be done at all with reasonable effort)
>> information that properly belongs to the implementer
>> and really should have been exposed in the
>> documentation in the first place?
> Not at all reasonable.  How reasonable is it to ask
> me to provide you support information for free?

Even providing the most basic information about
the interface of a module, even providing the
mere names of member functions, could be denied
on *that* basis. Thus, this argument fails.
It fails simply because it "proves" too much...

Regards,
Christian
-- 
»In the long run men hit only what they aim at.
 Therefore, though they should fail immediately,
 they had better aim at something high.«
  - Henry David Thoreau: 'Walden'