Comparing lists

Steve Holden steve at holdenweb.com
Mon Oct 10 11:52:23 EDT 2005


Christian Stapfer wrote:
> "George Sakkis" <gsakkis at rutgers.edu> wrote in message 
> news:1128952417.0544cc73190bbbd4e683448c137969c8 at teranews...
> 
>>"Christian Stapfer" <nil at dev.nul> wrote:
>>
>>
>>><ajikoe at gmail.com> wrote:
>>>
>>>>try to use set.
>>>
>>>    Sorting the two lists and then extracting
>>>A-B, B-A, A|B, A & B and A ^ B in one single
>>>pass seems to me very likely to be much faster
>>>for large lists.
>>
>>Why don't you implement it, test it and time it
>>to be more convincing about your intuition ?
> 
> 
> The problem is in the generation of the test data.
> Even merely generating a set of (suitably "average",
> "random", and suitably "worst case") datasets might
> turn out to be a major undertaking.
>  If the documentation stated the order-of-magnitude
> behavior of those basic operations up front, then
> I (and *anyone* else who ever wanted to use those
> operations on large lists / large sets) could do
> a quick order-of-magnitude estimation of how
> a certain program design will behave, performance
> wise.
>   *Experimenting* is not necessarily as easy to
> do as you seem to believe. How do you, for example,
> hit upon the worst-case behavior with your test
> data? - Without knowing *anything* about the
> implementation it might a matter sheer luck.
> If you *know* something about the implementation
> then, of course, you might be able to figure it
> out. (But note that if you know *that* much about
> the implementation, you usually have an order-of-
> magnitude estimate anyway and don't need to do
> *any* experimenting in order to answer my question.)
> 
You are, of course, either assuming that there's a single implementation 
of Python, or that all implementations have the same behaviour. Or 
alternatively you are asking all implementers to do what you seem to 
consider so difficult (i.e. identify worst-case scenarios and then 
estimate order-of-magnitude behaviour for them).

Test results with known test data are relatively easy to extrapolate 
from, and if your test data are reasonably representative of live data 
then so will your performance estimates.

Anyway, aren't you more interested in average behaviour than worst-case? 
Most people are.

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/




More information about the Python-list mailing list