Question about scientific calculations in Python

Bengt Richter bokr at oz.net
Thu Mar 14 00:31:22 EST 2002


On Wed, 13 Mar 2002 20:40:20 -0500, Tim Peters <tim.one at comcast.net> wrote:

>[Bengt Richter (maybe)]
nope, [Andrew Dalke (unless Bengt is mistaken)]
>> If your lists are really long (O(100,000) or more?) then you might find
>> that the append starts taking too long -- append is asymptotically
>> O(n**n) whoa! can that be true?
          ^-need break + original, so we get:
[Bengt Richter]
>whoa! can that be true? I hope not. It is very cheap to tie a singly linked
>list into a circle and point to the last node when pointing to the list as
>a whole.
>
[Tim]
>No, theoretical worst case is O(n**2).  (1 + 2 + 3 + ... + n == ?)
>
>[Andrew Dalke]
>> Yes, it's true.
        ^^--O(n**n) or "It is very cheap ..." ? ;-)
>
>Only when the sky is falling too <wink>.
>
And we're slipping cogs in our timing belts ;-)

[Andrew continues]
>> This comes up on the list about once every six months or so.  As I
   ^^^^--referring to my 'it' UIAM ;-)
>> recall, Python's standard lists grow by 10 elements at a time for
>> small lists (less than 500 in length), then 100 at a time for larger
>> lists.
>
[Tim]
>That used to be true, but over-allocation is always proportional to list
>size now.  This had most benefit on assorted Windows flavors (which don't
>act alike in this respect).  It made no difference at all on some assorted
>Unixish systems (append often enough and it may get reassigned to its own
>mmap'ed region; or it may get moved to the sbrk limit so that further
>appends "just" bump the high-water VM mark without physical movement).
>
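Tim's proportional over-allocation can be watched indirectly from inside the
interpreter; a quick sketch (sys.getsizeof reports the list's allocated bytes,
and the exact growth pattern is a CPython implementation detail that varies by
version):

```python
# Sketch: most appends reuse spare capacity, so resizes are rare.
import sys

sizes = []
lst = []
for _ in range(64):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# Count how often the allocated size actually changed.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if a != b)
print(resizes)  # far fewer than 64: appends are amortized O(1)
```

Since each resize over-allocates proportionally to the current length, the
copying work per append averages out to a constant.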
[AD]
>> ...
>> Insertions and deletions are also linear in the length of the list.
>
[Tim]
>They're linear in the length of the list following the insert/delete
>position.
>
Does that mean the whole tail is rebuilt? Is there any instrumentation
left over that logs access patterns? I wonder what real life uses show.
E.g., if you normalized index and slice references and used them as keys
in a dictionary of counts.
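To make Tim's "linear in the tail" point concrete, here's a rough timing
sketch (timeit numbers are machine-dependent; only the ordering matters):

```python
# Sketch: cost tracks the tail length, not the whole list.
# insert(0, ...) shifts every element; append() shifts none.
from timeit import timeit

n = 10_000
front = timeit("lst.insert(0, None)",
               setup="lst = list(range(%d))" % n, number=1000)
back = timeit("lst.append(None)",
              setup="lst = list(range(%d))" % n, number=1000)
print(front > back)  # front-inserts pay for moving the tail every time
```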

[Andrew continues]
>> Yes, we all know there are better ways to deal with this, for different
>> people's definition of the meaning "better."  It isn't going to change
>> in Python.
>
>Fast constant-time (modulo cache effects) random access is Python's chief
>goal for Python lists.
Do you cache a last-position hint in case of sequential access, or is that
not worth it?
>
[Andrew continues]
>> Best I think would be a standard module of data structures (priority
>> queue, doubly-linked lists, stack, etc.)  But it hasn't bothered
>> me enough to go ahead and do it.
>
>Guido doesn't go for this kind of thing:  the more "advanced" the data
>structure, the more specialized variants it sprouts, and there aren't enough
>volunteers to maintain the ever-bulging code across all the platforms Python
>runs on.
OTOH, the two or three most popular platforms probably have most all those things
in multiplicate (>duplicate ;-) already, so maybe just locating, describing, and
indexing them would work. Maybe we should just think up a <Python-wink> marker
and copyright/trademark it so people could put it in comment headers and google
could do the indexing ;-) If we choose globally unique category keywords and
get people to use them when contributing to this commons, then we could also
narrow searches. ##PSF##module ##PSF##category ##PSF##subcat or ?
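A minimal sketch of what pieces of such a module could look like, assuming a
standard library that ships heapq (a binary-heap priority queue kept on a
plain list) and collections.deque (constant-time pushes and pops at both
ends):

```python
# Sketch: stdlib building blocks for the wished-for data structures.
import heapq
from collections import deque

# Priority queue: heapq maintains the heap invariant on a plain list.
pq = []
for task in [(3, "low"), (1, "high"), (2, "mid")]:
    heapq.heappush(pq, task)
order = [heapq.heappop(pq)[1] for _ in range(len(pq))]
print(order)  # smallest priority first: ['high', 'mid', 'low']

# Deque: O(1) at either end, unlike list.insert(0, ...).
d = deque([1, 2, 3])
d.appendleft(0)
print(list(d))  # [0, 1, 2, 3]
```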

Regards,
Bengt Richter



