O(n^2) is bad - can it be fixed?

Tim Peters tim.one at home.com
Fri May 25 16:17:44 EDT 2001


[Courageous]
> The use of the word "millions" is hyperbolous, yes?

Depends on the app, of course.  For example, you only need one dict mapping a few million keys to lists of associated
items to get a few million lists.  While not "a typical app", that's typical of *some* reasonable apps.
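
To make that concrete, here's the usual grouping idiom that ends up with one list per distinct key (a minimal sketch;
"records" and the other names are invented for illustration):

    # One small list per distinct key:  a few million distinct keys
    # means a few million lists, with no exotic code in sight.
    groups = {}
    for key, item in records:   # any iterable of (key, item) pairs
        groups.setdefault(key, []).append(item)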

> I just can't imagine that adding a 32 bit number to every list in
> Python is all that big a deal.

Adding a new data member to a builtin type is always a big deal, and has gotten bigger lately due to the emerging
popularity of Python ports to small devices.  In any case, Guido rejected this idea the last time it went around (hence
my "no chance" -- nothing material has changed since then, and this is repetition of arguments made many times before).

I'm going to try to stop the Win9X madness via a no-new-data hack, against the day we can do it right (via PyMalloc).
BTW, I'm actually more appalled that the current ROUNDUP macro uses integer division (a very slow operation on some
boxes).
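
To spell out the gripe (a sketch in Python of the arithmetic only; the real macro is C, and the constants here are
made up):

    # Rounding n up to a multiple of "block", two ways.
    def roundup_div(n, block):
        # Division-based, like the current macro:  an integer divide
        # plus a multiply on every call.
        return ((n + block - 1) // block) * block

    def roundup_mask(n, block):
        # Mask-based:  no division at all, but valid only when block
        # is a power of 2.
        assert block & (block - 1) == 0
        return (n + block - 1) & ~(block - 1)

    assert roundup_div(37, 16) == roundup_mask(37, 16) == 48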

> Do you have actual numbers here?

Every time you run a Python program, it emails detailed runtime statistics to Guido via a secret backdoor in the
code.  And that's the real reason list.append() eventually slows down, but only on systems like Windows with poor email
performance <wink>.

> ...
> And there are alternatives; I can't be the only one to have written a
> list replacement.

I don't recall another list replacement implemented *as* a list.  More popular is to try some flavor of balanced tree,
or a skip list.
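
For reference, the skeleton of the skip-list flavor (a minimal sketch of a sorted container, with invented names; a
true list replacement would also store the width of each forward link so element positions can be counted during the
search):

    import random

    class SkipNode:
        def __init__(self, value, level):
            self.value = value
            self.forward = [None] * level   # one link per level

    class SkipList:
        MAX_LEVEL = 16

        def __init__(self):
            self.head = SkipNode(None, self.MAX_LEVEL)
            self.level = 1

        def _random_level(self):
            # Flip coins:  a node reaches level k with probability 2**-k.
            level = 1
            while random.random() < 0.5 and level < self.MAX_LEVEL:
                level += 1
            return level

        def insert(self, value):
            # Find, at each level, the rightmost node left of the new value.
            update = [self.head] * self.MAX_LEVEL
            node = self.head
            for i in range(self.level - 1, -1, -1):
                while node.forward[i] is not None and node.forward[i].value < value:
                    node = node.forward[i]
                update[i] = node
            level = self._random_level()
            self.level = max(self.level, level)
            new = SkipNode(value, level)
            for i in range(level):
                new.forward[i] = update[i].forward[i]
                update[i].forward[i] = new

        def __contains__(self, value):
            node = self.head
            for i in range(self.level - 1, -1, -1):
                while node.forward[i] is not None and node.forward[i].value < value:
                    node = node.forward[i]
            node = node.forward[0]
            return node is not None and node.value == value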

> ...
> The other day I modded my Python distro to do postmortems on all
> Python dictionaries after a run of the pystone suite; this forced
> me to keep them from ever being destroyed

If you do this again, write the info you want to analyze to a text file (opened for append) at the start of
dict_dealloc().  Then you don't need to keep the dicts alive, and can write Python programs to analyze the text-file
dumps later.  For example, write out the dict size, and for those slots that aren't empty write the index and key hash
code (special-casing dummies).
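
Then the analysis can be ordinary Python over the dump.  A sketch, assuming a made-up dump format (one "size N" line
per dict, followed by one "slot <index> <hash-or-dummy>" line per occupied slot):

    def analyze(path):
        results = []              # one (size, used, dummies) tuple per dict
        size = used = dummies = 0
        for line in open(path):
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "size":
                if size:          # flush the previous dict's counts
                    results.append((size, used, dummies))
                size = int(fields[1])
                used = dummies = 0
            elif fields[0] == "slot":
                if fields[2] == "dummy":
                    dummies += 1
                else:
                    used += 1
        if size:
            results.append((size, used, dummies))
        return results

From the (size, used, dummies) triples you can compute fill ratios and spot pathologically sparse or dummy-ridden
dicts.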

BTW, pystone makes no attempt to model data usage:  it's an artificial benchmark from the Dhrystone line, pasting
together senseless code contrived to recreate the aggregate operation-count statistics obtained from traces of "real
programs".  In pystone's case, it's a recoding in Python of a recoding in C of a synthetic Ada program that sought to
reproduce the relative operation counts across traced runs of some low-level integer systems programs.  Its relation to
"typical Python programs" is thus non-existent:  it makes no use of classes except to fake a C struct, makes heavy use
of module globals, and doesn't even use Python's "for" loop except to simulate Ada's integer-counter loop.  It makes no
explicit use of dicts at all, and uses lists only to fake fixed-size C arrays; etc.  The best that can be said is that
what it measures *can* be measured <0.9 wink -- and that's useful, provided you don't read much into it>.




