O(n^2) is bad - can it be fixed?

Helen Dawson helend at accessone.com
Wed May 30 02:01:05 EDT 2001


Tim Peters wrote:

> [Isaac To Kar Keung]
> > ...
> > Yes, it rarely happens.  But is the solution really so difficult
> > that we have any excuse not doing it right?
>
> You need to define "the problem" first.  I'm really not clear on what it is
> you think needs to be solved!  If you define it carefully, then I wager
> you'll discover, e.g., that Bruce Dawson's complaint cannot be solved on
> Win9x even if you use an explicit capacity field and do growth by doubling
> (he doesn't realize that yet either).  It can be *delayed*, but Win9x VM
> management is such that address space will still get fragmented beyond hope
> unless Python also takes over allocating heaps.  This gets absurd.

If you use the list size doubling technique (no, I haven't tried it with
Python, but I'm assuming you haven't either, so my guess about what will
happen is as valid as yours :-) then your list will grow within a single
heap until it reaches the default size of a heap - typically 4 Mbytes. This
is fundamentally unchanged from the current behaviour.

Then, instead of allocating a heap that is 4 Mbytes + 64K, an 8 Mbyte heap
will be allocated, because Python will ask for roughly 8 Mbytes of memory.

Then Python will ask for 16 Mbytes, then 32, then 64, then 128, then 256,
and then 512 Mbytes of memory. Each time, Win9x will happily allocate
a heap of the requested size. When Python asks for a 1024 Mbyte heap
then it will probably fail, because the address space will almost certainly
be too fragmented. It may even fail on the 512 Mbyte allocation. However,
if it makes it to the 256 Mbyte allocation, which seems very likely,
then it will have succeeded in making a list that is approximately 32 times
longer than when it previously failed. On some machines, it might even
run out of memory before it runs out of address space.
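
For concreteness, here is a tiny untested sketch of the arithmetic I am
describing. The 4-byte slot size and the 4 Mbyte heap threshold are my
assumptions about Win32, not measurements:

def doubling_requests(target_items, slot_size=4, start_capacity=1024):
    # Block sizes (in bytes) that a doubling list would request while
    # growing to hold target_items pointers.
    sizes = []
    capacity = start_capacity
    while capacity < target_items:
        capacity *= 2                      # grow by doubling
        sizes.append(capacity * slot_size)
    return sizes

# Show only the requests big enough to spill out of a 4 Mbyte heap:
for size in doubling_requests(256 * 1024 * 1024):
    if size >= 4 * 1024 * 1024:
        print("%4d Mbyte request" % (size // (1024 * 1024)))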

If you define the problem as being able to use all 2 Gbytes of address space
for a single list then no, doubling the list size doesn't solve it. If you
define the problem as being able to use lists that are much less constrained,
then the list doubling method works famously. It also should substantially
improve the performance on WinNT and Win2K, where the behaviour is quadratic.

As well as allowing much larger lists to be created much more quickly (no
O(n^2) behaviour), it also leaves memory much less fragmented. Address
space will be allocated to seven or eight heaps, instead of 500. Those
heaps will be available for allocations of any size up to 128 Mbytes or so
(I believe they will be available - I have not verified that).
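
To put rough numbers on the O(n^2) versus O(n) claim, here is a
back-of-the-envelope script. It assumes the current scheme over-allocates
in fixed steps of about 100 slots once past 500 elements, and that every
resize has to move the whole list - assumptions, not measurements, since
whether the move actually happens is exactly the platform-dependent part:

def copies_fixed_step(n, step=100):
    # Element copies if capacity grows by a fixed step and each
    # resize moves the existing contents.
    total = 0
    for size in range(step, n + 1, step):
        total += size
    return total

def copies_doubling(n, start=8):
    # Element copies if capacity doubles on each resize.
    total = 0
    capacity = start
    while capacity < n:
        total += capacity
        capacity *= 2
    return total

for n in (10000, 100000, 1000000):
    print("%8d appends: %13d copies (fixed step) vs %8d (doubling)"
          % (n, copies_fixed_step(n), copies_doubling(n)))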


As to whether the problem is real or not, remember that this thread started
with my experience with solving a *real* problem with strings, which
performed badly in the way I was using them. I figured out why that was
so and switched to lists. Now I have found out that my program will
behave badly with lists also, if I happen to try processing a 'large' text
file - where large is defined as a Mbyte or two (performance degrades
to 25% of array speed after one million appends on WinNT, and
runs out of address space after 1.5 million appends on Win9x).
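
For anyone who wants to check their own platform, a timing loop along these
lines will show the effect (this is an illustration I am sketching here, not
the text processing script from earlier in the thread, and the batch sizes
are arbitrary):

import time

def append_throughput(total=2000000, batch=100000):
    data = []
    while len(data) < total:
        start = time.time()
        for i in range(batch):
            data.append(i)
        elapsed = time.time() - start
        print("%8d items: %.3f seconds for the last %d appends"
              % (len(data), elapsed, batch))

append_throughput()

On a platform whose allocator hides the problem, the per-batch times stay
roughly flat; on one that doesn't, they climb steadily as the list grows.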

Now obviously I can fix this problem - switch from list to array, or to
cStringIO. I'm glad that Python gives me these options. However, after
years spent lecturing my coworkers on the importance of algorithm
analysis, it's surprising to find what seems to be unnecessary quadratic
behavior in Python. Also, many people will not realize when they need
to make the switch.
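
For reference, the kind of workaround I mean looks something like this
(illustrative only - not my actual script, and Python 2.x specific since
it uses cStringIO and the 'c' array typecode):

from array import array
from cStringIO import StringIO

def build_with_cstringio(lines):
    buf = StringIO()
    for line in lines:
        buf.write(line)         # substitute the real per-line processing
    return buf.getvalue()

def build_with_array(lines):
    buf = array('c')            # raw chars - the "array" option above
    for line in lines:
        buf.fromstring(line)
    return buf.tostring()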

As Guido must know (from the statistics that are e-mailed to him :-), there
are a lot of programs out there encountering the edges of these conditions
and running slowly without their authors realizing why. Or, programs
that are written to process small files, and occasionally are used on large
ones - to bad effect. Or, programs that are written on Unix and then
run on NT or Win2K or WinXP or Win9x or some other platform whose
allocator doesn't hide the quadratic behaviour of lists.

Tim Peters wrote (in another message):

> [Bruce Dawson]
> > After discovering that lists actually are O(n^2),
>
> "Actually" is an odd word for something that took two days to provoke after
> giving up the first time then being assured it was possible to provoke it.
> Is it "actually" the case on Linux too?  As has been said, the behavior is
> platform dependent.

It didn't take two days to provoke the problem on Win9x - it took about
five minutes, using the text processing script I had written, on a slightly
larger (double real size) text file. I then simplified things for posting.

I certainly agree that the behavior is platform dependent - that is the whole
crux of the problem (if it is a problem). The Unix allocators happen to be
very good at dealing with this sort of memory allocation pattern, therefore
they *hide* the quadratic behavior. The only sin of the WinNT allocator is
that it fails to paper over quadratic behavior in Python. If the list
allocation scheme grew exponentially (which I now understand is not as
trivial to add to Python as I had thought) then the behavior *wouldn't*
be (as) platform dependent, and scripts would be more portable.


Now, I will try to resist the urge to comment further until I have more
concrete evidence that the problem is real, or a solution, or both. I suspect
that the problem is more serious than the experts believe, but I can't
prove it.


Tim Peters wrote, in yet another message:

> In the meantime, I did check in a change (for 2.2) that continues to do mild
> over-allocation, but on a sliding scale that doesn't think the world ends at
> 500 elements.  This speeds real-life appends, but mostly because it gets rid
> of the current integer multiplication and division in favor of simple bit
> shifts.  For Bruce's kind of test, it delays the point at which
> one-at-a-time list.append(x) exhausts Win9x VM from about 2 million elements
> to about 35 million.  I have higher hopes for NT/Win2K, but haven't yet
> tested it there.

Cool! I just saw this one. That will definitely help. A factor of 16
improvement in maximum size (and similar improvement in performance) will
make any remaining issues increasingly theoretical. Can you post your new
code snippet?
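
Until you do, here is my guess at what a shift-based sliding scale might
look like, written as Python for readability (the real change is C in
listobject.c, and this is only a guess at the idea, not your actual code):

def roundupsize(n):
    # Round the requested size up, with a granularity that grows as
    # the list grows, so big lists get proportionally more slack and
    # far fewer reallocs.
    nbits = 0
    n2 = n >> 5
    while True:
        n2 >>= 3
        nbits += 3
        if not n2:
            break
    return ((n >> nbits) + 1) << nbits

for n in (1, 100, 1000, 100000, 1000000):
    print("%8d -> %8d" % (n, roundupsize(n)))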

I guess my only (slightly wrinkled nose?) comment would be that it seems like
Python lists are acquiring an exponential growth factor (which is a good thing
- it solves the quadratic performance problem) the Rube Goldberg way - one
special case at a time.

If I have a better solution I'll post it.

Bruce/Helen Dawson




