sorting 1172026 entries

Cameron Simpson cs at zip.com.au
Mon May 7 17:52:24 EDT 2012


On 07May2012 11:02, Chris Angelico <rosuav at gmail.com> wrote:
| On Mon, May 7, 2012 at 10:31 AM, Cameron Simpson <cs at zip.com.au> wrote:
| > I didn't mean per .append() call (which I'd expect to be O(n) for large
| > n), I meant overall for the completed list.
| >
| > Don't the realloc()s make it O(n^2) overall for large n? The list
| > must get copied when the underlying space fills. I confess to being
| > surprised at how quick it went for me though. I suppose reallocing in
| > chunks (eg to double the available size) might conceal this for a while.
| > It should still be well over O(n) overall (not amortised per .append()
| > call).
| 
| I haven't checked the CPython code, but the usual way to do
| reallocations is to double the size

My example above:-)

| (or add 50% or something) each
| time, meaning that as n increases, the frequency of reallocations
| decreases - hence the O(1) amortized time.

Hmm, yes. But it is only O(1) for doubling. If one went with a smaller
increment (to waste less space in the end case where one stops growing the
array) then there are more copies and less .append efficiency, trading
potential memory bloat for compute time in .append(). If you went all
the way to adding only one item the cost goes to O(n) for .append()
and O(n^2) overall, with varying costs in between.

| In any case, it's the
| recommended way to build large strings:
| 
| s = ""
| while condition:
|     s += "some_value"
| 
| versus:
| 
| s = []
| while condition:
|     s.append("some_value")
| s = "".join(s)
| 
| so I would be very much surprised if this common wisdom is merely
| replacing one O(n*n) operation with another O(n*n) operation.

It sort of is (except that list.append may be specially tuned to realloc
in big chunks). The join-lots-of-strings standard example is based on
the length of the strings in characters/bytes being far more than the
number of strings. So you do a lot of tiny listappends instead, copying
nothing, then a big allocate-the-correct-size-and-copy-once with s.join.

Also, this:

  s += "some_value"

is not a list extension. Because strings are immutable (and anyway in
general for "+" in python) you're making a new string and copying two
other string contents into it. Inherently a copy of the whole of "s" on
every "+" operation. Compared with list.append() or list.extend(), which
are in-place modifications to the original list.

So these are really apples and oranges here.
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

We are in great haste to construct a magnetic telegraph from Maine to Texas;
but Maine and Texas, it may be, have nothing important to communicate.
        - H. D. Thoreau



More information about the Python-list mailing list