String Concatenation O(n^2) (was: Re: Explaining Implementing a Binary Search Tree.)

Ian Kelly ian.g.kelly at gmail.com
Mon Jun 16 12:41:05 EDT 2008


On Mon, Jun 16, 2008 at 4:34 AM, Bart Kastermans <bkasterm at gmail.com> wrote:
> This is interesting.  I had never attempted to verify a big O
> statement
> before, and decided that it would be worth trying.  So I wrote some
> code to
> collect data, and I can't find that it goes quadratic.  I have the
> graph
> at
>
> http://kasterma.wordpress.com/2008/06/16/complexity-of-string-concatenation/
>
> It looks piecewise linear to me.

I don't think there's any question that it's quadratic.  Just look at
the C code, and you'll see that every time you concatenate the entire
string has to be copied.  Remember that the copy code uses memcpy from
the C library, though, so it's quite fast.  If you're not seeing it
for relatively small strings, it's probably that the major time
component is the Python looping construct, not the copy operation.

I tried it at lengths of multiples of 10, and it remained linear up
until 1E6.  At 1E7, it appears to turn quadratic.  I tried 1E8, but it
hasn't finished yet.  I expect it to take the better part of an hour.

>>> from timeit import Timer
>>> a = Timer("a += 'a'", "a = ''")
>>> a.timeit(10)
6.9841280492255464e-006
>>> a.timeit(100)
2.7936511287407484e-005
>>> a.timeit(1000)
0.00043525084856810281
>>> a.timeit(10000)
0.0049266038004134316
>>> a.timeit(100000)
0.046758456731367914
>>> a.timeit(1000000)
0.38496261396358022
>>> a.timeit(10000000)
20.320074199374631

Even though it doesn't become quadratic until 1E7, it's still up to an
order of magnitude faster to use str.join for smaller strings, though:

>>> b = Timer("''.join(['a'] * 1000000)")
>>> b.timeit(1)
0.044094989726346512

-Ian



More information about the Python-list mailing list