which is better, string concatentation or substitution?

Mon May 8 11:26:24 EDT 2006

John Salerno  <johnjsal at NOSPAMgmail.com> wrote:
>Roy Smith wrote:
>
>> One may be marginally faster, but they both require copying the source 
>> string, and are thus both O(n).
>
>Sorry, I'm not familiar with the O(n) notation.

OK, here's a quick tutorial to "big-oh" notation.  It's a way of
measuring algorithmic complexity.  The O stands for "on the Order
of".

For any algorithm, if you process n things (in the case of the strings
we're talking about, n would be the total number of characters in all
the strings), you can compute the number of steps it takes to complete
all the processing as some function of n.

For example, let's say a given algorithm took 4n^2 + 5n + 17 steps to
complete.  It doesn't take much experimentation to prove to yourself
that for all but the very smallest values of n, the constant 17 is
completely insignificant.  In fact, n doesn't have to get very big
before the 5n term becomes insignificant too.  To a reasonable
approximation, you could say that the algorithm takes 4n^2 steps and
be close enough.  For small values of n, this will be a bad
approximation, but it doesn't really matter for small values of n.
For large values of n (think hundreds, thousands or even millions),
it's just fine.

So, the first rule of O() notation is that when looking at how fast an
algorithm runs, you need only consider the highest order term
(i.e. the one with the biggest exponent).

But, it gets even better.  Let's compare two algorithms, the one
above, which takes (approximately) 4n^2 steps, and another one which
takes 10n steps, for varous values of n:

            n     10n    4n^2
           ---  ------  ------
            1     10       4
            2     20      16
            3     30      36
            4     40      64
            5     50     100
            6     60     144
            7     70     196
            8     80     256
            9     90     324
           10    100     400
           11    110     484
           12    120     576
           13    130     676
           14    140     784
           15    150     900
           16    160    1024
           17    170    1156
           18    180    1296
           19    190    1444
           20    200    1600

Notice that it doesn't take long for the fact that n^2 grows a lot
faster than n to swamp the fact that 10 is bigger than 4.  So the next
step in simplification is to say that not only don't the lower-order
terms matter, but the coefficient on the highest order term doesn't
even matter.  For a large enough n, all that matters is the exponent
(for the moment, I'm making a further simplification that all
algorithms can be described by polynomials with integer exponents).

So, we get things like:

O(n^0), which is almost always written as O(1).  This is a "constant
time" algorithm, one which takes the same amount of steps to execute
no matter how big the input is.  For example, in python, you can
write, "x = 'foo'".  That assignment statement takes the same amount
of time no matter how long the string is.  All of these execute in the
same number of steps:

	x = ''
	x = 'foo'
	x = 'a very long string with lots and lots of characters'

We can say that "assignment is constant time", or "assignment is
O(1)".

The next step up is O(n).  This is "linear time" algorithm.  It takes
a number of steps which is directly proportional to the size of the
input.  A good example might be "if 'x' in 'bar'".  The obvious way to
implement this (and, I assume, the way it is implemented in Python),
is to loop over the string, comparing 'x' to each character in 'bar',
one by one.  This takes as many steps as there are characters in the
string.

Next comes O(n^2), also knows as "quadratic time".  This means if your
input is of size n, the algorithm will take n^2 steps to run.
Quadratic algorithms are generally considered very slow, and to be
avoided if at all possible.

Once you get used to thinking like this, it's easy to look at a piece
of code and say, "oh, that's quadratic", or "that's linear", and
instantly know which is faster for some some large input.  And once
you've started thinking like that, you've made one of the big jumps
from thinking like a coder to thinking like a computer scientist (or,
if you prefer, like a software engineer).

Google "algorithmic complexity" or "big o notation" (perhaps spelled
"big oh") for more info.  I would normally recommend Wikipedia, but I
just took a quick look at their "Big O notation" article, and noticed
it's full of formal mathematical gobbledygook which just serves to
obscure what is really a fairly easy concept.