String Concatenation O(n^2) (was: Re: Explaining Implementing a Binary Search Tree.)

Bart Kastermans bkasterm at gmail.com
Mon Jun 16 06:34:06 EDT 2008


Summary: can't verify big O claim, how to properly time this?

On Jun 15, 2:34 pm, "Terry Reedy" <tjre... at udel.edu> wrote:
> "Bart Kastermans" <bkast... at gmail.com> wrote in message
>
> news:ae91857d-ea63-4204-9fc3-251049adee98 at k13g2000hse.googlegroups.com...
> |I wrote a binary search tree in python, explaining as I was doing it
> | how and why I did it.  I am very interested in receiving comments on
> | the code, process, and anything else that will improve my coding or
> | writing.
> |
> | I wrote this all up in my blog at:
> |
> |http://kasterma.wordpress.com/2008/06/15/implementing-a-binary-search...
> |
> | The code of the class has been copied below, but the description of
> | the process (mostly an attempt at approaching test driving development
> | for as far as I understand the term) has not been copied.
> |
> | Any and all comments are appreciated.
> |
> | Best,
> | Bart
> |
> | *********** python code ************************
> |
> |
> | import re
> |
> | class BSTree:
> |    def __init__ (self, value = None):
> |        self.value = value
> |        self.left = None
> |        self.right = None
>
> There are two possible approaches.
> 1. Define 1 tree class that is also used for subtrees -- what you did.
>    Complication: self.value of root node can be none, so you constantly
> have to check self.value for None even though only possible for root node.
> 2. Define tree class and node class.  This had other complications, but
> removes that above and makes str easier.  tree.str = '(' str(rootnode) ')'
> and node.str= self.value '(' str(self.left) ')' '(' str(self.right) ')'.
>
> If use '' instead of None, no conditionals are needed.  (This might apply
> partly to your version as well.)  Or define class NullTree with a singleton
> instance with no attributes and a str method returning '' and an inOrder
> method yielding nothing.  This would also eliminate the condifionals in the
> inOrder method.  Not sure what is best.  With a good test suite, it is easy
> to make sure alternative implementations 'work' before testing for speed.

Thanks for the idea.  I would expect the separation to lead to
somewhat more
code, but all the "checking the root" code would be separated out in
the
tree class.  The node class would be very smooth.  I'll try this when
I have
some time (today I spend my "alloted" programming time on what is
below).

(also the comment about inOrder returning a generator was because I
tried to
figure it out, failed, and then got enough by doing it without yield.
I forgot
to bring my comment in line with my code.  A generator
would certainly be nicer, and I'll work at understanding your
suggestion for
it.)

>
> |    def __str__ (self):
>
> string appending is an O(n**2) operations.  The usual idiom, applied here,
> would be slist = ['('], slist.append(str(self.value)), .... return
> ''.join(slist).  In other words, collect list of pieces and join at end.

This is interesting.  I had never attempted to verify a big O
statement
before, and decided that it would be worth trying.  So I wrote some
code to
collect data, and I can't find that it goes quadratic.  I have the
graph
at

http://kasterma.wordpress.com/2008/06/16/complexity-of-string-concatenation/

It looks piecewise linear to me.

The code I used to collect the data is as follows:

*************************************************************************

import time

NUMBER = 100   # number of strings to concatenate at given length
JUMP = 500     # jump (and start length) of length of strings
END = 100001    # longest length string considered

def randomString (length):
    """ returns a random string of letters from {a,b,c,d} of length
"""

    string = ""

    for i in range (0,length):
        string += choice ("abcd")

    return string

def randomStrings (number, length):
    """ returns an array of number random strings all of length """

    array = []

    for i in range (0, number):
        array.append (randomString (length))

    return array

TimingData = []

for length in range (JUMP, END, JUMP):
    array1 = randomStrings (NUMBER, length)
    array2 = randomStrings (NUMBER, length)

    starttime = time.clock ()
    for i in range (0, NUMBER):
        string = array1 [i] + array2 [i]
    endtime = time.clock ()
    print "length", length, "done"

    TimingData.append ([length, 1000* (endtime-starttime)])
        # to get a better looking graph multiply by 1000

sagefile = open ('stringConcatGraph.sage', "w")
sagefile.write ("points =" + str (TimingData) + "\n")
sagefile.write ("graph = line (points)\n")
sagefile.write ("graph.show ()\n")
sagefile.close ()




More information about the Python-list mailing list