The Running Time of += on Char Strings?

Edg Bamyasi support.services.complaints at gmail.com
Thu Mar 24 04:28:42 EST 2005


This is a late cross-post from comp.lang.python. It seems the mystery
is deeper than I expected.

What is the running time of concatenation on character strings?

i.e.

>>> joe = "123"
>>> joe += "99999999999999999"

Is it amortized constant time? I don't think it would be O((number of
chars)^2), but I really don't know.
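
One way to get a rough empirical answer (build_concat and the test
sizes below are just names I made up for illustration) is to time the
loop at a few sizes and see whether the total grows linearly or
quadratically:

import timeit

def build_concat(n):
    # append one character at a time with +=
    s = ""
    for _ in range(n):
        s += "a"
    return s

# if += is amortized O(1) per append, total time should scale roughly
# linearly with n; if each += copies the string, roughly quadratically
for n in (1000, 10000, 100000):
    print(n, timeit.timeit(lambda: build_concat(n), number=10))

Of course that only measures one interpreter on one machine, which is
exactly why I would rather see it specified.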

Teach me how to fish: where would I find out more about the internal
representations of data types in Python (and guaranteed running times;
I'm thinking of something like sgi.com's info on the STL)? I have
looked through the docs, but I don't see these kinds of specifications.

thanks * 100
- Haz

P.S.
- Should note that I am familiar with timeit, but understanding the
underlying data structures and representations is an important thing to
know.

P.P.S.

This is a bit of what I think is relevant discourse I have been having
via an email response to my usenet posting.

>     Haz> I should have mentioned that I am familiar with the timeit
>     Haz> function, but surely there must be a specification in the language
>     Haz> of the running time (number of flops).
>
> Nope.  I can't think of an instance where it would be appropriate to specify
> runtime properties of various algorithms in the language.  For example, if
> you were to specify that sorting of lists was O(n log n) that would
> potentially preclude the choice of quicksort as an algorithm because its
> worst case behavior is O(n * n) even though it is generally faster than most
> other sorting algorithms.

The answer here is to use Omega(n log n) or to specify average and
worst cases. I truly do think that there can be a complexity
specification for the language. I mean, all algorithms have a
complexity, and surely data structures are chosen with size/speed
tradeoffs in mind.

For instance, the STL has a sorting algorithm and specifies its
running time in the latter way. (It can assure this by coding with
concepts, but I think in a base language it could be simpler because
the data structures are known.) I.e., it is known what data types +=
works on, and thus it should be known what algorithms are used (that
is, the highly optimal ones).

From the SGI STL page:
sort
<snip>
Complexity
O(N log(N)) comparisons (both average and worst-case), where N is last
- first. [2]
<snip>

source : http://www.sgi.com/tech/stl/sort.html


>     Haz> Without knowing these specifications its hard to optimize.
>
> No, you still need to see where your program runs slow and figure out ways
> to make it run faster.

Well, basically my point is that it is hard to know why a code
section is running slowly unless you understand the underlying data
representations and algorithms.

For instance

MATLAB code:

A = [];
for i = 1:N
  A = [A; 'a'];   % concatenation copies the whole array every pass
end

is an O(N^2) operation, since each concatenation copies all of A.

C++ code:

#include <vector>

std::vector<char> A;
for (int i = 0; i < N; i++) {
  A.push_back('a');   // amortized O(1) per push via geometric growth
}

is amortized O(N) [more precisely, each push_back is amortized O(1),
so the loop as a whole is O(N)].
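
Python lists seem to do the same kind of geometric over-allocation
(a CPython implementation detail as far as I can tell, not a language
guarantee); you can watch it happen with sys.getsizeof as a list
grows:

import sys

lst = []
sizes = set()
for i in range(64):
    lst.append(i)
    sizes.add(sys.getsizeof(lst))

# the reported size only jumps at a few points rather than on every
# append -- the over-allocation that makes append amortized O(1)
print(sorted(sizes))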

Python Code:

A=""
for i in range(N):
  A+='a'

running time: ???

So if you are looking through your code in MATLAB or C++ and see this
concatenation loop, you know that it is a problem in the former and
not in the latter. But in Python ???
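
For what it's worth, the workaround people usually suggest is to
collect the pieces in a list and join once at the end, which is O(N)
total no matter how += behaves (N = 100000 below is just an example
size):

# list.append is amortized O(1), and join makes a single pass over
# the pieces, so the whole build is O(N)
N = 100000
pieces = []
for i in range(N):
    pieces.append('a')
A = "".join(pieces)

But that is a folk idiom, not a documented guarantee, which is really
my whole point.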


