How smart is the Python interpreter?

Thu Jul 31 18:58:25 EDT 2008

On Thu, 31 Jul 2008 04:09:57 -0700, ssecorp wrote:

> def str_sort(string):
> 	s = ""
> 	for a in sorted(string):
> 		s+=a
> 	return s
> 
> 
> if i instead do:
> 
> def str_sort(string):
> 	s = ""
>         so = sorted(string)
> 	for a in so:
> 		s+=a
> 	return s
> 
> 
> will that be faster 

Oh dear. Premature optimization, the root of all (programming) evil.

You can test which is faster by using the timeit module. In the 
interactive interpreter, define the two functions above with different 
names, and a string to supply as argument. Then call:

from timeit import Timer
t1 = Timer('str_sort1(s)', 'from __main__ import str_sort1, s')
t2 = Timer('str_sort2(s)', 'from __main__ import str_sort2, s')
t1.repeat(number=1000)
t2.repeat(number=1000)

I'll be hugely surprised if there was any meaningful difference.

> or the interpreter can figure out that it only has
> to do sorted(string) once? or that kind of cleverness is usually
> reserved for compilers and not interpreters?

Python uses a compiler. It doesn't do a lot of clever optimization, but 
it does some. In this case, no, it doesn't optimize your function, so 
technically the first may be a tiny bit faster. But, frankly, your 
function is so painfully inefficient, doing a lot of useless work, that 
you probably won't notice any real difference.

The correct way to do what you have done above is ''.join(sorted(s)). 
Anything else is much slower.

>>> def sort_str(s):
...     ss = ""
...     for c in sorted(s):
...             ss += c
...     return ss
...
>>> s = "abcdefghijklmnopqrstuvwxyz"*100
>>> from timeit import Timer
>>> t1 = Timer('"".join(sorted(s))', 'from __main__ import s')
>>> t2 = Timer('sort_str(s)', 'from __main__ import sort_str, s')
>>> t1.repeat(number=1000)
[1.6792540550231934, 1.6882510185241699, 1.660383939743042]
>>> t2.repeat(number=1000)
[2.5500221252441406, 2.4761130809783936, 2.5888760089874268]

-- 
Steven