question on string object handling in Python 2.7.8

Thu Dec 25 11:34:26 EST 2014

On Tue, 23 Dec 2014 20:28:30 -0500, Dave Tian wrote:

> Hi,
> 
> There are 2 statements:
> A: a = ‘h’
> B: b = ‘hh’
> 
> According to me understanding, A should be faster as characters would
> shortcut this 1-byte string ‘h’ without malloc; B should be slower than
> A as characters does not work for 2-byte string ‘hh’, which triggers the
> malloc. However, when I put A/B into a big loop and try to measure the
> performance using cProfile, B seems always faster than A.
> Testing code:
> for i in range(0, 100000000):
> 	a = ‘h’ #or b = ‘hh’
> Testing cmd: python -m cProfile test.py
> 
> So what is wrong here? B has one more malloc than A but is faster than
> B?

Your understanding.

The first time through the loop, python creates a string object "h" or 
"hh" and creates a pointer (a or b) and assigns it to the string object.

The next range(1, 100000000) times through the loop, python re-assigns 
the existing pointer to the existing string object.

Maybe a 2 character string is faster to locate in the object table than a 
1 character string, so that in the 2 character case, the lookup is faster.

-- 
Denis McMahon, denismfmcmahon at gmail.com