best way to replace first word in string?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sat Oct 22 23:26:32 EDT 2005


On Sat, 22 Oct 2005 14:54:24 -0400, Mike Meyer wrote:

>> The string formatting is two orders of magnitude faster than the
>> concatenation. The speed difference becomes even more obvious when you
>> increase the number of strings being concatenated:
> 
> The test isn't right - the addition test case includes the time to
> convert the number into a char, including taking a modulo.

I wondered if anyone would pick up on that :-)

You are correct; however, that only adds a constant amount of time to
each concatenation. That's why I talked about order of magnitude
differences. If you look at the vast increase in time taken for
concatenation when going from 10**5 to 10**6 iterations, that cannot
be blamed on the char conversion.
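
A rough cost model (not a measurement) may make the argument concrete.
It assumes that each concatenation copies the whole string built so far,
and that the char conversion costs a fixed number of units per iteration.
The function names and the figure of 5 units per conversion are made up
for illustration only.

```python
def conversion_cost(n, per_call=5):
    """Total cost of a fixed-cost char conversion done once per iteration.

    per_call is an arbitrary illustrative constant, not a measured value.
    """
    return per_call * n


def copy_cost(n):
    """Chars copied if the i-th concatenation recopies the i chars so far."""
    return sum(range(n))  # 0 + 1 + ... + (n - 1) == n * (n - 1) // 2


for n in (10**3, 10**4):
    print(n, conversion_cost(n), copy_cost(n))
```

Under these assumptions, a tenfold increase in n multiplies the
conversion cost by 10 but the copying cost by roughly 100, so the
conversion term matters less and less as n grows.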

At least, that's what it looks like to me -- I'm perplexed by the *vast*
increase in speed in your version, far more than I would have predicted
from pulling out the char conversion. I can think of three
possibilities:

(1) Your PC is *hugely* faster than mine;

(2) Your value of x is a lot smaller than I was using (you don't actually
say what x you use); or

(3) You are using a version and/or implementation of Python that has a
different underlying implementation of string concatenation.



> I couldn't resist adding the .join idiom to this test:

[snip code]

> >>> tester(x)
> 0.0551731586456 0.0251281261444 0.0264830589294
> >>> tester(x)
> 0.0585241317749 0.0239250659943 0.0256059169769
> >>> tester(x)
> 0.0544500350952 0.0271301269531 0.0232360363007
>
> The "order of magnitude" now falls to a factor of two. The original
> version of the test on my box also showed an order of magnitude
> difference, so this isn't an implementation difference.

[snip]

> >>> tester(x * 10)
> 1.22272014618 0.252701997757 0.27273607254
> >>> tester(x * 10)
> 1.21779584885 0.255345106125 0.242965936661
> >>> tester(x * 10)
> 1.25092792511 0.311630964279 0.241738080978

Looking just at the improved test of string concatenation, I get times
of about 0.02 seconds for n=10**4. For n=10**5, the time blows out to 2
seconds. For 10**6, it explodes through the roof to about 2800 seconds, or
about 45 minutes, and for 10**7 I'm predicting it would take something of
the order of 500 HOURS.
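
The growth between those figures can be checked with a few lines of
arithmetic. The sketch below uses only the timings quoted above; the
variable names are mine.

```python
# Timings quoted above, in seconds, keyed by number of iterations.
timings = {10**4: 0.02, 10**5: 2.0, 10**6: 2800.0}
predicted_10_7 = 500 * 3600.0  # "of the order of 500 HOURS", in seconds

sizes = sorted(timings)
# Slowdown factor for each tenfold increase in n.
ratios = [timings[b] / timings[a] for a, b in zip(sizes, sizes[1:])]

print(ratios)
print(predicted_10_7 / timings[10**6])
```

The first step is a factor of about 100, which is what a simple n**2
model gives for ten times as many iterations. The second step is a
factor of about 1400, and the 500 hour guess corresponds to a further
factor of roughly 640.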

In other words, yes, the char conversion adds some time to the process, but
for large numbers of iterations, it gets swamped by the time taken
copying chars over and over again.
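
For anyone who wants to try this at home, here is a minimal sketch of
such a comparison. It is not the tester() from this thread (that code
was snipped above); build_concat, build_join and time_all are
hypothetical names. The chars are built before the clock starts, so the
conversion is not included in either timing.

```python
import timeit


def build_concat(chars):
    """Build the string by repeated concatenation."""
    s = ""
    for c in chars:
        s = s + c
    return s


def build_join(chars):
    """Build the string with the .join idiom."""
    return "".join(chars)


def time_all(n, repeat=3):
    """Return the best-of-`repeat` time in seconds for each builder."""
    # Char conversion happens here, outside the timed calls.
    chars = [chr(ord("a") + i % 26) for i in range(n)]
    results = {}
    for func in (build_concat, build_join):
        runs = timeit.repeat(lambda: func(chars), number=1, repeat=repeat)
        results[func.__name__] = min(runs)
    return results


if __name__ == "__main__":
    for n in (10**3, 10**4):
        print(n, time_all(n))
```

Results will depend on the interpreter: since version 2.4, CPython can
sometimes extend the string in place for s = s + c, in which case the
quadratic blow-up may not show up at all.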


-- 
Steven.



