Pythonic way to determine if one char of many in a string

Wed Feb 18 02:55:41 EST 2009

On Wed, 18 Feb 2009 07:08:04 +1100, Jervis Whitley wrote:

>> This moves the for-loop out of slow Python into fast C and should be
>> much, much faster for very large input.
>>
>>
> _Should_ be faster.

Yes, Python's timing results are often unintuitive.

> Here is my test on an XP system Python 2.5.4. I had similar results on
> python 2.7 trunk.
...
> **no vowels**
> any: [0.36063678618957751, 0.36116506191682773, 0.36212355395824081]
> for: [0.24044885376801672, 0.2417684017413404, 0.24084797257163482]

I get similar results.

...
> **BIG word vowel 'U' final char**
> any: [8.0007259193539895, 7.9797344140269644, 7.8901742633514012]
> for: [7.6664422372764101, 7.6784683633957584, 7.6683055766498001]

Well, I did say "for very large input". 10000 chars isn't "very large" --
that's less than 10K. Try this instead:

>>> BIGWORD = 'g' * 500000 + 'U'  # less than 500K of text
>>>
>>> Timer("for_test(BIGWORD)", setup).repeat(number=1000)
[4.7292280197143555, 4.633030891418457, 4.6327309608459473]
>>> Timer("any_test(BIGWORD)", setup).repeat(number=1000)
[4.7717428207397461, 4.6366970539093018, 4.6367099285125732]

The difference is not significant. What about bigger?

>>> BIGWORD = 'g' * 5000000 + 'U'  # less than 5MB 
>>>
>>> Timer("for_test(BIGWORD)", setup).repeat(number=100)
[4.8875839710235596, 4.7698030471801758, 4.769787073135376]
>>> Timer("any_test(BIGWORD)", setup).repeat(number=100)
[4.8555209636688232, 4.8139419555664062, 4.7710208892822266]

It seems to me that I was mistaken -- for large enough input, the running
times of the two versions converge to approximately the same value.
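
For anyone who wants to repeat the comparison: the definitions of for_test,
any_test and setup are not shown in this message, so the following is only a
sketch under assumptions. It assumes for_test is an explicit loop that returns
at the first vowel and any_test is any() over a generator expression. The
VOWELS constant and the compare() helper are hypothetical names introduced
here for illustration, and the code targets a current Python 3 rather than the
2.5 used above, so absolute timings will differ.

```python
from timeit import Timer

# Assumed vowel set; the original test code is not shown in the thread.
VOWELS = "aeiouAEIOU"


def for_test(word):
    # Explicit loop: return as soon as any character is a vowel.
    for ch in word:
        if ch in VOWELS:
            return True
    return False


def any_test(word):
    # Same check, with the loop driven by any() over a generator expression.
    return any(ch in VOWELS for ch in word)


def compare(size, number=10, repeat=3):
    # Hypothetical helper: time both versions on the worst case, where the
    # only vowel is the final character, and keep the best of `repeat` runs.
    word = "g" * size + "U"
    results = {}
    for func in (for_test, any_test):
        timer = Timer(lambda: func(word))
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for size in (10_000, 500_000):
        print(size, compare(size))
```

Taking the minimum of the repeats, rather than the mean, is the usual way to
reduce noise from other processes on the machine. Raise size and lower number
together to probe larger inputs without the run taking much longer.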

What happens when you have hundreds of megabytes, I don't know.

-- 
Steven