Frankenstring

Thu Jul 14 07:10:27 EDT 2005

Andreas Lobinger wrote:

>  >>> t2 = f.find('2')+1

This is indeed faster than going through a string char by char. It doesn't
make for a nice character-based state machine, but of course it avoids
making Python objects for every character and uses the C implementation of
str for searching.
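
For illustration, a minimal sketch of the two approaches side by side. The function names and the sample input are made up for this example; only the `f.find('2')+1` idiom comes from the quoted post.

```python
def find_by_char(s, target):
    """Scan s one character at a time in Python code.

    Returns the index just past the first match, or 0 if there is none,
    mirroring what s.find(target) + 1 gives.
    """
    for i, ch in enumerate(s):
        # Every iteration hands a one-character str object to Python code.
        if ch == target:
            return i + 1
    return 0


def find_by_method(s, target):
    """Let the C implementation of str do the scanning."""
    return s.find(target) + 1
```

Both return the same position; the second one just does not touch each character from Python code.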

However, it's only fine if you are looking for single characters. As soon
as you're looking for classes of characters, you need the (slower) regex
machinery (as you well know, but for the sake of discussion...).
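
A small sketch of what the regex route looks like when the target is a class of characters, here "any digit". The pattern and the helper name `find_digit` are assumptions chosen for the example, not anything from the quoted code.

```python
import re

# Compile once; the character class stands in for "any of these characters".
DIGIT = re.compile(r"[0-9]")


def find_digit(s, start=0):
    """Return the index just past the first digit at or after start, or 0 if none."""
    m = DIGIT.search(s, start)
    return m.end() if m else 0
```

The `start` argument plays the role of the pointer into the string, so the string itself never has to be sliced.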

> A string, and a pointer on that string. If you give up the boundary
> condition to tell backwards, you can start to eat up the string via f =
> f[p:]. There was a performance difference with that, in fact it was faster
> ~4% on a python2.2.

When I tried it just now, it was the other way around. Eating up the
string was slower, which makes sense to me since it involves creating new
string objects all the time.
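
To make the comparison concrete, here is a rough sketch of both variants. The function names are hypothetical, and this is not the code either of us timed; it only shows where the extra string objects come from.

```python
def positions_with_pointer(s, target):
    """Collect match positions by advancing an index into the unchanged string."""
    found = []
    p = s.find(target)
    while p != -1:
        found.append(p)
        p = s.find(target, p + 1)  # no new string is created
    return found


def positions_by_eating(s, target):
    """Collect the same positions by repeatedly slicing off the consumed prefix."""
    found = []
    offset = 0  # how much of the original string has been eaten so far
    p = s.find(target)
    while p != -1:
        found.append(offset + p)
        offset += p + 1
        s = s[p + 1:]  # builds a new string object holding the remainder
        p = s.find(target)
    return found
```

Either can be timed with the timeit module on a long input; which one wins may depend on the Python version and on how long the remaining string is.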

> I don't expect any iterator solution to be faster than that.

It's not so much an issue of iterators as of handling Python objects
for every char. Iterators would actually be quite helpful for searching: I
wonder why there doesn't seem to be a str.iterfind or str.itersplit
thing. And I wonder whether there shouldn't be str.findany and
str.iterfindany, which would take a sequence as an argument and return the
next match on any element of it.
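
Roughly what I have in mind, written as plain generator functions on top of str.find. None of these exist as str methods; the names, signatures and return values are just one possible guess at how they could behave.

```python
def iterfind(s, sub, start=0):
    """Yield the index of each non-overlapping occurrence of sub in s."""
    step = max(len(sub), 1)  # always advance, even for an empty sub
    pos = s.find(sub, start)
    while pos != -1:
        yield pos
        pos = s.find(sub, pos + step)


def iterfindany(s, subs, start=0):
    """Yield (index, sub) for the next match of any element of subs."""
    pos = start
    while True:
        hits = []
        for sub in subs:
            i = s.find(sub, pos)
            if i != -1:
                hits.append((i, sub))
        if not hits:
            return
        i, sub = min(hits)  # earliest match wins; ties go to the smaller string
        yield i, sub
        pos = i + max(len(sub), 1)
```

For example, `list(iterfindany("a,b;c", [",", ";"]))` walks the separators in order without building a character-by-character loop. A C implementation could avoid the repeated str.find calls per element, which this sketch does not attempt.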

-- 
Thomas