[Python-ideas] FInd first tuple argument for str.find and str.index

Ron Adam rrr at ronadam.com
Thu Sep 6 00:59:41 CEST 2007



Terry Jones wrote:

 > It's worse than that. _Each time around the loop_ it tests all candidates
 > against all of the remaining text. Imagine matching L patterns that were
 > each 'a' * M against a text of 'a' * N. You're going to do (roughly) O(N *
 > L * M) comparisons, using naive string matching.

Yep, it's bad, which is why I don't want to do it this way.  Of course it's 
much better than ...

     for char in string:
         ... etc ...


 > You could completely drop patterns that have already returned -1. You can
 > short-circuit a loop if you ever got a 0 index back.  Sorry if this seems
 > picky - I guess you're just writing quick pseudo-code.

Well its a but if both pseudo-code and from an actual problem I was working 
on.  It was a first unoptimized version.  First rule is to get something 
that works, then make it fast after it's tested.  Right?  :-)

No you aren't being any more picky than I am.  I didn't like that solution 
either which is why I suggested improving index and find just a bit.

And I'm not real keen on this next one although it will probably work better.

(not tested)

length = len(s)
cases = dict((c, 0) for c in terms)
while 1:
    for term, i in cases.items():
        if i == start:
           i = s.find(term, start)
           cases[term] = i if i > -1 else length
    start = min(cases.values())
    if start == length:
        break
    ...
    Do something with s[start]

Return some result we constructed or found along the way.

That's an improvement, But it's still way more work than I think the 
problem should need.  It's also complex enough that it's no longer obvious 
just what it's doing.


I still like the tuple version of index, or find better.  Much easier to 
read and understand.

    ...
    try:
        start = s.index(('{', '}'), start)
    except:
        break
    ...

    ...
    start = s.find(('{', '}'), start)
    if start == -1:
        break
    ...


 > Ron> Of course I would use 're' for anything more complex than a few fixed
 > Ron> length terms.
 >
 > Yes. It depends partly on what you want back. You could write a super-fast
 > iterator based on A&C that told you everything you need to know, with
 > guaranteed linear worst case behavior, but that seems like overkill here
 > (and it may be in re, as you say).

Regular expression can be slower for small things because it has more 
overhead.  It's not always the fasted choice.

And I've read they are not good at is parsing nested delimiters.

Cheers,
    RON


 > Ron> If the function returns something other than a simple index, then I
 > Ron> think it will need to be a new function or method and not just an
 > Ron> alteration of str.index and str.find.  In that case it may also need a
 > Ron> PEP.
 >
 > Be my guest :-)
 >
 > Regards,
 > Terry



More information about the Python-ideas mailing list