Module re documentation bug, error, or misunderstanding?

David LeBlanc whisper at oz.net
Fri Jul 26 17:38:31 EDT 2002


<snip>
> So, what am I missing below as both searches "should" succeed?
>
> % python2.1
> Python 2.1 (#2, Apr 24 2001, 11:33:06)
> [GCC 2.95.3 20010315 (release)] on hp-uxB
> Type "copyright", "credits" or "license" for more information.
> >>> import re
> >>> re.search("(.+) \1", '55 55')
> >>>
> >>> re.search("(.+) (.+)", '55 55')
> <SRE_Match object at 40098328>

Your quantifiers are greedy, but that isn't what breaks the first search: the
engine backtracks, so (.+) gives characters back until the group is just "55"
and the rest of the pattern can match. The catch is the "\1". In an ordinary
(non-raw) string literal, Python turns "\1" into the single character chr(1)
before re ever sees it, so the pattern asks for a literal \x01 instead of a
backreference to group 1. Write it as a raw string, r"(.+) \1", and it
matches. (It's a good habit to always make re strings raw: r"".)

A non-greedy quantifier, +? as in r"(.+?) \1", finds the same match here; it
only changes whether the engine tries the shortest or the longest candidate
first. If the separator might be a tab or a run of spaces, \s (or \s+) can be
used in place of the literal space following a .* or .+ match element.
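
To check what each pattern actually does with '55 55', here is a small
sketch. It assumes a current Python 3 interpreter rather than the 2.1 shown
in the quoted session, and the variable names are only for illustration.

```python
import re

text = "55 55"

# Non-raw literal: Python converts "\1" to chr(1) before re sees the pattern.
plain = "(.+) \1"
# Raw literal: the backslash survives, so re sees a backreference to group 1.
raw = r"(.+) \1"

plain_match = re.search(plain, text)
raw_match = re.search(raw, text)
# Same backreference pattern with a non-greedy quantifier.
lazy_match = re.search(r"(.+?) \1", text)

print(repr(plain))           # shows the \x01 character inside the pattern
print(plain_match)           # no match object is returned
print(raw_match.group(1))    # text captured by group 1, greedy version
print(lazy_match.group(1))   # text captured by group 1, non-greedy version
```

Printing repr() of a pattern before handing it to re is a quick way to see
whether a backslash escape was consumed by the string literal.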

David LeBlanc
Seattle, WA USA




