New Python regex Doc
Skip Montanaro
skip at pobox.com
Sun May 8 21:33:05 EDT 2005
Peter> And which, at least implicitly, defines "greedy" in section
Peter> 6.3 titled "Greedy versus Non-Greedy". It's not perfect, but
Peter> then nobody in this thread has offered anything even remotely
Peter> resembling perfect documentation for regular expressions
Peter> yet. <wink>
In the re syntax page:
http://www.python.org/dev/doc/devel/lib/re-syntax.html
the *, + and ? operators are described as greedy, under the entry for
*?, +? and ??:
    *?, +?, ??
        The "*", "+", and "?" qualifiers are all greedy; they match as much
        text as possible. Sometimes this behaviour isn't desired; if the RE
        <.*> is matched against '<H1>title</H1>', it will match the entire
        string, and not just '<H1>'. Adding "?" after the qualifier makes it
        perform the match in non-greedy or minimal fashion; as few
        characters as possible will be matched. Using .*? in the previous
        expression will match only '<H1>'.
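
The example from that entry is easy to check directly. A minimal sketch,
using only re.match from the standard library; the variable names are
illustrative and not taken from the docs:

```python
import re

html = '<H1>title</H1>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r'<.*>', html).group(0)

# Non-greedy: .*? grabs as little as it can, so the match stops at the first '>'.
lazy = re.match(r'<.*?>', html).group(0)

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
```
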
{m,n}? is also described as a non-greedy version of {m,n} and A|B is
described as never being greedy (if A matches, B is never tried). Perhaps
there's no explicit definition of the word "greedy" in the context of
regular expressions, but I think that after reading that page most people
will at least have an intuitive notion of the meaning. If it's still
unclear, a little experimentation should suffice:
>>> import re
>>> re.match("(a+)", "aaaaa").group(1)
'aaaaa'
>>> re.match("(a+?)", "aaaaa").group(1)
'a'
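
The same kind of check works for the other two cases mentioned above,
{m,n}? and A|B. A rough sketch, with patterns and input strings chosen
only for illustration:

```python
import re

# {m,n} is greedy: it takes as many repetitions as allowed (here, 4).
bounded_greedy = re.match(r'a{2,4}', 'aaaaa').group(0)

# {m,n}? is non-greedy: it takes as few repetitions as allowed (here, 2).
bounded_lazy = re.match(r'a{2,4}?', 'aaaaa').group(0)

# A|B is tried left to right: once 'a' matches, 'aaa' is never attempted,
# even though it would have produced a longer match.
alternation = re.match(r'a|aaa', 'aaaaa').group(0)

print(bounded_greedy)  # aaaa
print(bounded_lazy)    # aa
print(alternation)     # a
```
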
In short, I think the re docs are fine as-is w.r.t. the greedy concept. I
also added a definition to the Python Glossary for good measure:
http://www.python.org/moin/PythonGlossary
Feel free to amend/enhance/correct as you see fit. (Feel free to flesh out
any definitions for that matter, especially those with "???" as the
definition.)
Skip