New Python regex Doc

Skip Montanaro skip at pobox.com
Sun May 8 21:33:05 EDT 2005


    Peter> And which, at least implicitly, defines "greedy" in section
    Peter> 6.3 titled "Greedy versus Non-Greedy".  It's not perfect, but
    Peter> then nobody in this thread has offered anything even remotely
    Peter> resembling perfect documentation for regular expressions
    Peter> yet. <wink>

In the re syntax page:

    http://www.python.org/dev/doc/devel/lib/re-syntax.html

the entry for the *?, +? and ?? operators describes the *, + and ?
qualifiers as greedy:

    *?, +?, ??
        The "*", "+", and "?" qualifiers are all greedy; they match as much
        text as possible. Sometimes this behaviour isn't desired; if the RE
        <.*> is matched against '<H1>title</H1>', it will match the entire
        string, and not just '<H1>'. Adding "?" after the qualifier makes it
        perform the match in non-greedy or minimal fashion; as few
        characters as possible will be matched. Using .*? in the previous
        expression will match only '<H1>'.
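If you want to confirm the behavior that entry describes, here is a small
sketch that runs the documented <.*> and <.*?> patterns against
'<H1>title</H1>'. The variable names are only illustrative; the code uses
nothing beyond re.match and the match object's group() method:

```python
import re

html = '<H1>title</H1>'

# Greedy: .* first consumes the rest of the string, then backtracks just
# far enough for the closing '>' to match, so the match ends at the
# last '>' in the string.
greedy = re.match(r'<.*>', html).group()

# Non-greedy: .*? consumes as little as possible, so the match ends at
# the first '>'.
lazy = re.match(r'<.*?>', html).group()

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
```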

Likewise, {m,n}? is described as the non-greedy version of {m,n}, and A|B
is described as never being greedy (if A matches, B is never tried).
Perhaps there's no explicit definition of the word "greedy" in the context
of regular expressions, but I think that after reading that page most
people will have at least an intuitive notion of what it means.  If it's
still unclear, a little experimentation should suffice:

    >>> import re
    >>> re.match("(a+)", "aaaaa").group(1)
    'aaaaa'
    >>> re.match("(a+?)", "aaaaa").group(1)
    'a'
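The same kind of experiment covers the other two cases the syntax page
mentions, {m,n}? and A|B.  A minimal sketch, with illustrative variable
names and an arbitrary test string:

```python
import re

s = 'aaaaa'

# {m,n} is greedy: it repeats as many times as it can, up to n.
bounded = re.match(r'a{2,4}', s).group()

# {m,n}? is the non-greedy form: it repeats as few times as it can,
# starting from m.
bounded_lazy = re.match(r'a{2,4}?', s).group()

# A|B is tried left to right.  Here the first alternative 'a' matches
# and nothing else in the pattern needs to match, so the longer
# alternative 'aaa' is never used.
alt = re.match(r'a|aaa', s).group()

print(bounded)       # aaaa
print(bounded_lazy)  # aa
print(alt)           # a
```

re.match anchors the pattern at the start of the string but not at the
end, which is why the non-greedy forms are free to stop early.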

In short, I think the re docs are fine as-is w.r.t. the greedy concept.  I
also added a definition to the Python Glossary for good measure:

    http://www.python.org/moin/PythonGlossary

Feel free to amend, enhance or correct it as you see fit.  (For that
matter, feel free to flesh out any of the other entries, especially those
with "???" as the definition.)

Skip


