Two RE proposals

Fredrik Lundh fredrik at pythonware.com
Sat Jul 27 03:48:47 EDT 2002


David LeBlanc wrote:
> 1. Add a substitution operator - in the example below it's "!<..>"
>
> word = r"\w*"
> punct = r"[,.;?]"
> wordpunct = re.compile(r"!<word>!<punct>")
>
> The re compiler sees r"\w*[,.;?]"
> Trivial example, but for fancier patterns it would be great IMO.
> A substitution pass should be done over the substituted text for nesting:

python already has string substitution.  if it needs better string
substitution, that should be solved outside the RE engine.

besides, having library modules peek in your local namespace is
really bad style.

and your proposal will break existing code: a pattern that contains
"!<" today matches those characters literally.

:::

the following approach works in all existing versions of Python,
gives you syntax highlighting in all existing Python editors, etc:

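    import re

    def i(*args):
        # concatenate the pieces with no separator (string.join defaults
        # to a single space, which would end up inside the pattern).
        # on pre-2.0 Pythons, use string.join(map(str, args), "")
        return "".join(map(str, args))

    word = r"\w*"
    punct = r"[,.;?]"
    wordpunct = re.compile(i(word, punct))

    if_kw = r"if"    # "if" is a reserved word, so it can't be a name
    term = r"something"
    num = r"\d*"
    op = r"[-+*/]"
    factor = i(num, r"\s*", op, r"\s*", num)
    expr = i(term, factor)
    if_stmt = re.compile(i(if_kw, r"\s*\(?\s*", expr, r"\s*\)?\s*:"))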

if you're doing lots of RE stuff, you can trivially extend this to
support RE-oriented operations:

    if_kw = literal("if")
    op = set("-+*/")
    factor = seq(num, ws, op, ws, num)

(google for "rxb" for a complete implementation of that idea)
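
a minimal sketch of what such helpers might look like. these definitions are assumptions made for illustration, not rxb's actual API; `charset` stands in for the `set` helper above (to avoid shadowing the builtin), and `ws` is a hypothetical optional-whitespace pattern:

```python
import re

def literal(text):
    # match the text exactly, escaping any RE metacharacters
    return re.escape(text)

def charset(chars):
    # character class matching any single character in chars
    return "[" + "".join(re.escape(c) for c in chars) + "]"

def seq(*parts):
    # match the parts one after another (plain concatenation)
    return "".join(parts)

ws = r"\s*"    # optional whitespace (assumed definition)
num = r"\d*"

if_kw = literal("if")
op = charset("-+*/")
factor = seq(num, ws, op, ws, num)

factor_re = re.compile(factor)
```

since `seq` only concatenates, a part that contains a top-level `|` would need to be wrapped in `(?:...)` first.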

> 2. Make r"(a|b)*" mean any number of a's or b's.

it does mean any number of a's or b's.  but the group is overwritten
on each repetition, so no more than a single a or b (the last one
matched) will end up in the group.

> This doesn't work, at least in some situations with the current
> re compiler - the "any" op "*" doesn't seem to span over a parened
> group

    for i in range(20):
        s = file.read(1)

doesn't give you a 20 character string either (nor a 20 item list)

fixing the read statement is of course trivial.

fixing the RE is done in a similar fashion: make sure the group
matches everything you want to put in the group:

    r"((?:a|b)*)"

if you want lists of matching things, use findall.
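
a short sketch of the difference, using an arbitrary sample string "abba" (the variable names are only for illustration):

```python
import re

text = "abba"

# the repeated group is overwritten on every iteration,
# so only the last a or b survives in group 1
m = re.match(r"(a|b)*", text)
whole = m.group(0)       # 'abba'
last = m.group(1)        # 'a'

# make the repetition non-capturing and wrap it in an outer group
# to capture the whole run
m = re.match(r"((?:a|b)*)", text)
run = m.group(1)         # 'abba'

# findall returns every individual match as a list
items = re.findall(r"a|b", text)   # ['a', 'b', 'b', 'a']
```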

</F>




