Named regexp variables, an extension proposal.

Paul McGuire ptmcg at austin.rr._bogus_.com
Sun May 14 15:14:39 EDT 2006


"Paddy" <paddy3118 at netscape.net> wrote in message
news:1147631397.922734.167880 at u72g2000cwu.googlegroups.com...
> I have another use case.
> If you want to match a comma separated list of words you end up writing
> what constitutes a word twice, i.e:
>   r"\w+[,\w+]"
> As what constitutes a word gets longer, you have to repeat a longer RE
> fragment so the fact that it is a match of a comma separated list is
> lost, e.g:
>   r"[a-zA-Z_]\w+[,[a-zA-Z_]\w+]"
>
> - Paddy.
>
Write a short function that builds a comma-separated list RE from the RE for
a single item.  This has the added advantage of DRY, too.  Adding an optional
delim argument allows you to generalize to lists delimited by dots, dashes, etc.

(Note - your posted re requires words of at least 2 characters - I think you
meant "[A-Za-z_]\w*", not "[A-Za-z_]\w+".)
-- Paul
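
To see the difference between the two fragments, here is a quick check. It is only a sketch using the standard re module; the variable names are made up for illustration, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# Posted fragment: one leading character plus AT LEAST one more word character.
two_or_more = re.compile(r"[A-Za-z_]\w+")
# Suggested fragment: one leading character plus ZERO or more word characters.
one_or_more = re.compile(r"[A-Za-z_]\w*")

# A one-character identifier such as "x" only matches the second form.
single_char_rejected = two_or_more.fullmatch("x") is None
single_char_accepted = one_or_more.fullmatch("x") is not None
# Longer identifiers match both forms.
longer_name_accepted = two_or_more.fullmatch("x1") is not None
```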


import re

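def commaSeparatedList(regex, delim=","):
    # (?:...) groups "delimiter + item" so the whole unit repeats; [...] would
    # be a character class.  re.escape keeps delimiters like "." literal.
    return "%s(?:%s%s)*" % (regex, re.escape(delim), regex)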

listOfWords = re.compile( commaSeparatedList(r"\w+") )
listOfIdents = re.compile( commaSeparatedList(r"[A-Za-z_]\w*") )

# might be more robust - people put whitespace in the darndest places!
def whitespaceTolerantCommaSeparatedList(regex, delim=","):
    # non-capturing group, not a character class; delimiter is taken literally
    return r"%s(?:\s*%s\s*%s)*" % (regex, re.escape(delim), regex)


# (BTW, delimitedList in pyparsing does this too - the default delimiter is
# a comma, but other expressions can be used too)
from pyparsing import Word, delimitedList, alphas, alphanums

listOfWords = delimitedList( Word(alphas) )
listOfIdents = delimitedList( Word(alphas+"_", alphanums+"_") )
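
For the regex approach, a quick sanity check might look like the sketch below. The helper name delimited_list, its allow_space flag, and the sample strings are all made up for illustration, and re.fullmatch assumes Python 3.4 or later. The repeated "delimiter + item" part sits in a non-capturing group (?:...), since square brackets would form a character class, and the delimiter goes through re.escape so that "." is taken literally.

```python
import re

def delimited_list(item, delim=",", allow_space=False):
    """Build a regex string matching one or more `item`s separated by `delim`.

    Hypothetical helper for illustration only: `item` is a regex fragment,
    `delim` is treated as a literal string.
    """
    sep = re.escape(delim)
    if allow_space:
        # Tolerate optional whitespace on either side of the delimiter.
        sep = r"\s*" + sep + r"\s*"
    # (?:...) makes "separator + item" a single repeatable unit.
    return "(?:%s)(?:%s(?:%s))*" % (item, sep, item)

ident = r"[A-Za-z_]\w*"
idents = re.compile(delimited_list(ident))
loose_idents = re.compile(delimited_list(ident, allow_space=True))
dotted = re.compile(delimited_list(ident, delim="."))

checks = {
    "plain list": idents.fullmatch("a,b_c,d1") is not None,
    "trailing comma rejected": idents.fullmatch("a,b,") is None,
    "spaces rejected by strict form": idents.fullmatch("a, b") is None,
    "spaces accepted by tolerant form": loose_idents.fullmatch("a , b,c") is not None,
    "dot delimiter": dotted.fullmatch("os.path.join") is not None,
    "dot is literal": dotted.fullmatch("os,path") is None,
}
```

Wrapping the item itself in (?:...) as well is a defensive choice, so that an item RE containing "|" still repeats as a whole.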


-- Paul




