When Good Regular Expressions Go Bad

John Mitchell johnm at magnet.com
Fri Oct 1 13:51:57 EDT 1999


On Fri, 1 Oct 1999, Skip Montanaro wrote:

> Yes, I should have given example re's (parental guidance suggested):
> 
> First, the real thing:
> 
>     r"(?P<m>\s*(?P<date>\d+/\d[-,\d]*/\d+),?"
>      "(?P<venue>[^,]+),"
>      "(?P<city>[^,]+),"
>      "(?P<state>[A-Za-z\s]+),"


Regexes, to coin a phrase, suck.  I'd suggest breaking your REs out a
little more and adding comments or examples to each segment. Also, learn
the non-greedy match syntax (*?, +?, ??), which is *VERY* useful.
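
A minimal illustration of greedy vs. non-greedy matching, using a sample
string made up for this example: .+ grabs as much as it can and then backs
up to the LAST comma, while .+? stops at the FIRST one.

```python
import re

text = "Knitting Factory, New York, NY"

# Greedy: .+ runs to the end of the string, then backtracks to the last comma.
greedy = re.match(r"(.+),", text).group(1)

# Non-greedy: .+? takes as little as possible, so it stops at the first comma.
lazy = re.match(r"(.+?),", text).group(1)

print(greedy)  # Knitting Factory, New York
print(lazy)    # Knitting Factory
```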

-----
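import re

# Every fragment is a raw string so backslashes reach the RE engine intact;
# adjacent string literals are concatenated into one pattern.
x = re.compile(
    # Name              Body                     whitespace or delimiter
    r"(?P<m>"                                    r"\s*"
    r"(?P<date>"        r"\d+/\d[-,\d]*/\d+"     r"),\s*"  # ex: "10/1/1999"
    r"(?P<venue>"       r".+"                    r"),\s*"  # Knitting Factory
    r"(?P<city>"        r".+"                    r"),\s*"  # New York
    r"(?P<state>"       r"[A-Za-z\s]+"           r")"      # NY
    r")"
    )

arg = x.match('10/1/1999, Knitting Factory, New York, NY')
if arg:                        # match() returns None when nothing matches
    print(arg.group('state'))  # NY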
-----
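
If the adjacent-string-literal trick feels fragile, the re module's VERBOSE
flag (also spelled re.X) lets the comments live inside one triple-quoted raw
pattern. A sketch of the same expression from the post rewritten that way;
in verbose mode unescaped whitespace outside character classes is ignored,
so any literal space would have to be written as \s or "\ ".

-----
```python
import re

# Same pattern as above, written with re.VERBOSE: layout whitespace is
# ignored and "#" starts a comment that runs to the end of the line.
x = re.compile(r"""
    (?P<m>
        \s*
        (?P<date>  \d+/\d[-,\d]*/\d+ )  ,\s*   # ex: "10/1/1999"
        (?P<venue> .+ )                 ,\s*   # Knitting Factory
        (?P<city>  .+ )                 ,\s*   # New York
        (?P<state> [A-Za-z\s]+ )               # NY
    )
""", re.VERBOSE)

arg = x.match('10/1/1999, Knitting Factory, New York, NY')
if arg:
    print(arg.group('state'))  # NY
```
-----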


Note that the "venue" and "city" bodies no longer mention the trailing
comma (the old [^,]+ is now .+); the comma lives in the delimiter column
instead, which makes the RE cleaner and more flexible.  I also added
optional whitespace after each segment.
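
A quick check of the optional-whitespace claim, using the same pattern
(built from raw-string fragments) against an input string invented here with
irregular spacing around the commas.

-----
```python
import re

# Same pattern as in the post, one raw-string fragment per segment.
x = re.compile(
    r"(?P<m>\s*"
    r"(?P<date>\d+/\d[-,\d]*/\d+),\s*"
    r"(?P<venue>.+),\s*"
    r"(?P<city>.+),\s*"
    r"(?P<state>[A-Za-z\s]+)"
    r")"
)

# No space after some commas, two spaces after another: \s* absorbs both.
arg = x.match('10/1/1999,Knitting Factory,New York,  NY')
fields = arg.groupdict()
print(fields['venue'], '|', fields['city'], '|', fields['state'])
# Knitting Factory | New York | NY
```
-----

Whitespace *before* a comma is not stripped, though: it would end up at the
end of the preceding field.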

Hope this helps!


- j
