When Good Regular Expressions Go Bad
John Mitchell
johnm at magnet.com
Fri Oct 1 13:51:57 EDT 1999
On Fri, 1 Oct 1999, Skip Montanaro wrote:
> Yes, I should have given example re's (parental guidance suggested):
>
> First, the real thing:
>
> r"(?P<m>\s*(?P<date>\d+/\d[-,\d]*/\d+),?"
> "(?P<venue>[^,]+),"
> "(?P<city>[^,]+),"
> "(?P<state>[A-Za-z\s]+),"
Regexes, to coin a phrase, suck. I'd suggest breaking out your REs a
little more, and adding comments or examples to each segment. Also learn
the non-greedy match syntax, which is *VERY* useful.
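For anyone who hasn't met the non-greedy syntax yet, here is a minimal sketch of the difference. The sample string and the group name "first" are made up purely for illustration; they are not from Skip's data.

```python
import re

# Illustrative sample line (an assumption, not real input data).
line = "10/1/1999, Knitting Factory, New York, NY"

# Greedy: ".+" grabs as much as it can, then backs off to the LAST comma.
greedy = re.match(r"(?P<first>.+),", line)

# Non-greedy: ".+?" grabs as little as it can and stops at the FIRST comma.
lazy = re.match(r"(?P<first>.+?),", line)

print(greedy.group("first"))  # 10/1/1999, Knitting Factory, New York
print(lazy.group("first"))    # 10/1/1999
```

Adding "?" after a quantifier ("*?", "+?", "??") is all it takes; the rest of the pattern stays the same.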
-----
import re

# Every fragment gets its own r prefix: a single leading r"" only makes
# that one (empty) literal raw, not the literals concatenated after it.
x = re.compile(
    # Name          Body                   whitespace or delimiter
    r"(?P<m>"       r"\s*"
    r"(?P<date>"    r"\d+/\d[-,\d]*/\d+"   r"),\s*"   # ex: "10/1/1999"
    r"(?P<venue>"   r".+"                  r"),\s*"   # Knitting Factory
    r"(?P<city>"    r".+"                  r"),\s*"   # New York
    r"(?P<state>"   r"[A-Za-z\s]+"         r")"       # NY
    r")"
)
arg = x.match('10/1/1999, Knitting Factory, New York, NY')
print(arg.group('state'))
-----
Note that "venue" and "city" no longer mention the trailing comma in
their bodies; the comma now sits in the delimiter column, so the RE is
cleaner and more flexible. I also added optional whitespace after each
comma.
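Another way to get per-segment comments, if you'd rather not juggle a pile of adjacent string literals, is the re.VERBOSE flag. This is only a sketch of the same pattern in that style; the names "gig_re" and "m" are hypothetical, and I'm assuming the same sample line as above.

```python
import re

# Hypothetical name; same pattern as above, written with re.VERBOSE so that
# whitespace in the pattern is ignored and "#" starts a comment.
gig_re = re.compile(r"""
    (?P<m>
        \s*
        (?P<date>  \d+/\d[-,\d]*/\d+ ) ,\s*   # ex: 10/1/1999
        (?P<venue> .+                ) ,\s*   # Knitting Factory
        (?P<city>  .+                ) ,\s*   # New York
        (?P<state> [A-Za-z\s]+       )        # NY
    )
""", re.VERBOSE)

m = gig_re.match("10/1/1999, Knitting Factory, New York, NY")
print(m.group("venue"))
print(m.group("state"))
```

The catch with verbose mode is that a literal space or "#" in the pattern has to be escaped or put inside a character class, so it suits patterns like this one that only use \s for whitespace.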
hope dis helps!
- j