When Good Regular Expressions Go Bad
Skip Montanaro
skip at mojam.com
Fri Oct 1 14:19:24 EDT 1999
Me:
>> First, the real thing:
>>
>> r"(?P<m>\s*(?P<date>\d+/\d[-,\d]*/\d+),?"
>> "(?P<venue>[^,]+),"
>> "(?P<city>[^,]+),"
>> "(?P<state>[A-Za-z\s]+),"
John:
> Regex's, to coin a phrase, suck. I'd suggest breaking out your REs a
> little more and adding comments or examples to each segment. Also learn
> the non-greedy match syntax, which is *VERY* useful.
[proposed readability enhancements elided]
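John's two suggestions can be sketched with Python's `re` module. This is an illustration, not code from the thread. The first half rewrites the four named groups quoted above (date, venue, city, state) with `re.VERBOSE`, so each segment carries its own comment. It drops the outer `(?P<m>...)` group because the quoted fragment is cut off before that group closes. The sample input line is invented. The second half contrasts a greedy `.*` with its non-greedy counterpart `.*?`.

```python
import re

# The quoted segments, one per line, each with a comment.
# Assumption: the outer (?P<m>...) group is omitted because the quoted
# fragment is truncated before it closes.
CONCERT_RE = re.compile(r"""
    \s*
    (?P<date>\d+/\d[-,\d]*/\d+) ,?   # e.g. 10/1/99 or 10/1-3/99
    (?P<venue>[^,]+) ,               # everything up to the next comma
    (?P<city>[^,]+) ,                # likewise for the city
    (?P<state>[A-Za-z\s]+) ,         # letters and spaces only
""", re.VERBOSE)

# Hypothetical input line, invented for illustration.
m = CONCERT_RE.match("10/1/99,Ravinia Festival,Highland Park,IL,")
print(m.group("date", "venue", "city", "state"))

# Greedy vs. non-greedy: .* runs to the last possible "</b>",
# while .*? stops at the first one.
text = "<b>Ravinia</b> and <b>Orchestra Hall</b>"
greedy = re.findall(r"<b>(.*)</b>", text)
lazy = re.findall(r"<b>(.*?)</b>", text)
print(greedy)  # ['Ravinia</b> and <b>Orchestra Hall']
print(lazy)    # ['Ravinia', 'Orchestra Hall']
```

In verbose mode, whitespace outside character classes is ignored, so the spaces before each literal comma are only for readability.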
I agree re's suck. That's why, in the one application where I use them *a
lot*, I wrote a higher-level representation (the %{smonth}, %{city} stuff I
mentioned in my earlier post) that avoids almost all of the line noise one
normally associates with re's. I'm not too concerned about the
maintainability of the ones I showed before, since I have only nine to
worry about (three patterns, each with a desired re and two simpler prefix
re's), and they haven't needed to be twiddled for a long time. Also, the
patterns I displayed were originally written using the regex module, not
re. (Actually, maybe it was pregex; I can't recall.) The thousands of
patterns I mentioned before are encoded in the more restrictive
higher-level notation, which does "compile" to pregex re's.
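The post doesn't show the notation itself, so what follows is only a guess at the general idea. In this sketch, a template's %{name} placeholders expand into named groups and all literal text is escaped. The macro table is hypothetical: only the names smonth and city come from the post, and their bodies here are invented. The other macro names are also made up. The target is Python's `re` module rather than pregex, and `compile_template` is a hypothetical helper.

```python
import re

# Hypothetical macro table: the real definitions behind %{smonth},
# %{city}, ... are not given in the post; these bodies are invented.
MACROS = {
    "smonth": r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?",
    "day": r"\d{1,2}",
    "city": r"[^,]+",
    "state": r"[A-Za-z\s]+",
}

# Matches a %{name} placeholder and captures the name.
_PLACEHOLDER = re.compile(r"%\{(\w+)\}")

def compile_template(template):
    """Translate a %{name} template into a compiled regular expression.

    Literal text between placeholders is passed through re.escape; each
    %{name} becomes a named group whose body comes from MACROS. An
    unknown macro name raises KeyError.
    """
    parts = []
    pos = 0
    for ph in _PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:ph.start()]))
        name = ph.group(1)
        parts.append(f"(?P<{name}>{MACROS[name]})")
        pos = ph.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))

# Hypothetical template and input line.
pat = compile_template("%{smonth} %{day}, %{city}, %{state}")
m = pat.match("Oct 1, Highland Park, IL")
print(m.groupdict())
# {'smonth': 'Oct', 'day': '1', 'city': 'Highland Park', 'state': 'IL'}
```

Escaping the literal text is what keeps the line noise out of the templates. Authors write commas and spaces as themselves, and only the macro table contains regex syntax. One limitation of this sketch is that each placeholder can appear only once per template, because `re` rejects duplicate group names.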
Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...