When Good Regular Expressions Go Bad

Skip Montanaro skip at mojam.com
Fri Oct 1 14:19:24 EDT 1999


Me:

    >> First, the real thing:
    >> 
    >> r"(?P<m>\s*(?P<date>\d+/\d[-,\d]*/\d+),?"
    >> "(?P<venue>[^,]+),"
    >> "(?P<city>[^,]+),"
    >> "(?P<state>[A-Za-z\s]+),"

John:

    Regex's, to coin a phrase, suck.  I'd suggest breaking out your REs a
    little more, and to add comments or examples to each segment. Also learn
    the non-greedy match syntax, which is *VERY* useful.

    [proposed readability enhancements elided]
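
For readers who have not seen the two techniques John mentions, here is a minimal sketch of a commented (verbose) pattern and of non-greedy matching. It is not John's elided proposal and not the original pattern: the group names are borrowed from the fragment quoted above, while the sample line, the simplified date sub-pattern, and the names EVENT, m, greedy and lazy are assumptions made for illustration.

```python
import re

# Illustrative only: the group names follow the quoted fragment, but the
# input format and the sub-patterns are assumptions, not the original re.
EVENT = re.compile(r"""
    \s*
    (?P<date>\d+/\d+/\d+)     # a date such as 10/1/1999
    ,\s*
    (?P<venue>[^,]+?)         # venue: everything up to the next comma
    \s*,\s*
    (?P<city>[^,]+?)          # city: everything up to the next comma
    \s*,\s*
    (?P<state>[A-Za-z\s]+?)   # state: letters and whitespace
    \s*$
""", re.VERBOSE)

# Hypothetical input line, made up for this example.
m = EVENT.match("10/1/1999, The Vic Theatre, Chicago, IL")

# Greedy versus non-greedy: '+' takes as much as it can, '+?' as little.
greedy = re.match(r"(.+),", "a,b,c").group(1)
lazy = re.match(r"(.+?),", "a,b,c").group(1)
```

With re.VERBOSE, whitespace in the pattern is ignored and everything after a '#' is a comment, so each segment can carry its own note. The non-greedy '+?' also keeps trailing blanks out of the captured groups, because the following \s* gets to consume them.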

I agree that re's suck.  That's why, in the one application where I use them
*a lot*, I wrote a higher-level representation (the %{smonth}, %{city} stuff
I mentioned in my earlier post) that avoids almost all of the line noise one
normally associates with re's.  I'm not too concerned about the
maintainability of the ones I mentioned before, since I only have nine to
worry about (three patterns, each with a desired re and two simpler prefix
re's), and they haven't needed to be twiddled for a long time.  Also, the
patterns I displayed were originally written using regex, not re.
(Actually, maybe it was pregex, I can't recall.)  The thousands of patterns
I mentioned before are encoded in the more restrictive higher-level
notation (which does "compile" to pregex re's).
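
To give a rough idea of how a %{name} notation can "compile" to an re, here is a small sketch. It is not the actual code: only %{smonth} and %{city} are named in this thread, so the %{day} placeholder, the expansions in FRAGMENTS (including reading smonth as a short month name), and the function compile_template are all hypothetical.

```python
import re

# Hypothetical fragment table. Only the placeholder names smonth and city
# come from the post; the expansions are guesses for illustration.
FRAGMENTS = {
    "smonth": r"(?P<smonth>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    "day": r"(?P<day>\d{1,2})",
    "city": r"(?P<city>[^,]+)",
}

# Matches a placeholder such as %{city} and captures its name.
_PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def compile_template(template):
    """Expand %{name} placeholders into named groups; escape literal text."""
    parts = []
    pos = 0
    for ph in _PLACEHOLDER.finditer(template):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(template[pos:ph.start()]))
        parts.append(FRAGMENTS[ph.group(1)])
        pos = ph.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


pat = compile_template("%{smonth} %{day}, %{city}")
m = pat.match("Oct 1, Chicago")
```

The template stays readable, and the line noise lives in one table. The price is that a template can only express what the table knows about, which is the sense in which such a notation is more restrictive than raw re's.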

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/
847-971-7098   | Python: Programming the way Guido indented...




