pattern matching

Roy Smith roy at panix.com
Wed Feb 23 23:07:30 EST 2011


In article <mailman.364.1298517901.1189.python-list at python.org>,
 Chris Rebert <clp2 at rebertia.com> wrote:

> regex = compile("(\d\d)/(\d\d)/(\d{4})")

I would probably write that as either

r"(\d{2})/(\d{2})/(\d{4})"

or (somewhat less likely)

r"(\d\d)/(\d\d)/(\d\d\d\d)"

Keeping to one consistent style makes it a little easier to read.  Also, 
don't forget the leading `r` to get raw strings.  I've long since given 
up trying to remember the exact rules of what needs to get escaped and 
what doesn't.  If it's a regex, I just automatically make it a raw 
string.
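
To make the raw-string point concrete, here is a small sketch.  The
pattern and the sample text are made up for illustration; the only thing
it relies on is that "\b" means backspace in an ordinary string literal
but word boundary once the backslash reaches the re module intact.

```python
import re

# In an ordinary literal, "\b" is a single backspace character (0x08).
# In a raw literal, it stays as backslash + "b", which re reads as a
# word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

# The two strings are not even the same length.
print(len(plain), len(raw))

# The plain version looks for literal backspaces, so it finds nothing.
print(re.search(plain, "a cat sat"))

# The raw version matches the word "cat".
print(re.search(raw, "a cat sat"))
```

Sequences like "\d" happen to survive in a non-raw string because Python
leaves unrecognized escapes alone, but that is exactly the kind of rule
I'd rather not have to remember.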

Also, don't overlook the re.VERBOSE flag.  It tells the regex compiler 
to ignore whitespace in the pattern (and to allow # comments), so you 
can write positively outrageous expressions which are still quite 
readable.  With it, you could write this regex as:

r" (\d{2}) / (\d{2}) / (\d{4}) "

which takes up only slightly more space, but makes it a whole lot easier 
to scan by eye.
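
For what it's worth, here is roughly how that looks when compiled and
spread over several lines.  This is only a sketch: the sample string is
invented, and the month/day/year labels in the comments are my guess at
the field order, which the original pattern doesn't actually say.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts
# a comment that runs to the end of the line.
date_re = re.compile(r"""
    (\d{2})   # month (assumed field order)
    /
    (\d{2})   # day
    /
    (\d{4})   # year
""", re.VERBOSE)

# The one-line spelling compiles to an equivalent pattern.
compact_re = re.compile(r" (\d{2}) / (\d{2}) / (\d{4}) ", re.VERBOSE)

match = date_re.search("Posted on 02/23/2011 at 23:07")
if match:
    print(match.groups())
```

One thing to watch: because whitespace is ignored, a space you really do
want to match has to be written as "\ " or "[ ]".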

I'm still going to stand by my previous statement, however.  If you're 
trying to parse HTML, use an HTML parser.  Using a regex like this is 
perfectly fine for parsing the CDATA text inside the HTML <td> element, 
but pattern matching the HTML markup itself is madness.
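
A rough sketch of that division of labor, using html.parser from the
Python 3 standard library (any real HTML parser would do).  The CellText
class, the find_dates helper and the sample markup are all hypothetical
names of mine, not anything from the original thread.

```python
import re
from html.parser import HTMLParser

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


class CellText(HTMLParser):
    """Collect the text found inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # None while we are outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append("".join(self._buf))
            self._buf = None


def find_dates(html):
    """Let the parser handle the markup; apply the regex only to cell text."""
    parser = CellText()
    parser.feed(html)
    parser.close()
    return [m.groups() for cell in parser.cells
            for m in DATE_RE.finditer(cell)]


sample = "<table><tr><td class='d'>02/23/2011</td><td>n/a</td></tr></table>"
print(find_dates(sample))
```

The regex never sees a tag or an attribute, so odd quoting, extra
attributes or line breaks in the markup are the parser's problem, not
the pattern's.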


