[Tutor] Making Regular Expressions readable

spir denis.spir at gmail.com
Mon Mar 8 18:34:05 CET 2010


On Mon, 8 Mar 2010 16:12:35 +0000
Stephen Nelson-Smith <sanelson at gmail.com> wrote:

> Hi,
> 
> I've written this today:
> 
> #!/usr/bin/env python
> import re
> 
> pattern = r'(?P<ForwardedFor>^(-|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(,
> [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})*){1})
> (?P<RemoteLogname>(\S*)) (?P<RemoteUser>(\S*))
> (?P<Timestamp>(\[[^\]]+\]))
> (?P<FirstLineOfRequest>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)
> (?P<Status>(\S*)) (?P<Size>(\S*))
> (?P<Referrer>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)
> (?P<UserAgent>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)(
> )?(?P<SiteIntelligenceCookie>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
[...]
> It works fine, but it looks pretty unreadable and unmaintainable to
> anyone who hasn't spent all day writing regular expressions.

;-)

> I remember reading about verbose regular expressions.  Would these help?

Yes. Read the docs and tutorials on Python regexes; you'll find some useful features there, notably the re.VERBOSE flag.
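As a rough illustration of what verbose mode buys you, here is a cut-down sketch covering only three of the fields from your pattern. The sample log line and the name LOG_LINE are made up for the example. With re.VERBOSE, whitespace outside character classes is ignored and "#" starts a comment, so each group can sit on its own line:

```python
import re

# Illustrative only: a reduced pattern, not the full log format quoted above.
LOG_LINE = re.compile(r"""
    (?P<RemoteUser>\S*)          # remote user name
    \s                           # literal separators must be written explicitly
    (?P<Timestamp>\[[^\]]+\])    # timestamp in square brackets
    \s
    (?P<Status>\S*)              # HTTP status code
""", re.VERBOSE)

# Hypothetical sample input, shaped to fit the reduced pattern.
match = LOG_LINE.match("frank [10/Oct/2000:13:55:36 -0700] 200")
fields = match.groupdict()
```

Note that a literal space in the pattern has to be spelled as \s, "\ " or [ ] once the flag is on, since bare whitespace is dropped.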

> How could I make the above more maintainable?

You may be interested in the following, which seems to me to fit your case well: I wrote a kind of "SuperPattern" ;-) extension that lets you write and test sub-regexes separately, then compose them.
Sub-patterns are referenced as {name}, since a name in braces cannot be mistaken for a regex quantifier. The intended use was something like:

import re
Pattern = re.compile
person = Pattern(r"""...""")
email = Pattern(r"""...""")
phone = Pattern(r"""...""")
# once every sub-pattern works fine on its own:
scope = locals()    # where SuperPattern looks up the sub-patterns
entry = SuperPattern(scope, r"""{person}:\s+{email}\s*--\s*{phone}""")
# then use entry like any other compiled pattern

I don't remember the details, but it was a very simple type to write (built on top of re, indeed). The only catch is that the constructor must be passed the scope in which to find the sub-patterns (typically locals(); globals() is used by default). Each compiled regex has a .pattern attribute that holds its source string, so composing a super-pattern just means replacing each sub-pattern's name with its source.
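Since I don't have the original code at hand, here is a minimal sketch of how such a thing could be rebuilt, not the actual implementation. It is written as a plain function rather than a type, the scope argument is required rather than defaulting to globals(), and the (?:...) wrapping and the three toy sub-patterns are my own assumptions for the example:

```python
import re

def SuperPattern(scope, format_string, flags=0):
    """Hypothetical reconstruction: expand each {name} in format_string
    using the compiled pattern bound to that name in scope."""
    def expand(m):
        sub = scope[m.group(1)]           # KeyError if the name is unknown
        # Wrap in a non-capturing group so alternations inside a
        # sub-pattern cannot leak into the surrounding expression.
        return "(?:%s)" % sub.pattern
    # Only identifiers in braces are expanded; quantifiers like {3} are left alone.
    source = re.sub(r"\{([A-Za-z_]\w*)\}", expand, format_string)
    return re.compile(source, flags)

# Toy sub-patterns, deliberately simplistic.
person = re.compile(r"[A-Z][a-z]+")
email = re.compile(r"[\w.]+@[\w.]+")
phone = re.compile(r"\d{3}-\d{4}")

entry = SuperPattern(locals(), r"{person}:\s+{email}\s*--\s*{phone}")
result = entry.match("Denis: denis.spir@gmail.com -- 555-1234")
```

One limitation of this sketch: flags given to the sub-patterns when they were compiled are lost, because only their source strings are reused.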

> S.
> 

Denis
-- 
________________________________

la vita e estrany

spir.wikidot.com


