Making regex suck less
Gerhard Häring
gerhard.haering at gmx.de
Sun Sep 1 15:13:56 EDT 2002
* Gerson Kurz <gerson.kurz at t-online.de> [2002-09-01 18:31 +0000]:
> [...] Anyway, that got me thinking on why do we have to deal with
> regular expressions like r"((?:a|b)*)", when in most cases the code
> will look something like this:
>
> r = re.compile("<some cryptic re-string here>")
> ...
> r.match(this) or r.find(that)
If you only use the RE once, you can use the module-level functions ;-)
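For illustration (a sketch, not from the original post): the module-level functions take the pattern string directly, and the re module keeps an internal cache of recently compiled patterns, so a one-off call needs no explicit compile(). Pattern objects provide match() and search(). There is no find() method. The sample strings below are made up.

```python
import re

# One-off use: pass the pattern string straight to a module-level function.
date_hit = re.search(r"\d{4}-\d{2}-\d{2}", "released on 2002-09-01")
print(date_hit.group())  # 2002-09-01

# Repeated use: compile once and call methods on the pattern object.
word = re.compile(r"\w+")
print(word.findall("making regex suck less"))  # ['making', 'regex', 'suck', 'less']
print(word.search("  hello").group())          # hello (search scans ahead)
print(word.match("  hello"))                   # None (match is anchored at the start)
```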
> which means the real time is not spent in the compile() function, but
> in the match or find function. So basically, couldn't one come up with
> a *human readable* syntax for re, and compile that instead?
One that's equally powerful? Most probably not.
> Also, I think it would already be an improvement if the syntax
> provided for clear and easy-to-understand special cases, like
>
> re.compile("anything that starts with 'abc'")
    s.startswith("abc")          # case-sensitive
    s.lower().startswith("abc")  # case-insensitive
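A small sketch comparing these string methods with the equivalent anchored regex calls; the sample string is made up for the example:

```python
import re

s = "ABCdef"

# Case-sensitive prefix test: string method vs. anchored regex.
print(s.startswith("abc"))             # False
print(re.match("abc", s) is not None)  # False

# Case-insensitive prefix test.
print(s.lower().startswith("abc"))                    # True
print(re.match("abc", s, re.IGNORECASE) is not None)  # True
```

For a plain prefix test the string method says the same thing without any regex syntax at all.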
> and if you cannot find something in the special cases for you, you can
> always go back to
>
> re.compile("<some cryptic re-string here>")
>
> After all, *everyone* starting with re thinks the syntax is cryptic
> and mind-boggling, and only if you get yourself into the "re mindset",
> you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
> we had an easier syntax, more people would be using re ;)
>
> Is the idea utterly foolish?
I don't really know. IMO, if you're doing very simple string searching,
you can probably get away with the string methods, and if you have very
complex stuff, you'll probably be better off with a parser generator
(like SimpleParse, which is very readable, IMO).
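Between string methods and a parser generator there is also the re.VERBOSE flag, which lets a regex carry its own comments. As a sketch (not from the original post), here is the pattern quoted above rewritten that way; in verbose mode, whitespace outside character classes is ignored and "#" starts a comment:

```python
import re

# The pattern quoted earlier in the thread, written on one line.
COMPACT = re.compile(r"\s*\w+\s*=\s*['\"].*?['\"]")

# The same pattern with re.VERBOSE, so each piece can be annotated.
VERBOSE = re.compile(r"""
    \s*        # optional leading whitespace
    \w+        # a name, e.g. an attribute or variable
    \s* = \s*  # an equals sign, optionally surrounded by whitespace
    ['"]       # an opening quote (single or double)
    .*?        # the shortest run of characters...
    ['"]       # ...up to the next quote of either kind
""", re.VERBOSE)

print(VERBOSE.match('  name = "value"').group())  # '  name = "value"'
```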
I don't find regular expressions that unreadable, especially when I
consider that I'd have to write many lines of error-prone Python code
instead. Stuff like this is just too convenient:
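    # working around zxDateTime limitations:
    # (JYTHON and DateTime are defined elsewhere in the module; DateTime is
    # presumably zxDateTime's constructor.)
    if JYTHON:
        import re

        ISO_DATE_RE = re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")

        def DateFrom(s):
            """Parse a 'YYYY-MM-DD' string into a DateTime."""
            match = ISO_DATE_RE.match(s)
            if match is None:
                raise ValueError("not an ISO date: %r" % s)
            return DateTime(*map(int, match.groups()))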
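To make the trade-off concrete, here is a sketch (not from the original post) that parses the same "YYYY-MM-DD" prefix twice: once with the regex above and once by hand. Both function names are hypothetical, and both return a plain (year, month, day) tuple instead of a DateTime so the example runs without zxDateTime.

```python
import re

ISO_DATE_RE = re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")

def parse_with_re(s):
    """Hypothetical helper: the regex version, as in the code above."""
    match = ISO_DATE_RE.match(s)
    if match is None:
        raise ValueError("not an ISO date: %r" % s)
    return tuple(map(int, match.groups()))

def parse_by_hand(s):
    """Hypothetical helper: the same check written out manually.

    Accepts ASCII digits only, whereas \\d in a Python 3 str pattern
    also matches other Unicode decimal digits.
    """
    if len(s) < 10 or s[4] != "-" or s[7] != "-":
        raise ValueError("not an ISO date: %r" % s)
    parts = (s[0:4], s[5:7], s[8:10])
    for part in parts:
        if not all(c in "0123456789" for c in part):
            raise ValueError("not an ISO date: %r" % s)
    return tuple(int(part) for part in parts)

print(parse_with_re("2002-09-01"))  # (2002, 9, 1)
print(parse_by_hand("2002-09-01"))  # (2002, 9, 1)
```

Even for this tiny format, the hand-written version has to spell out lengths, separator positions and digit checks that the regex states in one line.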
Gerhard
--
mail: gerhard <at> bigfoot <dot> de registered Linux user #64239
web: http://www.cs.fhm.edu/~ifw00065/ OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9 3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))