Making regex suck less
Gerson Kurz
gerson.kurz at t-online.de
Sun Sep 1 14:31:29 EDT 2002
I wrote a small wrapper module around "struct" today that lets me
write C declarations
...
print p.declare("""
struct test1
{
int a,b;
float this,and,that;
};
struct test2
{
int count;
struct test1 data[80];
};
""")
...
and have it create instances I can easily assign data to
...
t2 = p.createInstance("test2")
t2.data[0].a = 42
...
and pack for C extensions by utilizing the struct module
...
data = p.pack(t2)
...
because I was fed up with that pack("qh34s>id",...) stuff. (I'll
post that to my website once it's in a state I can let somebody else
see ;)
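For comparison, this is roughly what the hand-written version of test1 looks like with the plain struct module. It is only a sketch: the "=" prefix (standard sizes, no padding) is my assumption to keep the size predictable, and a real C extension would more likely need the native layout.

```python
import struct

# Hand-written format for: struct test1 { int a,b; float this,and,that; };
# "i" is a C int and "f" is a C float; "=" selects standard sizes without
# padding (an assumption; a C compiler may lay the struct out differently).
TEST1_FORMAT = "=iifff"

# Pack two ints and three floats into a bytes object...
packed = struct.pack(TEST1_FORMAT, 42, 7, 1.0, 2.0, 3.0)

# ...and unpack them again. The field names are lost: you get a bare tuple.
a, b, f1, f2, f3 = struct.unpack(TEST1_FORMAT, packed)
```

The format string carries no field names at all, which is exactly the part a declaration-based wrapper would take care of.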
Anyway, that got me thinking about why we have to deal with regular
expressions like r"((?:a|b)*)", when in most cases the code will look
something like this:
r = re.compile("<some cryptic re-string here>")
...
r.match(this) or r.search(that)
which means the real time is not spent in the compile() function, but
in the match or search function. So basically, couldn't one come up with
a *human readable* syntax for re, and compile that instead? Python
prides itself on its clean syntax and human readability, and bang -
import re, get perl-ish code instantly!
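A minimal sketch of that usage pattern, with the pattern from above compiled once and then reused for every input (the input strings are made up for illustration):

```python
import re

# Compile once, up front...
pattern = re.compile(r"((?:a|b)*)")

# ...then reuse the compiled object for every input. match() anchors at the
# start of the string, and this pattern can also match the empty string.
inputs = ["abba", "abc", "xyz"]
prefixes = [pattern.match(text).group(1) for text in inputs]
```

Whatever syntax the pattern is written in, it is only parsed in the compile() call, so a more verbose notation would not slow down the matching itself.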
Also, I think it would already be an improvement if the syntax
provided for clear and easy-to-understand special cases, like
re.compile("anything that starts with 'abc'")
and if none of the special cases fits, you can always go back to
re.compile("<some cryptic re-string here>")
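To show what I mean, here is a rough sketch of such a front end. The function name readable_compile and the one phrase it understands are purely hypothetical; nothing like this exists in the re module. It recognizes a single special case and otherwise passes the string to re.compile unchanged.

```python
import re

# Hypothetical special case: "anything that starts with '<literal>'".
# The phrasing is invented for illustration only.
_STARTS_WITH = re.compile(r"anything that starts with '(.*)'\Z")

def readable_compile(spec):
    """Compile a readable special case, or fall back to plain re syntax."""
    m = _STARTS_WITH.match(spec)
    if m:
        # re.escape keeps the quoted literal from being read as regex syntax.
        # Used with .match(), the result only matches at the string start.
        return re.compile(re.escape(m.group(1)))
    # Not a known special case: treat it as an ordinary regular expression.
    return re.compile(spec)

starts_abc = readable_compile("anything that starts with 'abc'")
fallback = readable_compile(r"((?:a|b)*)")
```

Both calls return ordinary compiled pattern objects, so the code that does the matching would not have to change.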
After all, *everyone* starting with re thinks the syntax is cryptic
and mind-boggling, and only once you get yourself into the "re mindset"
do you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
we had an easier syntax, more people would be using re ;)
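For anyone not in that mindset yet, here is the same pattern spelled out piece by piece in the comments. The sample strings are my own, chosen only to illustrate what it accepts.

```python
import re

# \s*        optional leading whitespace
# \w+        a name made of word characters
# \s*=\s*    an equals sign, optionally surrounded by whitespace
# ['\"]      an opening single or double quote
# .*?        as few characters as possible...
# ['\"]      ...up to the next single or double quote
assignment = re.compile(r"\s*\w+\s*=\s*['\"].*?['\"]")

double_quoted = assignment.match('  name = "value"')
single_quoted = assignment.match("key='v' # trailing comment")
not_assignment = assignment.match("no assignment here")
```

Note that the pattern does not require the closing quote to be the same kind as the opening one.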
Is the idea utterly foolish?