Making regex suck less

Gerson Kurz gerson.kurz at t-online.de
Sun Sep 1 14:31:29 EDT 2002


I wrote a small wrapper module around "struct" today that lets me
write C declarations

...
print p.declare("""

struct test1
{
   int a,b;
   float this,and,that;
};

struct test2
{
    int count;
    struct test1 data[80];
}; 

""")
...
and have it create instances I can easily assign data to
...
t2 = p.createInstance("test2")
t2.data[0].a = 42
...
and pack for C extensions by utilizing the struct module
...
data = p.pack(t2)
...

because I am fed up with that pack("qh34s>id",...) stuff. (I'll
post it to my website once it's in a state I can let somebody else
see ;)
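
For comparison, here is roughly what packing one "test1" by hand looks
like with the plain struct module. This is only a sketch: the "="
prefix (standard sizes, no padding) is an assumption, a real C compiler
may lay the struct out differently, and TEST1_FORMAT is just a name
picked for the example.

```python
import struct

# Layout of "struct test1": two ints followed by three floats.
# "=" means standard sizes and no alignment padding (an assumption).
TEST1_FORMAT = "=2i3f"

# Pack a, b, this, and, that in declaration order.
packed = struct.pack(TEST1_FORMAT, 42, 7, 1.0, 2.0, 3.0)

# Unpacking returns the values in the same order.
a, b, f1, f2, f3 = struct.unpack(TEST1_FORMAT, packed)
```

The format string says nothing about field names, which is exactly the
part that gets tiresome.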

Anyway, that got me thinking about why we have to deal with regular
expressions like r"((?:a|b)*)", when in most cases the code will look
something like this:

r = re.compile("<some cryptic re-string here>")
...
r.match(this) or r.search(that)

which means the real time is not spent in the compile() function, but
in the match() or search() function. So basically, couldn't one come
up with a *human readable* syntax for re, and compile that instead?
Python prides itself on its clean syntax and human readability, and
bang - import re, get perl-ish code instantly!
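
As a rough sketch of what such a readable layer could look like: the
helpers below (literal, either, zero_or_more, capture) are made-up
names for illustration, not an existing module. They only build an
ordinary pattern string, so re.compile() still does the real work.

```python
import re

def literal(text):
    # Match the text exactly, escaping any regex metacharacters.
    return re.escape(text)

def either(*alternatives):
    # Non-capturing alternation: (?:x|y|...)
    return "(?:" + "|".join(alternatives) + ")"

def zero_or_more(pattern):
    # Wrap in a non-capturing group so "*" applies to the whole pattern.
    return "(?:" + pattern + ")*"

def capture(pattern):
    # Capturing group.
    return "(" + pattern + ")"

# Readable spelling of something equivalent to r"((?:a|b)*)".
pattern_text = capture(zero_or_more(either(literal("a"), literal("b"))))
pattern = re.compile(pattern_text)
match = pattern.match("abbac")
```

The generated string has one redundant group, but the one-time cost of
that lands in compile(), not in match().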

Also, I think it would already be an improvement if the syntax
provided for clear and easy-to-understand special cases, like

re.compile("anything that starts with 'abc'")

and if none of the special cases fits your problem, you can always go
back to

re.compile("<some cryptic re-string here>")
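
A minimal sketch of that fallback idea, assuming a single special case
and a hypothetical wrapper called readable_compile (nothing like it
exists in the re module):

```python
import re

_STARTS_WITH = "anything that starts with '"

def readable_compile(description):
    # Special case: "anything that starts with '<text>'".
    if (description.startswith(_STARTS_WITH)
            and description.endswith("'")
            and len(description) > len(_STARTS_WITH)):
        text = description[len(_STARTS_WITH):-1]
        # Use with .match(), which anchors at the start of the string.
        return re.compile(re.escape(text))
    # Anything else is treated as an ordinary re-string.
    return re.compile(description)

starts_abc = readable_compile("anything that starts with 'abc'")
fallback = readable_compile(r"((?:a|b)*)")
```

Both calls hand back a normal compiled pattern, so the rest of the
code does not care which spelling was used.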

After all, *everyone* starting with re thinks the syntax is cryptic
and mind-boggling, and only once you get yourself into the "re
mindset" do you understand things like
r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If we had an easier syntax,
more people would be using re ;)
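
For what it's worth, the re module's existing re.VERBOSE flag already
allows whitespace and comments inside a pattern. It does not change
the syntax itself, but the expression above could be spelled out
roughly like this (the variable name and the comments are only one
reading of the pattern):

```python
import re

assignment = re.compile(r"""
    \s* \w+ \s*     # a name, with optional whitespace around it
    = \s*           # equals sign, optional whitespace after it
    ['"]            # opening quote, single or double
    .*?             # the value, as short as possible
    ['"]            # closing quote
""", re.VERBOSE)

m = assignment.match("  name = 'value'")
```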

Is the idea utterly foolish? 



