Making regex suck less

John Roth johnroth at ameritech.net
Mon Sep 2 17:49:23 EDT 2002


"Gerson Kurz" <gerson.kurz at t-online.de> wrote in message
news:3d725881.345921 at news.t-online.de...
> I wrote a small wrapper module to "struct" today that will allow me to
> write C declarations and have
>
> ...
> print p.declare("""
>
> struct test1
> {
>    int a,b;
>    float this,and,that;
> };
>
> struct test2
> {
>     int count;
>     struct test1 data[80];
> };
>
> """)
> ...
> and have it create instances I can easily assign data to
> ...
> t2 = p.createInstance("test2")
> t2.data[0].a = 42
> ...
> and pack for C extensions by utilizing the struct module
> ...
> data = p.pack(t2)
> ...
>
> because of being fed up with that pack("qh34s>id",...) stuff. (I'll
> post that to my website once it's in a state I can let somebody else
> see ;)
>
> Anyway, that got me thinking on why do we have to deal with regular
> expressions like r"((?:a|b)*)", when in most cases the code will look
> something like this:
>
> r = re.compile("<some cryptic re-string here>")
> ...
> r.match(this) or r.search(that)
>
> which means the real time is not spent in the compile() function, but
> in the match or search function. So basically, couldn't one come up with
> a *human readable* syntax for re, and compile that instead? Python
> prides itself on its clean syntax and human readability, and bang -
> import re, get perl-ish code instantly!
>
> Also, I think it would already be an improvement if the syntax
> provided for clear and easy-to-understand special cases, like
>
> re.compile("anything that starts with 'abc'")
>
> and if you cannot find something in the special cases for you, you can
> always go back to
>
> re.compile("<some cryptic re-string here>")
>
> After all, *everyone* starting with re thinks the syntax is cryptic
> and mind-boggling, and only if you get yourself into the "re mindset",
> you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
> we had an easier syntax, more people would be using re ;)
>
> Is the idea utterly foolish?

No, it's not utterly foolish. You might be surprised to learn that
Larry Wall agrees with you that the Perl regex syntax is much
too obscure, and in need of a basic, ground-up redesign. Even
current Perl syntax has a special form (the /x modifier) that
lets you insert whitespace for readability.

http://www.perl.com/pub/a/2002/06/04/apo5.html

http://www.perl.com/pub/a/2002/08/22/exegesis5.html

It's an interesting redesign of basic regex functionality.
Some of the things you can do with it are very, very
interesting indeed.
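
For what it's worth, Python's re module has the same kind of
whitespace-tolerant form as the Perl one mentioned above: the
re.VERBOSE flag. Here is a minimal sketch using the pattern from the
quoted post; the variable names and the sample line are made up for
illustration only.

```python
import re

# The pattern from the quoted post, in the usual compact form.
compact = re.compile(r"\s*\w+\s*=\s*['\"].*?['\"]")

# The same pattern with re.VERBOSE: unescaped whitespace in the
# pattern is ignored and "#" starts a comment (outside a character class).
verbose = re.compile(r"""
    \s* \w+ \s*     # a name, optionally surrounded by whitespace
    =               # the equals sign
    \s*             # optional whitespace before the value
    ['"]            # opening quote, single or double
    .*?             # the value itself, non-greedy
    ['"]            # closing quote
""", re.VERBOSE)

# Hypothetical sample input; both forms should match the same text.
line = ' name = "value"'
assert compact.match(line).group(0) == verbose.match(line).group(0)
```

It doesn't make the syntax any less perl-ish, but it does leave room
to say what each piece is for.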

John Roth