Regular expressions

Christian Gollwitzer auriocus at gmx.de
Fri Nov 6 15:36:26 EST 2015


On 06.11.15 at 20:52, rurpy at yahoo.com wrote:
> I have always thought lexing
> and parsing solutions for Python were a weak spot in the Python
> ecosystem and I was about to write that I would love to see a PEG parser
> for Python when I saw this:
>
> http://fdik.org/pyPEG/
>
> Unfortunately it suffers from the same problem that Pyparsing, Ply
> and the rest suffer from: they use Python syntax to express the
> parsing rules rather than using a dedicated problem-specific syntax
> such as you used to illustrate peg parsing:
>
>> pattern <- phone_number name phone_number
>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>> name <- [[:alpha:]]+

That is actually the real syntax of a parser generator that I use for
another language (Tcl). A calculator example using this package can be
found here: http://wiki.tcl.tk/39011
(it is actually a retargetable compiler in a few lines, which is very impressive)
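
To show how directly those three rules map onto code, here is a minimal
hand-written sketch in Python. It is not the output of any parser
generator; the function names simply mirror the rule names, and
[^\W\d_] is my assumed stand-in for [[:alpha:]], which Python's re
module does not support.

```python
import re

_DIGITS = re.compile(r"[0-9]+")
_ALPHA = re.compile(r"[^\W\d_]+")  # assumption: stand-in for [[:alpha:]]+

# Each rule takes the text and a start position and returns the end
# position of the match, or None if the rule does not match there.

def phone_number(text, pos):
    # phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
    if not text.startswith("+", pos):
        return None
    m = _DIGITS.match(text, pos + 1)
    if m is None:
        return None
    pos = m.end()
    while text.startswith("-", pos):
        m = _DIGITS.match(text, pos + 1)
        if m is None:
            break  # the repetition stops after the last complete group
        pos = m.end()
    return pos

def name(text, pos):
    # name <- [[:alpha:]]+
    m = _ALPHA.match(text, pos)
    return m.end() if m else None

def pattern(text, pos=0):
    # pattern <- phone_number name phone_number
    for rule in (phone_number, name, phone_number):
        pos = rule(text, pos)
        if pos is None:
            return None
    return pos
```

Calling pattern("+49-123Christian+1-555") returns the position where the
match ends, and None if the input does not fit the grammar. Note how much
longer this is than the three grammar lines it implements.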

And just as you say, it works well precisely because it doesn't
try to abuse function composition in the frontend to construct the parser.

Looking through the parser generators listed at
http://bford.info/packrat/ it seems that waxeye could be interesting
(http://waxeye.org/manual.html#_using_waxeye). However, I'm not sure whether
the Python backend works with Python 3; there may be Unicode issues.
Another bonus would be a compilable backend, such as Cython or similar. The
pt package mentioned above can generate a C module with an
interface for Tcl. Compiled parsers are approximately 100x faster. I
would expect a similar speedup for Python parsers.

> Some here have complained about excessive brevity of regexes but I
> much prefer using problem-specific syntax like "(a*)" to having to
> express a pattern using Python with something like
>
> star = RegexMatchAny()
> a_group = RegexGroup('a' + star)
> ...

Yeah, that is nonsense. Mechanical verbosity never leads to clarity (XML,
anyone?)
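
For comparison, a small sketch of the compact form with Python's
standard re module. The helper name leading_as is made up for this
example.

```python
import re

# The whole pattern is one short string: zero or more 'a', captured as group 1.
A_GROUP = re.compile(r"(a*)")

def leading_as(text):
    """Return the run of 'a' characters at the start of text (possibly empty)."""
    # a* also matches the empty string, so match() always succeeds here.
    return A_GROUP.match(text).group(1)
```

Here leading_as("aaab") gives "aaa", and a string that does not start
with 'a' gives the empty string.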

> I think in many cases those most hostile to regexes are also
> those who use them (or need to use them) the least. While my use
> of regexes is limited to fairly simple ones, they are complicated
> enough that I'm sure it would take orders of magnitude longer
> to get the same effect in Python.

That's also my impression. The "two problems" quote was already lame
the first time. If you are satisfied with simple string functions, then
either you do not have problems that need regexps or other formal
parsing tools, or you are very masochistic.
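
As a rough illustration of that trade-off, here is the phone number rule
from the grammar quoted earlier checked twice: once with a regular
expression and once with plain string methods. Both function names are
invented for this sketch, and str.isascii() assumes Python 3.7 or later.

```python
import re

# '+' followed by digit groups separated by single '-' characters.
PHONE = re.compile(r"\+[0-9]+(?:-[0-9]+)*")

def is_phone_regex(s):
    return PHONE.fullmatch(s) is not None

def is_phone_plain(s):
    # Same check using only string methods.
    if not s.startswith("+"):
        return False
    parts = s[1:].split("-")
    # Every group must be non-empty and consist of ASCII digits only.
    return all(p and p.isascii() and p.isdigit() for p in parts)
```

For a rule this small the two versions are about the same size. The
string-method version is the one that needs the extra care, though:
empty groups and non-ASCII digits have to be handled explicitly.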

	Christian


