Regular expressions
Christian Gollwitzer
auriocus at gmx.de
Fri Nov 6 15:36:26 EST 2015
On 06.11.15 at 20:52, rurpy at yahoo.com wrote:
> I have always thought lexing
> and parsing solutions for Python were a weak spot in the Python eco-
> system and I was about to write that I would love to see a PEG parser
> for python when I saw this:
>
> http://fdik.org/pyPEG/
>
> Unfortunately it suffers from the same problem that Pyparsing, Ply
> and the rest suffer from: they use Python syntax to express the
> parsing rules rather than using a dedicated problem-specific syntax
> such as you used to illustrate peg parsing:
>
>> pattern <- phone_number name phone_number
>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>> name <- [[:alpha:]]+
That is actually the real syntax of a parser generator I use for
another language (Tcl). A calculator example using this package can be
found here: http://wiki.tcl.tk/39011
(it is actually a retargetable compiler in a few lines - very impressive)
And as you say, it works well precisely because it doesn't
try to abuse function composition in the frontend to construct the parser.
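For comparison, here is a minimal sketch of the same three rules in plain
Python, using only the standard re module. It is not a PEG parser, just the
closest regex equivalent for this small grammar. The names PHONE_NUMBER,
NAME, PATTERN and parse are made up for illustration, [[:alpha:]] is
approximated by [A-Za-z] because re has no POSIX classes, and optional
whitespace between the parts is an assumption that the grammar itself does
not state.

```python
import re

# Each grammar rule becomes one regex fragment (illustrative names).
PHONE_NUMBER = r"\+[0-9]+(?:-[0-9]+)*"   # '+' [0-9]+ ( '-' [0-9]+ )*
NAME = r"[A-Za-z]+"                      # [[:alpha:]]+ (ASCII-only approximation)

# pattern <- phone_number name phone_number
# Assumption: optional whitespace between the parts (not in the grammar).
PATTERN = re.compile(
    rf"(?P<first>{PHONE_NUMBER})\s*(?P<name>{NAME})\s*(?P<second>{PHONE_NUMBER})"
)

def parse(text):
    """Return (first, name, second) if the whole text matches, else None."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return m.group("first"), m.group("name"), m.group("second")
```

Calling parse("+49-123 Bob +1-555-0100") gives the two numbers and the name
as a tuple; input without the leading '+' gives None. This only works
because the grammar is not recursive; anything nested needs a real parser.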
Looking through the parser generators listed at
http://bford.info/packrat/, waxeye looks like it could be interesting:
http://waxeye.org/manual.html#_using_waxeye. However, I'm not sure the
Python backend works with Python 3; there may be Unicode issues.
Another bonus would be a compilable backend, like Cython or similar. The
pt package mentioned above can generate a C module with an
interface for Tcl. Compiled parsers are approximately 100x faster. I
would expect a similar speedup for Python parsers.
> Some here have complained about excessive brevity of regexs but I
> much prefer using problem-specific syntax like "(a*)" to having to
> express a pattern using python with something like
>
> star = RegexMatchAny()
> a_group = RegexGroup('a' + star)
> ...
Yeah, that is nonsense. Mechanical verbosity never leads to clarity (XML,
anyone?)
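To make the contrast concrete, this is what the compact form looks like with
Python's standard re module. RegexMatchAny and RegexGroup in the quoted text
are hypothetical names, not a real library; the helper leading_as below is
likewise made up for illustration.

```python
import re

# "(a*)": one capturing group matching zero or more 'a' characters.
pattern = re.compile(r"(a*)")

def leading_as(text):
    # match() anchors at the start of the string; since a* also matches
    # the empty string, the match always succeeds.
    return pattern.match(text).group(1)
```

One line of pattern replaces the object construction, and anyone who knows
regex syntax can read it without looking up a class hierarchy.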
> I think in many cases those most hostile to regexes are also
> those who use them (or need to use them) the least. While my use
> of regexes is limited to fairly simple ones, they are complicated
> enough that I'm sure it would take orders of magnitude longer
> to get the same effect in python.
That's also my impression. The "two problems" quote was already lame
the first time. If you are satisfied with simple string functions, then
either you do not have problems that need regexps or other formal
parsing tools, or you are very masochistic.
Christian