Python regular expressions just ain't PCRE

Wiseman Wiseman1024 at gmail.com
Sat May 5 12:00:17 EDT 2007


On May 5, 7:19 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> In <1178323901.381993.47... at e65g2000hsc.googlegroups.com>, Wiseman wrote:
> > Note: I know there are LALR parser generators/parsers for Python, but
> > the very reason why re exists is to provide a much simpler, more
> > productive way to parse or validate simple languages and process text.
> > (The pyparse/yappy/yapps/<insert your favourite Python parser
> > generator here> argument could have been used to skip regular
> > expression support in the language, or to deprecate re. Would you want
> > that? And following the same rule, why would we have Python when
> > there's C?)
>
> I don't follow your reasoning here.  `re` is useful for matching tokens
> for a higher level parser and C is useful for writing parts that need
> hardware access or "raw speed" where pure Python is too slow.
>
> Regular expressions can become very unreadable compared to Python source
> code or EBNF grammars but modeling the tokens in EBNF or Python objects
> isn't as compact and readable as simple regular expressions.  So both `re`
> and higher level parsers are useful together and don't supersede each
> other.
>
> The same holds for C and Python.  IMHO.
>
> Ciao,
>         Marc 'BlackJack' Rintsch

Sure, they don't supersede each other, and they don't need to. My point
was that the more you can do with regexes (which aren't really regular
expressions anymore), the better, as long as they are powerful enough
for what you need to accomplish and don't turn into a giant Perl-style
hack. Regular expressions are a built-in, standard feature of Python;
they are much faster to write and use than Python code or some LALR
parser definition, and they are more widely known and understood. You
aren't going to parse a programming language with a regex, but you can
save a lot of time if you can parse simple (but not trivial) languages
with them. Regular expressions offer a productive alternative to
full-fledged parsers for the cases where you don't need one. So saying
"if you want feature X or feature Y in regular expressions, you should
use a Bison-like parser" sounds a bit like an excuse, because the very
reason regular expressions like these exist is to avoid using big,
complex parsers for simple cases. As an analogy, I mentioned Python vs.
C: you want high-level languages because they are simpler and more
productive than working with C, even if you can do anything with the
latter.
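
To make the "simple, but not trivial" case concrete, here is a minimal
sketch using only the standard re module. The key = value line format,
the LINE_RE pattern, and the parse_line helper are all made up for
illustration; they are assumptions, not something from this thread or
from any real configuration format.

```python
import re

# Hypothetical line format (an assumption for this example):
#   key = value   # optional trailing comment
# where value is an integer, a double-quoted string, or a bare word.
LINE_RE = re.compile(r'''
    ^\s*
    (?P<key>[A-Za-z_]\w*)                  # identifier-like key
    \s*=\s*
    (?:
        (?P<int>-?\d+)                     # integer value
      | "(?P<str>[^"\\]*(?:\\.[^"\\]*)*)"  # quoted string, backslash escapes allowed
      | (?P<word>[^\s#]+)                  # bare word
    )
    \s*(?:\#.*)?$                          # optional trailing comment
''', re.VERBOSE)


def parse_line(line):
    """Return (key, value) for a matching line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    if m.group('int') is not None:
        return m.group('key'), int(m.group('int'))
    if m.group('str') is not None:
        # Escape sequences are kept as written; unescaping is left out here.
        return m.group('key'), m.group('str')
    return m.group('key'), m.group('word')
```

One pattern validates the line and extracts its parts, which is about
where a regex stays comfortable. Nested or recursive structure is the
point at which a real parser starts to pay off.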



