Problems with re

Berthold Höllmann bhoel at server.python.net
Sat May 22 07:51:41 EDT 1999


Tim Peters wrote:
> 
> [Berthold Höllmann]
> > I have a regular expression which, in most cases, does what I want it to
> > do. But on at least one string it gets into an endless loop (OK, I didn't
> > wait forever). See the attached example:
> >
> > Python 1.5.2 (#2, Apr 22 1999, 14:34:42)  [GCC egcs-2.91.66 19990314 (egcs-1.1.2  on linux2
> > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> > >>> import re
> > >>> RE = re.escape
> > >>> CP = r'\CallPython'
> > >>> loopR = re.compile(
> > ...     "(?:" + RE(CP) + r'(\[.*\])?{(?P<CodeC>(?:' + '""".*?"""|".*?"' + "|'''.*?'''|'.*?'|" +
> > ...     '{.*?}+?|[^{]+?)+?))}',
> > ...     re.MULTILINE|re.DOTALL)
> > >>>
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX(dir(math))}")
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX({1:1,2:2,3:3}")
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX(dir(math)); LaTeXPy.PyLaTeX({1:1,2:2,3:3}")
> >
> > I am trying to parse a LaTeX file for the embedded "\CallPython" statements,
> > to extract the Python commands from them.
> >
> > Do you have any idea?
> 
> Oh, several -- but you're not going to like them <wink>.
> 
> + Regular expressions aren't powerful enough to match nested brackets.  So a
> regexp approach to this problem is at best a sometimes-screws-up hack, no
> matter how much more time you pour into it.

What would be your recommendation instead?
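One possible alternative, offered only as a hypothetical sketch and not as Tim's recommendation: find each "\CallPython" with a plain string search, then scan the body by hand, counting braces and stepping over Python string literals so that braces inside strings do not change the nesting depth. The function names are made up for illustration, and the sketch assumes that the optional [...] argument contains no "]" and that braces inside Python comments can be ignored.

```python
MARKER = "\\CallPython"  # the LaTeX command being searched for


def _skip_string(text, i):
    """Return the index just past the Python string literal starting at
    text[i] (a quote character), or -1 if it is never closed."""
    q = text[i]
    quote = q * 3 if text.startswith(q * 3, i) else q  # ''' / """ or ' / "
    j = i + len(quote)
    while j < len(text):
        if text[j] == "\\":
            j += 2  # skip the backslash-escaped character
        elif text.startswith(quote, j):
            return j + len(quote)
        else:
            j += 1
    return -1


def extract_callpython(text):
    """Hypothetical helper: return the code inside each \\CallPython[...]{...}
    whose braces balance; unbalanced occurrences are skipped."""
    results = []
    pos = 0
    while True:
        start = text.find(MARKER, pos)
        if start < 0:
            return results
        i = start + len(MARKER)
        # Optional [...] argument (assumed to contain no "]").
        if i < len(text) and text[i] == "[":
            close = text.find("]", i)
            if close < 0:
                return results
            i = close + 1
        if i >= len(text) or text[i] != "{":
            pos = i  # not followed by a brace: keep searching
            continue
        i += 1
        body_start = i
        depth = 1
        while i < len(text) and depth > 0:
            ch = text[i]
            if ch in "\"'":
                i = _skip_string(text, i)
                if i < 0:
                    break  # unterminated string literal
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            i += 1
        if i < 0 or depth != 0:
            pos = body_start  # unbalanced: skip this occurrence
            continue
        results.append(text[body_start:i - 1])
        pos = i
```

On the third, unbalanced example from the post this returns an empty list in linear time instead of searching indefinitely; whether silently skipping such input is acceptable depends on the application.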

> 
> + If you have to use regexps, at least use re.VERBOSE to make the mess more
> readable; e.g.,
> 
> loopR = re.compile(r"""
> (?: \\CallPython
>     (\[.*\])?
>     {
>     (?P<CodeC>
>         (?: \""".*?\"""
>         |   ".*?"
>         |   '''.*?'''
>         |   '.*?'
>         |   {.*?}+?
>         |   [^{]+?
>         )+?
>     )
> )
> }
> """, re.MULTILINE | re.DOTALL | re.VERBOSE)
> 
> This makes modification enormously easier, and makes some obscurities
> obvious; e.g., by inspection, the outermost (?: ... ) serves no purpose so
> can be removed.
> 

It can be removed here, but my original code is a bit more complicated:
it checks for a few more things. I stripped it down for the posting.
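For the stripped-down pattern in this posting, here is a sketch of Tim's re.VERBOSE layout with the outer (?: ... ) dropped, written for present-day Python 3 rather than 1.5.2. The literal braces are escaped and comments are added for readability. None of this changes what the pattern matches, so it does not cure the looping either; it is only run here on the first, well-formed example.

```python
import re

# Tim's VERBOSE layout, outer (?: ... ) removed, literal braces escaped.
loopR = re.compile(r"""
    \\CallPython
    (\[.*\])?                    # optional [...] argument
    \{
    (?P<CodeC>
        (?: \"\"\".*?\"\"\"      # triple-double-quoted string
        |   ".*?"                # double-quoted string
        |   '''.*?'''            # triple-single-quoted string
        |   '.*?'                # single-quoted string
        |   \{.*?\}+?            # one brace pair (not nested)
        |   [^{]+?               # any other run of characters
        )+?
    )
    \}
""", re.MULTILINE | re.DOTALL | re.VERBOSE)

m = loopR.match(r"\CallPython{LaTeXPy.PyLaTeX(dir(math))}")
print(m.group("CodeC"))  # prints LaTeXPy.PyLaTeX(dir(math))
```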

...



-- 
bhoel at starship.python.net
        It is unlawful to use this email address for unsolicited ads
        (USC Title 47 Sec.227). I will assess a US$500 charge for
        reviewing and deleting each unsolicited ad.



