Problems with re
Berthold Höllmann
bhoel at server.python.net
Sat May 22 07:51:41 EDT 1999
Tim Peters wrote:
>
> [Berthold Höllmann]
> > I have a regular expression which, in most cases, does what I want it to
> > do. But on at least one string it gets into an endless loop (OK, I didn't
> > wait forever). See the attached example:
> >
> > Python 1.5.2 (#2, Apr 22 1999, 14:34:42) [GCC egcs-2.91.66 19990314
> > (egcs-1.1.2 on linux2
> > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> > >>> import re
> > >>> RE = re.escape
> > >>> CP = r'\CallPython'
> > >>> loopR = re.compile(
> > ... "(?:" + RE(CP) + r'(\[.*\])?{(?P<CodeC>(?:' + '""".*?"""|".*?"' +
> > ... "|'''.*?'''|'.*?'|" +
> > ... '{.*?}+?|[^{]+?)+?))}',
> > ... re.MULTILINE|re.DOTALL)
> > >>>
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX(dir(math))}")
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX({1:1,2:2,3:3}")
> > >>> LL = loopR.match("\CallPython{LaTeXPy.PyLaTeX(dir(math)); LaTeXPy.PyLaTeX({1:1,2:2,3:3}")
> >
> > I am trying to parse a LaTeX file for the embedded "\CallPython"
> > statements, to extract the Python commands from them.
> >
> > Do you have any idea?
>
> Oh, several -- but you're not going to like them <wink>.
>
> + Regular expressions aren't powerful enough to match nested brackets. So a
> regexp approach to this problem is at best a sometimes-screws-up hack, no
> matter how much more time you pour into it.
What would be your recommendation instead?
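
One possible non-regexp answer to that question, sketched here as an illustration and not something proposed in this thread: walk the string by hand, counting brace depth and skipping over Python string literals. The function name extract_callpython and its macro parameter are hypothetical. The sketch assumes the optional [...] argument is not nested, and it does not understand Python comments, so a stray quote or brace inside a comment would confuse it.

```python
def extract_callpython(text, macro="\\CallPython"):
    """Return the code inside each macro{...}, honouring nested braces.

    Hypothetical helper: a hand-written scanner, not a regexp.
    """
    results = []
    pos = 0
    while True:
        start = text.find(macro, pos)
        if start < 0:
            break
        i = start + len(macro)
        # Skip an optional [...] argument (assumed not to nest).
        if i < len(text) and text[i] == "[":
            close = text.find("]", i)
            if close < 0:
                break
            i = close + 1
        if i >= len(text) or text[i] != "{":
            pos = i  # macro without a {...} body; keep searching
            continue
        body_start = i + 1
        i = body_start
        depth = 1      # we are inside the opening brace
        quote = None   # delimiter of the string literal we are in, if any
        while i < len(text) and depth > 0:
            ch = text[i]
            if quote:
                if ch == "\\":
                    i += 2  # skip a backslash-escaped character
                    continue
                if text.startswith(quote, i):
                    i += len(quote)  # end of the string literal
                    quote = None
                    continue
            else:
                # Try triple quotes before single ones.
                for q in ('"""', "'''", '"', "'"):
                    if text.startswith(q, i):
                        quote = q
                        break
                if quote:
                    i += len(quote)
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced braces: give up instead of guessing
        results.append(text[body_start:i - 1])
        pos = i
    return results
```

Each character is looked at once, so the running time is linear in the input and there is no backtracking to blow up. A string whose braces never balance simply yields no result for that statement.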
>
> + If you have to use regexps, at least use re.VERBOSE to make the mess more
> readable; e.g.,
>
> loopR = re.compile(r"""
> (?: \\CallPython
> (\[.*\])?
> {
> (?P<CodeC>
> (?: \""".*?\"""
> | ".*?"
> | '''.*?'''
> | '.*?'
> | {.*?}+?
> | [^{]+?
> )+?
> )
> )
> }
> """, re.MULTILINE | re.DOTALL | re.VERBOSE)
>
> This makes modification enormously easier, and makes some obscurities
> obvious; e.g., by inspection, the outermost (?: ... ) serves no purpose so
> can be removed.
>
It can be removed here, but my original code is a bit more complicated
and checks for a few more things. I stripped it down for the posting.
...
--
bhoel at starship.python.net
It is unlawful to use this email address for unsolicited ads
(USC Title 47 Sec.227). I will assess a US$500 charge for
reviewing and deleting each unsolicited ad.