problem negative lookahead assertion

A.M. Kuchling akuchlin at ute.mems-exchange.org
Mon Apr 15 17:04:53 EDT 2002


In article <4b926dbb.0204151202.6d709633 at posting.google.com>, 
	sjoerd siebinga wrote:
> emph = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')

I had to change \s. to just \s to get the pattern to match at all.

> \begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk

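The pattern is matching too far.  When the lookahead fails just after
the first '}', the regex engine backtracks and lets (.*?) grow past
that brace to a later '}'.  This text was turned into the following: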
\begin{germdata} ON\emph{va\th a} \index{on~va\th a}` \index{unl~va\th a}
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^ contents of group 1.
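
A quick way to see this for yourself is the sketch below.  It assumes
only the standard re module and the sample line quoted above; the
variable names are just for illustration.

```python
import re

# Sample line from the quoted post (raw string keeps the backslashes literal).
text = r"\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk"

# The pattern as posted, with '\s.' in front of \emph.
original = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')
# The same pattern with '\s.' reduced to '\s'.
adjusted = re.compile(r'\s\\emph\{(.*?)\}([\s,\W])(?!\\index)')

no_match = original.search(text)   # '\s.' finds nothing to match here
m = adjusted.search(text)

print(no_match)
print(repr(m.group(1)))   # group 1 runs past the first '}'
print(repr(m.group(2)))   # the single character matched after the later '}'
```

Group 1 comes back as everything from 'va' up to the end of the
\index argument, which is the span marked with carets above.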

I can make it work the way you intend by writing ([^}]*?) instead of
(.*?); this ensures that group 1 in the regex can only match up to the
very next '}'.  Note that this will break in a different situation, if
you have {} inside the contents of \emph, as for example in
\emph{ab^{cd}}, because this problem really needs a full parser to
handle the general case.  But maybe your input data never uses {}
inside \emph{}, in which case you'd be OK.
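
Here is a small sketch of the change, again assuming only the standard
re module.  The two extra input strings are made up for illustration
and are not from your data.

```python
import re

# Group 1 is now restricted to characters other than '}'.
fixed = re.compile(r'\s\\emph\{([^}]*?)\}([\s,\W])(?!\\index)')

# The sample line: \emph{...} is already followed by \index, so no match.
text = r"\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk"
already_indexed = fixed.search(text)

# Hypothetical line with no \index after \emph{...}: this one matches.
plain = fixed.search(r"ON \emph{va\th a} `wade")

# Hypothetical nested-brace input: group 1 stops at the inner '}'.
nested = fixed.search(r"see \emph{ab^{cd}} here")

print(already_indexed)
print(repr(plain.group(1)))
print(repr(nested.group(1)))
```

The last case shows the limitation: group 1 is cut off at the inner
'}' and loses the rest of the \emph argument.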

--amk                                                             (www.amk.ca)
The more I know me, the less I like me.
    -- The Doctor, in "Time and the Rani"


