problem negative lookahead assertion
A.M. Kuchling
akuchlin at ute.mems-exchange.org
Mon Apr 15 17:04:53 EDT 2002
In article <4b926dbb.0204151202.6d709633 at posting.google.com>,
sjoerd siebinga wrote:
> emph = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')
I had to change \s. to just \s to get the pattern to match at all.
> \begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk
The pattern is matching too far. This text was turned into the following:
\begin{germdata} ON\emph{va\th a} \index{on~va\th a}` \index{unl~va\th a}
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^ contents of group 1.
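For anyone who wants to reproduce this, here is a minimal Python sketch using the sample line quoted above. It only inspects the match groups. The replacement string from the original question isn't shown, so no substitution is attempted, and the variable names are my own. It shows both points: the pattern as posted never matches, and the \s version lets (.*?) run past the first '}' because the negative lookahead rejects it.

```python
import re

# Sample input line quoted in the original question.
text = r"\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk"

# Pattern as posted: \s. needs a whitespace character plus one more
# character in front of "\emph", so it never matches this line.
original = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')
print(original.search(text))  # None

# With \s. changed to \s the pattern matches, but the first "}" is followed
# by " \index", so the negative lookahead fails there and the lazy (.*?)
# keeps extending until it reaches a "}" whose follower is not "\index".
modified = re.compile(r'\s\\emph\{(.*?)\}([\s,\W])(?!\\index)')
m = modified.search(text)
print(m.group(1))        # va\th a} \index{on~va\th a
print(repr(m.group(2)))  # '`'
```

The lazy quantifier only makes (.*?) try short matches first. It does not stop the engine from backtracking into longer ones when the rest of the pattern, here the lookahead, fails.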
I can make it work the way you intend by writing ([^}]*?) instead of
(.*?); this ensures that group 1 in the regex can only match up to the
very next '}'. Note that this will break in a different situation: if
you have {} inside the contents of \emph, as in \emph{ab^{cd}}, group 1
stops at the inner '}'. Handling the general case really needs a full
parser. But maybe your input data never uses {} inside \emph{}, in
which case you'd be OK.
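A sketch of the ([^}]*?) fix, again on the quoted sample line. I'm assuming the intent is that an \emph{} already followed by \index{} is left alone. Two test inputs are my own additions and do not come from the thread: a line with no \index{} after the \emph{}, and the \emph{ab^{cd}} case mentioned above. The emph_arguments helper is hypothetical. It shows only the brace-counting piece that a real parser would need and is not a full LaTeX parser.

```python
import re

# Sample line from the question; its \emph{} is already followed by \index{}.
text = r"\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk"

# Suggested fix: [^}]*? cannot cross a "}", so group 1 stops at the very
# next "}" and can no longer be stretched to get past the lookahead.
fixed = re.compile(r'\s\\emph\{([^}]*?)\}([\s,\W])(?!\\index)')

# The already-indexed \emph{} is now skipped entirely.
print(fixed.search(text))  # None

# Hypothetical line without \index{} after the \emph{}: this one matches.
m = fixed.search(r"ON \emph{va\th a}, wade")
print(m.group(1))  # va\th a

# The limitation: nested braces end group 1 at the inner "}".
m = fixed.search(r"ON \emph{ab^{cd}} x")
print(m.group(1))  # ab^{cd


def emph_arguments(s):
    """Return the argument of each \\emph{...} in s, tracking brace depth.

    Hypothetical helper, not a full LaTeX parser: it ignores escaped braces
    (\\{, \\}) and comments, and skips an \\emph{ that is never closed.
    """
    args = []
    for opener in re.finditer(r'\\emph\{', s):
        depth, i = 1, opener.end()
        while i < len(s) and depth:
            if s[i] == '{':
                depth += 1
            elif s[i] == '}':
                depth -= 1
            i += 1
        if depth == 0:
            # i is one past the matching "}", so exclude it from the slice.
            args.append(s[opener.end():i - 1])
    return args


print(emph_arguments(r"ON \emph{ab^{cd}} x"))  # ['ab^{cd}']
```

The depth counter only solves the brace-matching part. It would still have to be combined with the check for a following \index{} to replace the regex.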
--amk (www.amk.ca)
The more I know me, the less I like me.
-- The Doctor, in "Time and the Rani"