Problem with negative lookahead assertion

Mon Apr 15 16:02:19 EDT 2002

I am currently indexing an Old Frisian etymological dictionary typeset
in LaTeX2e. I use a Python script to insert the index markers
(\index{}) for 45 languages. My problem is the following. I have
already marked up all the words that follow a fixed structure: a
language abbreviation (e.g. MDu.) followed by \emph{Middle Dutch word}.

import re
mdu = re.compile(r'MDu\.\s\\emph\{(.*?)\}([\s,\W])')
data = mdu.sub(r'MDu. \\emph{\1}\2 \\index{mdu~\1}', data)
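
For reference, here is a minimal self-contained sketch of that first
substitution. The one-line sample string and the names `sample` and
`marked` are made up for illustration and are not taken from the
dictionary source. The replacement is written as a raw string with
doubled backslashes so that \emph and \index are not read as escapes in
the template, which should also keep it working on newer Python versions.

```python
import re

# Pattern from above: "MDu. \emph{word}" followed by one separator character.
mdu = re.compile(r'MDu\.\s\\emph\{(.*?)\}([\s,\W])')

# Hypothetical one-line sample, not taken from the dictionary source.
sample = r"MDu. \emph{waden}, `wade, go'"

# In the raw template, \\ yields a literal backslash; \1 and \2 are the groups.
marked = mdu.sub(r'MDu. \\emph{\1}\2 \\index{mdu~\1}', sample)
print(marked)
```

The word is copied into the \index{} entry and the separator that
followed \emph{} (here a comma) is kept in front of it.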

Now sometimes these words have more than one variant, as shown in the
sample text excerpt below, or are mentioned in running text. I tried
to catch these remaining forms with the following regular expression.

emph = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')
data = emph.sub(r'\\emph{\1}\2 \\index{unl~\1}', data)

When this is applied to the text below, all the \emph{} phrases are
replaced, instead of only those not followed by an \index{} command.

<sample text>

\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk
through', OE \emph{wadan}, \index{oe~wadan} OHG, MHG \emph{watan}
\index{mhg~watan}`wade, stride', MLG \emph{w\=aden},
\index{mlg~w\=aden} MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen}
`wade, go' \end{germdata}

</sample text>
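
To make the behaviour easier to reproduce, here is a cut-down sketch.
It rests on two assumptions: the pattern is a simplified form of my emph
expression (the leading \s. is left out), and the sample is a shortened
line based on the excerpt above. The names `pattern`, `sample` and `hits`
are only illustrative. The script does not substitute anything; it just
lists what the expression matches, so the spans can be compared with the
\index{} markers that follow them.

```python
import re

# Simplified form of the emph expression (leading \s. dropped: an assumption).
pattern = re.compile(r'\\emph\{(.*?)\}([\s,\W])(?!\\index)')

# Shortened sample: the first \emph{} is already indexed, the second is not.
sample = r"MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen} `wade, go'"

# Collect the full text of every match.
hits = [m.group(0) for m in pattern.finditer(sample)]
for hit in hits:
    print(repr(hit))
```

In this sketch both \emph{} groups are reported, including
\emph{waden}, which already has an \index{} entry after it. The character
right after the matched comma is a space rather than \index, which may
be part of what I am seeing.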

I have tried every variant of the emph regular expression I could
think of, but they all had the same outcome. I also used the redemo.py
script that comes with the Python distribution to try to track down
the problem.

Could somebody help me out?

Regards, Sjoerd

PS: I am using Python 2.2.1 on a SuSE Linux 7.3 box.