RegEx conditional search and replace

Martin Evans martin at browns-nospam.co.uk
Wed Jul 5 07:34:28 EDT 2006


Sorry, yet another regex question.  I've been struggling to get a regular 
expression to do the following in Python:

Search and replace all instances of "sleeping" with "dead".

This parrot is sleeping. Really, it is sleeping.
to
This parrot is dead. Really, it is dead.


But not if it is part of a link's URL or inside the link text:

This parrot <a href="sleeping.htm" target="new">is sleeping</a>. Really, it 
is sleeping.
to
This parrot <a href="sleeping.htm" target="new">is sleeping</a>. Really, it 
is dead.


This is the full extent of the "html" that would be seen in the text; the 
rest of the page has already been processed. Luckily I can rely on the 
formatting always being consistent with the above example (the url will 
normally be much longer in reality, though). There may, however, be more 
than one link present.
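
For illustration, here is a minimal sketch of one way this could be done 
with a single regex, assuming the links are never nested and always closed 
as in the example above. The idea is to match either a whole link or the 
bare keyword, and to put links back unchanged. The function name 
replace_outside_links is made up for this sketch; only the standard re 
module is used.

```python
import re

def replace_outside_links(text, keyword, replacement):
    """Replace keyword with replacement, except inside <a ...>...</a>."""
    # First alternative: a whole link, from "<a" to the nearest "</a>".
    # Second alternative: the bare keyword (escaped, so it is literal).
    # Assumption: links are not nested and are always closed.
    pattern = re.compile(
        r'(<a\b[^>]*>.*?</a>)|' + re.escape(keyword),
        re.DOTALL,
    )

    def substitute(match):
        # Group 1 is set only when a whole link matched: leave it alone.
        if match.group(1) is not None:
            return match.group(1)
        return replacement

    return pattern.sub(substitute, text)


text = ('This parrot <a href="sleeping.htm" target="new">is sleeping</a>. '
        'Really, it is sleeping.')
result = replace_outside_links(text, "sleeping", "dead")
```

Because the link alternative comes first and consumes the whole tag, the 
"sleeping" in the href and in the link text is never seen as a bare 
keyword. More than one link in the text is handled the same way.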

I'm hoping to use this to implement the automatic addition of links to other 
areas of a website based on keywords found in the text.
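
As a rough sketch of that keyword-linking goal, and under the same 
assumption that existing links are never nested and always closed, 
something like the following might work. The function name 
add_keyword_links and the keyword-to-URL mapping are hypothetical, made up 
for illustration only.

```python
import re

def add_keyword_links(text, links):
    """Wrap each keyword in a link, skipping text already inside a link.

    links is a hypothetical dict mapping keyword -> URL.
    """
    if not links:
        return text
    # Try longer keywords first so they win over shorter ones
    # that they happen to contain.
    keywords = sorted(links, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Group 1: an existing link, kept as-is. Group 2: a bare keyword.
    pattern = re.compile(
        r'(<a\b[^>]*>.*?</a>)|(' + alternatives + r')',
        re.DOTALL,
    )

    def substitute(match):
        if match.group(1) is not None:
            return match.group(1)
        word = match.group(2)
        return '<a href="%s" target="new">%s</a>' % (links[word], word)

    return pattern.sub(substitute, text)


linked = add_keyword_links("Really, it is sleeping.",
                           {"sleeping": "sleeping.htm"})
```

Since existing links are matched whole and returned untouched, running the 
function again over its own output should not wrap a keyword twice.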

I'm guessing this is a bit too much to ask of regex. If that is the case, 
I'll add some more manual Python parsing of the string, but I was hoping to 
use it to learn more about regex.

Any pointers would be appreciated.

Martin





