[Tutor] regular expression

Rich Krauter rmkrauter at yahoo.com
Thu Feb 12 22:27:15 EST 2004


On Thu, 2004-02-12 at 20:50, Conrad Koziol wrote:
> What is the fastest way to search for a string and then surround it
> with something, such as <code> and </code>? Like so:
> 
> x = '<div> are not allowed, these arent either <br>'
> <some code here>
> x = '<code><div></code> are not allowed, these arent either
> <code><br></code>'
> 
> The two ways this can be done are by substituting a string like <div>
> with <code><div></code>, or by inserting <code> and </code> before and
> after it. Which one would be faster and how would I do it? I got as far
> as creating the regular expression r'<[^<>]*>'
> 
> Thanks!!
> 
> 

Hi Conrad,

This seems to work. I don't know about its speed, and it's *not*
thoroughly tested:

import re

x = '<div> are not allowed, these arent either <br>'

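# Match a tag that is not already wrapped in <code>...</code>.
# The lookbehind and the trailing lookahead skip tags adjacent to
# <code> or </code>. Note that (?!<.*code>) also skips any tag that is
# followed later on the same line by "code>".
pat = r'(?<!<code>)((?!<.*code>)<[^<>]*>)(?!</code>)'
rep = r'<code>\1</code>'

# subn returns the new string and the number of substitutions made.
a, na = re.subn(pat, rep, x)
print(a)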

# Next line is ok since I used
# negative lookaheads and negative lookbehinds.
# Without them, you'd get stuff like
# <code><code></code><br></code><code></code>
# if you run subn multiple times

b, nb = re.subn(pat, rep, a)
print(b)
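
One limitation of the pattern above: the (?!<.*code>) lookahead rejects
any tag that is followed anywhere later on the same line by "code>", so
a bare tag that comes before an already-wrapped one is left alone. Below
is a minimal sketch of another way to get the "safe to run twice"
behaviour, using alternation and a replacement function instead of
lookarounds. The names TAG and wrap_tags are hypothetical, made up for
this illustration, and this is just as lightly tested as the code above.

```python
import re

# Hypothetical names for illustration only.
# First alternative: a tag that is already wrapped; it is consumed whole
# and put back unchanged. Second alternative: a bare tag, captured in group 1.
TAG = re.compile(r'<code><[^<>]*></code>|(<[^<>]*>)')

def wrap_tags(text):
    def repl(m):
        if m.group(1) is None:
            # Already wrapped: return the match as-is.
            return m.group(0)
        return '<code>' + m.group(1) + '</code>'
    return TAG.sub(repl, text)

x = '<div> are not allowed, these arent either <br>'
once = wrap_tags(x)
twice = wrap_tags(once)
print(once)
print(twice == once)
```

Because the wrapped form is matched first, a mixed string such as
'<div> and <code><br></code>' gets only its bare <div> wrapped.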

Hope that helps. FYI, I referred to 'Text Processing In Python' by David
Mertz to try to figure this out.
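
On the speed question, the timeit module is one way to compare the two
approaches from the original question. This is a rough sketch, not a
benchmark result: by_backref and by_insert are hypothetical names, both
use the simple pattern r'<[^<>]*>' from the question (so neither is safe
to run twice on its own output), and the numbers will depend on the
machine and the input.

```python
import re
import timeit

x = '<div> are not allowed, these arent either <br>'
tag = re.compile(r'<[^<>]*>')

# Approach 1: substitute each tag, referring to the whole match as \g<0>.
def by_backref(text):
    return tag.sub(r'<code>\g<0></code>', text)

# Approach 2: insert <code> and </code> around each match by hand,
# collecting the pieces and joining them once at the end.
def by_insert(text):
    parts = []
    last = 0
    for m in tag.finditer(text):
        parts.append(text[last:m.start()])
        parts.append('<code>' + m.group(0) + '</code>')
        last = m.end()
    parts.append(text[last:])
    return ''.join(parts)

# Both approaches should produce the same string.
assert by_backref(x) == by_insert(x)

for fn in (by_backref, by_insert):
    seconds = timeit.timeit(lambda: fn(x), number=20000)
    print(fn.__name__, seconds)
```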

Rich


