Need a Regular expression to remove a char for Unicode text

Sybren Stuvel sybrenUSE at YOURthirdtower.com.imagination
Fri Oct 13 07:41:18 EDT 2006


శ్రీనివాస enlightened us with:
> Can anyone tell me how I can remove a character from a Unicode
> text?  కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> remove '&' but not replace it with a zero-width char. And one more
> thing: if there is any whitespace before and after the '&' char, the
> text should be kept as it is.

So basically, you want to match <200c>& and replace it with <200c>,
but only if it's not surrounded by whitespace, right?

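r"(?<!\s)\u200c&(?!\s)" should match when applied to a Unicode string.
Note that it has to be \u200c, not \x200c: \x takes only two hex
digits, so \x200c would mean a space followed by the text "0c". I'm
sure you'll be able to take it from there.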
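
For what it's worth, here is a minimal sketch of how that pattern might be
used with re.sub. It assumes a Python version whose re module understands
\u escapes in patterns (3.3 or later); the names ZWNJ, PATTERN and
strip_ampersand are made up for this example and are not from the original
question.

```python
import re

# U+200C ZERO WIDTH NON-JOINER, shown as <200c> in the message above.
ZWNJ = "\u200c"

# Match ZWNJ followed by '&', unless whitespace precedes the ZWNJ
# or whitespace follows the '&'.
PATTERN = re.compile(r"(?<!\s)\u200c&(?!\s)")


def strip_ampersand(text):
    """Drop the '&' that follows a ZWNJ, keeping the ZWNJ itself."""
    return PATTERN.sub(ZWNJ, text)


word = "కల్\u200c&హార"
print(strip_ampersand(word) == "కల్\u200cహార")            # '&' removed
print(strip_ampersand("a \u200c& b") == "a \u200c& b")    # left alone
```

The lookbehind checks the character before the zero-width non-joiner, not
the one before '&', so adjust it if the whitespace test should apply
elsewhere.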

Sybren
-- 
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/


