Need a Regular expression to remove a char for Unicode text

Leo Kislov Leo.Kislov at gmail.com
Fri Oct 13 07:55:03 EDT 2006



On Oct 13, 4:44 am, harvey.tho... at informa.com wrote:
> శ్రీనివాస wrote:
> > Hai friends,
> > Can anyone tell me how I can remove a character from a Unicode text?
> > కల్‌&హార is a Telugu word in Unicode. Here i want to
> > remove '&' but not replace with a zero width char. And one more thing,
> > if any whitespaces are there before and after '&' char, the text should
> > be kept as it is. Please tell me how I can work this out with regular
> > expressions.
>
> > Thanks and regards
> > Srinivasa Raju Datla
>
> Don't know anything about Telugu, but is this the approach you want?
>
> >>> import re
> >>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
> >>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
> >>> noampre('', x)

He wants to replace & with a zero width joiner, so the last call should be
noampre(u"\u200D", x)
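
For reference, here is a minimal self-contained sketch of the combined suggestion. It is written for Python 3, where str is already Unicode. The sample string and the name ZWJ are assumptions made for illustration and are not the original poster's data.

```python
import re

# U+200D ZERO WIDTH JOINER (the name ZWJ is just a local label)
ZWJ = u"\u200d"

# Match '&' only when it has no whitespace directly before it and
# none directly after it. re.UNICODE is redundant for str patterns
# on Python 3 but harmless; it mattered on Python 2.
noampre = re.compile(r"(?<!\s)&(?!\s)", re.UNICODE).sub

# Illustrative sample: Telugu letters around a bare '&', followed by
# a second '&' that has spaces around it and must be left alone.
x = u"\u0c15\u0c32\u0c4d&\u0c39\u0c3e\u0c30 \u0c15 & \u0c39"

result = noampre(ZWJ, x)
```

Only the first '&' is replaced by the joiner; the one surrounded by spaces stays as it is. An '&' with whitespace on just one side is also left untouched, since the pattern requires both neighbours to be non-whitespace.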



