Html character entity conversion
Duncan Booth
duncan.booth at invalid.invalid
Tue Aug 1 09:06:01 EDT 2006
pak.andrei at gmail.com wrote:
> How can I convert encoded string
>
> sEncodedHtmlText = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
>
> into human readable:
>
> sDecodedHtmlText == 'привет питон'
How about:
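>>> import re
>>> sEncodedHtmlText = 'text: &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;&#1087;&#1080;&#1090;&#1086;&#1085;'
>>> def unescape(m):
...     # m.group(0) looks like '&#1087;'; strip '&#' and ';' to get the code point
...     return unichr(int(m.group(0)[2:-1]))
...
>>> print re.sub('&#[0-9]+;', unescape, sEncodedHtmlText)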
text: ???????????
I'm afraid my newsreader couldn't cope with either your original text or my
output, but I think this gives the string you wanted. You probably also
ought to decode sEncodedHtmlText to unicode first; otherwise anything that
isn't an entity escape will be converted to unicode using the default ascii
encoding, which fails on any non-ASCII byte.
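
For anyone reading this on Python 3, here is a rough sketch of the same idea with the decode step done first. It is not part of the original reply: the helper name decode_numeric_entities, the sample bytes and the assumption that the input is UTF-8 are all mine, and the session above is Python 2 (unichr, print statement).

```python
import html
import re


def decode_numeric_entities(text):
    """Replace decimal entities such as &#1087; with the character they encode.

    Hypothetical helper name, chosen for this example only.
    """
    def unescape(m):
        # m.group(1) is the run of digits between '&#' and ';'
        return chr(int(m.group(1)))
    return re.sub(r'&#([0-9]+);', unescape, text)


# Assumption: the text arrived as UTF-8 bytes, so decode to str before substituting.
raw = (b'text: &#1087;&#1088;&#1080;&#1074;&#1077;&#1090; '
       b'&#1087;&#1080;&#1090;&#1086;&#1085;')
decoded = decode_numeric_entities(raw.decode('utf-8'))
print(decoded)

# The standard library's html.unescape also handles named and hex entities.
print(html.unescape('&lt;b&gt; &#x43f;&#1080;'))
```

The regex version only touches decimal entities, which matches the approach above; html.unescape is probably the simpler choice if named entities like &amp; may also appear.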