Html character entity conversion

Claudio Grondi claudio.grondi at freenet.de
Sun Jul 30 11:30:46 EDT 2006


pak.andrei at gmail.com wrote:
> Here is my script:
> 
> from mechanize import *
> from BeautifulSoup import *
> import StringIO
> b = Browser()
> f = b.open("http://www.translate.ru/text.asp?lang=ru")
> b.select_form(nr=0)
> b["source"] = "hello python"
> html = b.submit().get_data()
> soup = BeautifulSoup(html)
> print  soup.find("span", id = "r_text").string
> 
> OUTPUT:
> &#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;
> ----------
> In russian it looks like:
> "привет питон"
> 
> How can I translate this using standard Python libraries??
> 
> --
> Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
> 
Translate to what and with what purpose?

Assuming your intention is to get a Python Unicode string, what about:

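import re

strHTML = u'&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;'
# The entities hold decimal code points, so convert each one with int();
# pasting the digits after \u would misread them as hex. (unichr is Python 2.)
strUnicode = re.sub(r'&#(\d+);', lambda m: unichr(int(m.group(1))), strHTML)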

?

I am sure there is a more elegant and direct solution, but I just wanted
to provide a quick response here.
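
One candidate for such a direct solution, sketched here under the assumption of Python 3.4 or later (which postdates this thread), is the standard library's html.unescape. The variable names below are illustrative only and do not come from the original script:

```python
import html

# Entity string as the script in the question would print it
# (assumed here to be decimal numeric character references).
entity_text = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;"

# html.unescape converts decimal (&#1087;), hexadecimal (&#x43f;)
# and named (&amp;) character references to the characters they stand for.
decoded = html.unescape(entity_text)
print(decoded)
```

This avoids both eval and manual string surgery, and it does not depend on whether the page emits decimal or hexadecimal references.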

Claudio Grondi


