HTML character entity conversion

Sun Jul 30 11:40:49 EDT 2006

pak.andrei at gmail.com wrote:
> Here is my script:
>
> from mechanize import *
> from BeautifulSoup import *
> import StringIO
> b = Browser()
> f = b.open("http://www.translate.ru/text.asp?lang=ru")
> b.select_form(nr=0)
> b["source"] = "hello python"
> html = b.submit().get_data()
> soup = BeautifulSoup(html)
> print  soup.find("span", id = "r_text").string
>
> OUTPUT:
> привет
> питон
> ----------
> In russian it looks like:
> "привет питон"
>
> How can I translate this using standard Python libraries??
>
> --
> Pak Andrei, http://paxoblog.blogspot.com, icq://97449800

I'm having trouble understanding how your script works (what would a
"BeautifulSoup" function do?), but assuming your intent is to find
character references in an HTML document, you might try the HTMLParser
class in the HTMLParser module. You subclass it and override its handler
methods; the parser calls them as it reads the document. One of them is
handle_charref. It is called with one argument, the name of the
reference, which is only the number part (for &#1087; that is "1087"),
so converting it to a character is up to you. HTMLParser is a lot more
powerful than that, though. There may be something more lightweight out
there that will accomplish what you want. Then again, you might be able
to find a use for all that power :P.
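
A minimal sketch of the handle_charref approach, in case it helps. It is
written for Python 3, where the class lives in html.parser (in Python 2 it
is the HTMLParser module). The names CharRefDecoder and decode_charrefs and
the sample string are made up for illustration, and the sketch assumes the
page returns numeric references such as &#1087;. Tags and named entities
like &amp; are simply dropped here.

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Collect text, turning numeric character references into characters."""

    def __init__(self):
        # In Python 3.5+ convert_charrefs defaults to True, in which case
        # handle_charref is never called, so switch it off explicitly.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Plain text between tags and references.
        self.parts.append(data)

    def handle_charref(self, name):
        # name is only the number part: "1087" for &#1087;, "x43f" for &#x43f;
        if name.lower().startswith("x"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))

    def text(self):
        return "".join(self.parts)


def decode_charrefs(markup):
    """Hypothetical helper: return the text of markup with references decoded."""
    parser = CharRefDecoder()
    parser.feed(markup)
    parser.close()
    return parser.text()


# Illustrative input only; the real page output may differ.
sample = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1087;&#1080;&#1090;&#1086;&#1085;"
decoded = decode_charrefs(sample)
```

Here decoded holds the Russian text with the references replaced by the
actual characters. If all you need is the conversion itself, Python 3.4 and
later also have html.unescape in the standard library, which does the same
thing in one call and handles named entities too.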