Html character entity conversion
yichun
yichunwe at usc.edu
Sat Sep 9 20:58:47 EDT 2006
pak.andrei at gmail.com wrote:
> danielx wrote:
>> pak.andrei at gmail.com wrote:
>>> Here is my script:
>>>
>>> from mechanize import *
>>> from BeautifulSoup import *
>>> import StringIO
>>> b = Browser()
>>> f = b.open("http://www.translate.ru/text.asp?lang=ru")
>>> b.select_form(nr=0)
>>> b["source"] = "hello python"
>>> html = b.submit().get_data()
>>> soup = BeautifulSoup(html)
>>> print soup.find("span", id = "r_text").string
>>>
>>> OUTPUT:
>>> &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
>>> &#1087;&#1080;&#1090;&#1086;&#1085;
>>> ----------
>>> In russian it looks like:
>>> "привет питон"
>>>
>>> How can I translate this using standard Python libraries??
>>>
>>> --
>
> Thank you for response.
> It doesn't matter what is 'BeautifulSoup'...
However, the best solution is to ask BeautifulSoup to do that for you.
If you do
    soup = BeautifulSoup(your_html_page, convertEntities="html")
you should not have to worry about the problem you had. This converts all
the HTML entities (the five you see as soup.entitydefs) and all the
"&#xxx;" stuff to their Python unicode strings.
yichun
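To illustrate what that conversion amounts to for the numeric references, here is a minimal sketch in plain Python 3. It is not BeautifulSoup's actual implementation, and the helper name decode_numeric_refs is made up for this example. It replaces decimal "&#NNN;" and hexadecimal "&#xHH;" references with the corresponding character and leaves named entities alone. The sample string assumes the encoded text used decimal references.

```python
import re

# Matches decimal (&#1087;) and hexadecimal (&#x43F;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Hypothetical helper: replace numeric character references with characters."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: leave the reference as-is.
            return match.group(0)
    return _NUMERIC_REF.sub(_replace, text)


# Assumed input: "привет питон" written as decimal character references.
encoded = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
           "&#1087;&#1080;&#1090;&#1086;&#1085;")
print(decode_numeric_refs(encoded))
```

Named entities such as "&amp;" pass through unchanged here, so this only covers the "&#xxx;" half of what convertEntities does.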
> General question is:
>
> How can I convert encoded string
>
> sEncodedHtmlText = '&#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> &#1087;&#1080;&#1090;&#1086;&#1085;'
>
> into human readable:
>
> sDecodedHtmlText == 'привет питон'
>
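For the general question with only the standard library: Python 3.4 and later (newer than this thread) ships html.unescape, which handles both named entities and numeric references. A minimal sketch, assuming the encoded text uses decimal references; the variable names are only for illustration.

```python
import html

# Assumed input: "привет питон" written as decimal character references.
s_encoded = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
             "&#1087;&#1080;&#1090;&#1086;&#1085;")

# html.unescape converts numeric and named references to characters.
s_decoded = html.unescape(s_encoded)
print(s_decoded)

# Named entities are converted by the same call.
print(html.unescape("&lt;b&gt; &amp; &quot;"))
```

This works on any string, so it does not matter whether the text came from BeautifulSoup or from somewhere else.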