unicode html

Tue Jul 18 02:43:58 EDT 2006

lorenzo.viscanti at gmail.com wrote:
> Hi, I've found lots of material on the net about unicode html
> conversions, but I'm still having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \uc3B4 => &ocirc;
> for all available html entities.

I don't know how you generate your HTML, but ElementTree and lxml both have
good HTML parsers. You can parse the document with either one and serialise
the result with a "US-ASCII" encoding; the serialiser then writes a numeric
character reference for every character that is not ASCII.

    >>> from lxml import etree
    >>> root = etree.HTML(my_html_data)
    >>> # pass the encoding by keyword; newer lxml releases require this
    >>> html_7_bit = etree.tostring(root, encoding="us-ascii")
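
If named entities such as &ocirc; are wanted rather than numeric ones, or if
lxml is not available, something along these lines may work with the standard
library alone. This is only a minimal sketch, assuming Python 3 and assuming
the input is already valid HTML markup, so ASCII characters (including &, <
and >) are passed through untouched. The helper names to_numeric_entities and
to_named_entities are hypothetical, chosen for illustration. Note that ô is
code point U+00F4; C3 B4 is its UTF-8 byte sequence.

```python
from html.entities import codepoint2name


def to_numeric_entities(text):
    """Replace each non-ASCII character with a decimal reference like &#244;."""
    # The "xmlcharrefreplace" error handler is part of the standard codecs.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_entities(text):
    """Use a named HTML entity where one is defined, else a numeric reference."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Assumption: ASCII is existing markup or text and is left as-is.
            parts.append(ch)
        elif code in codepoint2name:
            parts.append("&%s;" % codepoint2name[code])
        else:
            parts.append("&#%d;" % code)
    return "".join(parts)
```

For example, to_named_entities("c\u00f4te") gives "c&ocirc;te", while
to_numeric_entities("c\u00f4te") gives "c&#244;te". Characters that have no
named entity fall back to the numeric form.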

Stefan