decode Numeric Character References to unicode

7stud bbxx789_05ss at yahoo.com
Mon Feb 18 07:00:17 EST 2008


On Feb 18, 4:53 am, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Feb 18, 3:20 am, William Heymann <k... at aesaeion.com> wrote:
>
> > How do I decode a string back to useful unicode that has xml numeric character
> > references in it?
>
> > Things like 占  #which is: &_#21344_; (without the underscores)
>
> BeautifulSoup can handle two of the three formats for html entities.
> For instance, an 'o' with umlaut can be represented in three different
> ways:
>
> &_ouml_;
> ö
> ö
>

lol.  It's hard to even make posts about this stuff because HTML
entities get converted by the forum software. Here are the three
different formats for an 'o with umlaut' with some underscores added
to keep the forum software from rendering the characters:

&_ouml_;
&_#246_;
&_#xf6_;
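
For anyone reading this later, here is a rough sketch of decoding all three
forms. It assumes Python 3.4 or newer, where html.unescape() is in the
standard library; that is not what was available when this thread was
written, and it is not the BeautifulSoup approach mentioned above. The
helper decode_numeric_refs() is a made-up name for illustration and only
handles the two numeric forms (decimal and hex), which is what the original
question asked about.

```python
import html
import re


def decode_numeric_refs(text):
    """Replace decimal (&#246;) and hex (&#xf6;) character references.

    Hypothetical helper for illustration only. Named entities such as
    &ouml; are left untouched, and out-of-range code points are not
    handled (chr() would raise ValueError).
    """
    def _replace(match):
        body = match.group(1)
        if body[0] in "xX":
            return chr(int(body[1:], 16))  # hex form, e.g. &#xf6;
        return chr(int(body))              # decimal form, e.g. &#246;

    return re.sub(r"&#([xX][0-9a-fA-F]+|[0-9]+);", _replace, text)


# The standard library call decodes named, decimal, and hex forms alike.
for ref in ("&ouml;", "&#246;", "&#xf6;", "&#21344;"):
    print(ascii(ref), "->", ascii(html.unescape(ref)))

# The hand-rolled helper covers only the numeric forms.
print(ascii(decode_numeric_refs("&#246; &#xf6; &#21344;")))
```

In practice html.unescape() alone is probably enough; the regex version is
only there to show what the numeric forms mean (the number is just the
Unicode code point, in decimal or in hex after the "x").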


