decode Numeric Character References to unicode

Duncan Booth duncan.booth at invalid.invalid
Mon Feb 18 07:09:55 EST 2008


7stud <bbxx789_05ss at yahoo.com> wrote:

> On Feb 18, 4:53 am, 7stud <bbxx789_0... at yahoo.com> wrote:
>> On Feb 18, 3:20 am, William Heymann <k... at aesaeion.com> wrote:
>>
>> > How do I decode a string back to useful unicode that has xml
>> > numeric character references in it?
>>
>> > Things like 占  #which is: &_#21344_; (without the
>> > underscores) 
>>
>> BeautifulSoup can handle two of the three formats for html entities.
>> For instance, an 'o' with umlaut can be represented in three
>> different ways:
>>
>> &_ouml_;
>> ö
>> ö
>>
> 
> lol.  It's hard to even make posts about this stuff because html
> entities get converted by the forum software. Here are the three
> different formats for an 'o with umlaut' with some underscores added
> to keep the forum software from rendering the characters:
> 
> &_ouml_;
> &_#246_;
> &_#xf6_;

FWIW, your original post was fine; it was just the quoted text in your
followup that was wrong.

I guess that is yet another reason to use a real newsreader or the mailing 
list rather than Google Groups.
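
On the original question, here is a minimal sketch of decoding all three
entity formats. It assumes a modern Python 3 (3.4 or later), where the
standard library provides html.unescape; that function did not exist when
this thread was written, so treat this as an illustration rather than what
anyone here actually ran. The names samples, decoded and cjk are made up
for the example.

```python
import html

# The three spellings of 'o with umlaut' discussed above:
# named entity, decimal reference, hexadecimal reference.
samples = ["&ouml;", "&#246;", "&#xf6;"]

# html.unescape turns named and numeric character references
# into the corresponding unicode characters.
decoded = [html.unescape(s) for s in samples]

# The decimal reference from the original question (code point 21344).
cjk = html.unescape("&#21344;")

print(decoded)
print(cjk)
```

All three spellings should come out as the same single character, which
is a quick way to check that none of the formats was missed.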


