unicode html

Duncan Booth duncan.booth at invalid.invalid
Tue Jul 18 11:21:50 EDT 2006


Sybren Stuvel wrote:

> Duncan Booth enlightened us with:
>> Don't bother using named entities. If you encode your unicode as
>> ascii, replacing all non-ascii characters with the xml entity
>> reference then your pages will display fine whatever encoding is
>> specified in the HTTP headers.
> 
> Which means OP can't use Unicode/UTF-8 entity references, since that's
> not specified in the HTTP header.
> 
That doesn't matter: character references are not affected by the network
encoding.

From http://www.w3.org/TR/html4/charset.html#h-5.3.1

> 5.3.1 Numeric character references
> 
> Numeric character references specify the code position of a character
> in the document character set. 

Character references use the *document character set*, which is
independent of the character encoding used for network transmission. HTML
defines its document character set as ISO10646, and (section 5.1) "The
character set defined in [ISO10646] is character-by-character equivalent
to Unicode ([UNICODE])".
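
To make that concrete, here is a minimal sketch, assuming Python 3. The
helper name to_ascii_html is made up for illustration; the
"xmlcharrefreplace" error handler is the standard codec option that turns
each non-ASCII character into a numeric character reference.

```python
import html


def to_ascii_html(text):
    """Encode text as ASCII bytes, replacing each non-ASCII character
    with a numeric character reference such as &#233;."""
    return text.encode("ascii", "xmlcharrefreplace")


original = "caf\u00e9 \u20ac5"      # e-acute and the euro sign
page = to_ascii_html(original)
print(page)                          # b'caf&#233; &#8364;5'

# The bytes are pure ASCII, so any ASCII-compatible charset a server might
# declare decodes them to the same text. The references are then resolved
# by Unicode code point, not by the declared encoding.
for declared in ("ascii", "utf-8", "iso-8859-1", "cp1252"):
    assert html.unescape(page.decode(declared)) == original
```

Note that this only handles non-ASCII characters; markup characters such
as "<" and "&" in the text still need escaping separately (for example
with html.escape) before the encode step.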


