[XML-SIG] Handling of character entity references

Dieter Maurer dieter@handshake.de
Sun, 25 May 2003 20:30:39 +0200


pyxml@wonderclown.com wrote at 2003-5-25 09:47 -0500:
 > ....
 > I do not have a complete DTD for my custom markup, as I don't
 > particularly care to validate it. However, the parser seems unwilling
 > to leave entities alone, so I have tried adding the following to my
 > source document:
 > 
 > <!DOCTYPE gallery [
 >     <!ENTITY % HTMLlat1 PUBLIC
 >        "-//W3C//ENTITIES Latin 1 for XHTML//EN"
 >        "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
 >     %HTMLlat1;
 > ]>
 > 
 > This brings in the XHTML Latin-1 entities, which seems to work well
 > enough to get the parser to accept the source, but then &eacute; gets
 > translated to the following two-byte sequence on output: 0xC3
 > 0xA9.

Character entities are largely a thing of the past; with Unicode they
are, in general, no longer needed.

By using Unicode, XML tried to get rid of the complexities of character
entities, which are no longer necessary.

Your parser converted the entities into Unicode characters (&eacute;
became U+00E9). On output, they were apparently encoded as UTF-8 (the
XML default encoding), and U+00E9 in UTF-8 is exactly the two bytes
0xC3 0xA9.
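
A minimal sketch of what happens, under a few assumptions: it uses
Python's standard xml.etree.ElementTree as a stand-in for PyXML, and it
declares the one entity it needs ("eacute" as "&#233;", the same value
xhtml-lat1.ent gives it) in the internal subset instead of fetching the
external file. It shows the parser turning the entity into one Unicode
character, the UTF-8 bytes you saw, and how to get a numeric character
reference back if your consumer cannot handle UTF-8.

```python
import xml.etree.ElementTree as ET

# Assumption: a tiny stand-in document. Rather than referencing the
# external xhtml-lat1.ent, it declares eacute in the internal subset.
SOURCE = """<!DOCTYPE gallery [
  <!ENTITY eacute "&#233;">
]>
<gallery>caf&eacute;</gallery>"""

root = ET.fromstring(SOURCE)

# The parser has already replaced &eacute; with the single character U+00E9.
print(repr(root.text))            # 'café'
print(hex(ord(root.text[-1])))    # 0xe9

# Encoding that character as UTF-8 yields the two bytes seen on output.
utf8_bytes = root.text.encode("utf-8")
print(utf8_bytes)                 # b'caf\xc3\xa9'

# Serializing to US-ASCII instead writes characters outside ASCII as
# numeric character references, which any XML parser will accept.
ascii_out = ET.tostring(root, encoding="us-ascii")
print(ascii_out)                  # b'<gallery>caf&#233;</gallery>'
```

The entity name itself is gone after parsing; only the character
survives, so a reference like &#233; (not &eacute;) is what you can
reasonably get back without a DTD on the output side.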


Dieter