[I18n-sig] XML and UTF-16

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 1 Jun 2001 00:12:11 +0200


> Well, you know that the first UTF-32 character is "<", but no
> more. 

According to the procedure specified in the XML recommendation, this
is enough for auto-detection, so you clearly don't need to look at
more bytes when parsing XML.
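
As a minimal sketch of that lookup, assuming Python 3: the helper name
guess_encoding is hypothetical, and only the Unicode rows of the
autodetection table in Appendix F of the XML recommendation are covered.

```python
# Hypothetical helper, not part of any library: maps the first bytes of an
# XML entity to a Python codec name.
_SIGNATURES = [
    # With a byte order mark (longer patterns first, since FF FE is a
    # prefix of FF FE 00 00).
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    # Without a byte order mark: a leading "<" in UCS-4, or "<?" in UTF-16.
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
]


def guess_encoding(data):
    """Return a codec name based only on the first few bytes of data."""
    for signature, codec in _SIGNATURES:
        if data.startswith(signature):
            return codec
    # Assumed fallback: treat anything else as UTF-8.
    return "utf-8"


print(guess_encoding("<?xml version='1.0'?>".encode("utf-32-be")))
```

Note that the UCS-4 rows match on the four bytes of "<" alone; nothing
after the first character is inspected.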

In any case, what would you do if you found that the next few bytes
cannot be interpreted as "?xml" in UTF-32? You would probably signal an
error. You would do the same if, after looking at the first few bytes,
the document turned out not to be well-formed XML when treated as
UTF-32.
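
To illustrate, here is a rough sketch, again assuming Python 3, with a
hypothetical helper decode_utf32_be that stands in for the decoding step
of a parser. Bytes that do not make sense as UTF-32 produce an error in
the normal course of processing, with no extra look-ahead needed.

```python
def decode_utf32_be(data):
    """Decode bytes already detected as big-endian UTF-32 (hypothetical helper)."""
    try:
        return data.decode("utf-32-be")
    except UnicodeDecodeError as exc:
        # The same error is signalled wherever the bad bytes occur.
        raise ValueError("document is not valid UTF-32") from exc


good = "<?xml version='1.0'?><a/>".encode("utf-32-be")
# "<" in UTF-32 followed by four bytes that are not a valid code point.
bad = b"\x00\x00\x00\x3c" + b"\xff\xff\xff\xff"

print(decode_utf32_be(good))
try:
    decode_utf32_be(bad)
except ValueError as err:
    print("error:", err)
```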

Regards,
Martin