[I18n-sig] XML and UTF-16
Paul Prescod
paulp@ActiveState.com
Thu, 31 May 2001 14:34:30 -0700
Tom Emerson wrote:
>
> Paul Prescod writes:
> > Tom Emerson wrote:
> > > Yes. You can then pretty easily autodetect which Unicode
> > > transformation format is being used by looking at the first ten or
> > > so bytes.
> >
> > Actually, the first four bytes are sufficient to get you started. Then
> > you have to look at the encoding declaration if present.
>
> Even for UTF-32?
I think so. UTF-32 is a 32-bit encoding, and 32 bits are 4 bytes. You
only need one character (either a BOM or a "<" sign) to know what you
are dealing with.
You were right that it is an appendix to the spec:
http://www.w3.org/TR/REC-xml.html#sec-guessing
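
A rough sketch of that four-byte check in Python, loosely following the
table in the appendix. The function name guess_xml_encoding is made up
for illustration, and the sketch is deliberately incomplete: it ignores
the unusual UCS-4 byte orders and EBCDIC, and it only picks an encoding
family. You would still have to read the encoding declaration, if there
is one, to get the final answer.

```python
import codecs


def guess_xml_encoding(data):
    """Guess the encoding family of an XML document from its first four bytes.

    Hypothetical helper: a simplified reading of the autodetection
    appendix, not a complete implementation of it.
    """
    head = bytes(data[:4])

    # With a BOM. Test the 4-byte UTF-32 marks first, because the
    # UTF-32 little-endian BOM begins with the UTF-16 little-endian BOM.
    if head.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if head.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"

    # Without a BOM. The document must start with "<" (0x3C), so the
    # position of the zero bytes around it gives away the code unit
    # width and the byte order.
    if head == b"\x00\x00\x00\x3c":
        return "utf-32-be"
    if head == b"\x3c\x00\x00\x00":
        return "utf-32-le"
    if head == b"\x00\x3c\x00\x3f":
        return "utf-16-be"
    if head == b"\x3c\x00\x3f\x00":
        return "utf-16-le"

    # Assumption: anything else is treated as UTF-8 (or an
    # ASCII-compatible encoding named in the encoding declaration).
    return "utf-8"
```

For example, guess_xml_encoding("<?xml version='1.0'?>".encode("utf-32-be"))
sees the bytes 00 00 00 3C and reports "utf-32-be", which is the point
above: one 32-bit character is enough to get started.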
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook