[I18n-sig] XML and UTF-16

Paul Prescod paulp@ActiveState.com
Thu, 31 May 2001 14:34:30 -0700


Tom Emerson wrote:
> 
> Paul Prescod writes:
> > Tom Emerson wrote:
> > > Yes. You can then pretty easily autodetect which Unicode
> > > transformation format is being used by looking at the first ten or
> > > so bytes.
> >
> > Actually, the first four bytes are sufficient to get you started. Then
> > you have to look at the encoding declaration if present.
> 
> Even for UTF-32?

I think so. UTF-32 is a 32-bit encoding, and 32 bits are 4 bytes. You
only need one character (either a BOM or a "<" sign) to know what you
are dealing with.
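A minimal Python sketch of the four-byte check, following the table in the spec's autodetection appendix. The name `guess_xml_encoding` is hypothetical. Mapping the EBCDIC case to Python's `cp037` codec is an assumption (the actual EBCDIC variant has to come from the encoding declaration), and the unusual 2143/3412 UCS-4 byte orders are not handled. For UTF-32 a single character, the BOM or "<", fills all four bytes, which is why four bytes are enough.

```python
def guess_xml_encoding(head: bytes) -> str:
    """Guess an XML entity's encoding family from its first four bytes.

    Returns a Python codec name. For the ASCII-compatible and EBCDIC
    families this is only provisional; the real encoding must still be
    read from the encoding declaration.
    """
    b = head[:4]
    # Byte order marks. Check the 4-byte UTF-32 BOMs before the 2-byte
    # UTF-16 ones, since FF FE 00 00 also starts with the UTF-16LE BOM.
    if b == b"\x00\x00\xfe\xff":
        return "utf-32-be"
    if b == b"\xff\xfe\x00\x00":
        return "utf-32-le"
    if b[:2] == b"\xfe\xff":
        return "utf-16-be"
    if b[:2] == b"\xff\xfe":
        return "utf-16-le"
    if b[:3] == b"\xef\xbb\xbf":
        return "utf-8"
    # No BOM: look for "<" (or "<?") in each possible byte layout.
    if b == b"\x00\x00\x00\x3c":
        return "utf-32-be"
    if b == b"\x3c\x00\x00\x00":
        return "utf-32-le"
    if b == b"\x00\x3c\x00\x3f":
        return "utf-16-be"
    if b == b"\x3c\x00\x3f\x00":
        return "utf-16-le"
    if b == b"\x3c\x3f\x78\x6d":  # "<?xm" in an ASCII-compatible encoding
        return "utf-8"
    if b == b"\x4c\x6f\xa7\x94":  # "<?xm" in EBCDIC (assumed: cp037)
        return "cp037"
    # Anything else: treat as UTF-8 without an encoding declaration.
    return "utf-8"
```

For example, `guess_xml_encoding(data[:4])` gives a provisional codec that is good enough to decode the XML declaration and read the declared encoding.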

You were right that the autodetection procedure is described in an
appendix to the spec (Appendix F, "Autodetection of Character Encodings"):

 http://www.w3.org/TR/REC-xml.html#sec-guessing
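The second step mentioned above, looking at the encoding declaration if present, can be sketched roughly as follows. It assumes the first few hundred bytes are in `head` and that a provisional codec has already been chosen from the first four bytes. `declared_encoding` is a hypothetical helper, and the regular expression only approximates the XMLDecl and EncName productions of XML 1.0 (version must precede encoding, as the grammar requires).

```python
import re
from typing import Optional

# Approximates: '<?xml' S 'version' Eq '...' S 'encoding' Eq '"' EncName '"'
# where EncName is [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL_RE = re.compile(
    r"""<\?xml\s+version\s*=\s*(['"])[^'"]*\1\s+encoding\s*=\s*(['"])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def declared_encoding(head: bytes, provisional: str) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    # errors="replace" keeps non-ASCII bytes or a truncated final
    # character from aborting the decode of this partial buffer.
    text = head.decode(provisional, errors="replace")
    text = text.lstrip("\ufeff")  # a BOM decodes to U+FEFF; drop it
    m = _DECL_RE.match(text)
    return m.group(3) if m else None
```

If this returns None, the provisional guess stands (for the ASCII-compatible family that means UTF-8).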
