[XML-SIG] Yet another stupid XML question

Fredrik Lundh fredrik@pythonware.com
Fri, 8 May 1998 12:18:42 +0200


>> Which reminds me of one thing: when I first read the XML specification,
>> I got the impression that you can determine whether a document
>> uses 8/16/32-bit characters by looking at the first bytes.
>
>Sort of. For entities not in UTF-8 or -16 you can do this. Distinguishing
>between UTF-8 and -16 should also be simple. (Appendix F of the spec
>explains this.)

So to rephrase my question: based on the first few bytes, you
should be able to tell if the file contains 8-bit, 16-bit or 32-bit
characters?
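
For what it's worth, here is a rough sketch of how I read that appendix. It is not a complete implementation: guess_char_width is a name I made up, the function only looks at the first four bytes, and it ignores the unusual UCS-4 byte orders and EBCDIC that the appendix also lists. Anything it doesn't recognize is assumed to be an 8-bit encoding.

```python
def guess_char_width(prefix: bytes) -> int:
    """Guess the character width (in bits) of an XML entity from its first bytes.

    Simplified reading of the autodetection appendix: only the common
    byte orders are handled, everything else is treated as 8-bit.
    """
    head = prefix[:4]

    # 32-bit: '<' stored in four bytes, big-endian or little-endian.
    if head in (b"\x00\x00\x00<", b"<\x00\x00\x00"):
        return 32

    # 16-bit with a byte order mark (FE FF or FF FE).
    if head[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return 16

    # 16-bit without a byte order mark: '<?' stored as two 16-bit units.
    if head in (b"\x00<\x00?", b"<\x00?\x00"):
        return 16

    # Otherwise assume an 8-bit encoding (UTF-8, ASCII, ISO 8859-x, ...);
    # the encoding declaration has to be read to tell these apart.
    return 8
```

Note that the answer "8" only narrows things down to a family of encodings; the actual character set still has to come from the encoding declaration (or default to UTF-8).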

>> But I've recently seen a few references that seem to claim that you 
>> can also change character sets for each new element. 
>
>That's wrong, but maybe you (or they) mean entities?

Nope, they specifically said 'elements'. Looks like they were wrong.
(The amount of hype surrounding XML is starting to eclipse that of Java;
I've even seen people talking about writing programs in XML ;-)

>When
>
>   &external_entity;
>
>refers to an external entity there's no constraint that the external
>entity be in the same character set as the referring entity, which is
>why external entities can have their own XML declaration (the spec
>calls it a text declaration). 

Sounds reasonable.
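
If I understand this correctly, a parser has to look at each external entity's own declaration before decoding it. A minimal sketch of that step, assuming the entity is in an ASCII-compatible encoding so the declaration itself can be matched as bytes (declared_encoding and the regular expression are my own illustration, not taken from any existing parser):

```python
import re

# Matches '<?xml ... encoding="NAME"' at the start of an entity.
# Assumes an ASCII-compatible encoding; a 16- or 32-bit entity would
# need to be recognized from its first bytes before this step.
_DECL = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the leading declaration, else the default."""
    m = _DECL.match(head)
    return m.group(1).decode("ascii") if m else default
```

So a document in UTF-8 could pull in an external entity that starts with <?xml encoding='EUC-JP'?>, and the parser would switch decoders for that entity only.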

Thanks /F