editing in Unicode
effbot at pythonware.com
Fri Sep 8 06:42:31 EDT 2000
marcin wrote:
> Since UTF-16 is not compatible with ASCII, it does not make much
> sense to have just a string encoded in UTF-16 and the rest of code in
> ASCII. If UTF-16 is to be used, it would probably have to be specified
> externally to the source.
Not necessarily: XML solves this by requiring a certain character
sequence first in the file, but only if you insist on using an
encoding that is not ASCII-compatible. In Python, this sequence
could for example be "#!", and the compiler could figure things
out by looking at the first four bytes:
00 00 00 23: UCS-4, big-endian machine
23 00 00 00: UCS-4, little-endian machine
FE FF -- --: UTF-16, big-endian
FF FE -- --: UTF-16, little-endian
00 23 00 21: UTF-16, big-endian, no Byte Order Mark
23 00 21 00: UTF-16, little-endian, no Byte Order Mark
23 21 -- --: UTF-8 or other ASCII-compatible encoding
-- -- -- --: same, hopefully
(check the encoding pragma for details; default is
"unknown" as in 2.0. also see below)
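
As a rough sketch only (this is not how any Python compiler actually
works), the lookup above could be written like this in present-day
Python. The function name sniff_source_encoding and the strings it
returns are made up for illustration, and the sketch assumes the file
really does start with "#!":

```python
def sniff_source_encoding(head):
    """Guess the encoding family from the first four bytes of a file.

    Hypothetical helper: assumes the source starts with "#!" (0x23 0x21).
    """
    head = bytes(head[:4])

    # UCS-4: "#" stored as a single 32-bit unit.
    if head == b"\x00\x00\x00\x23":
        return "UCS-4, big-endian"
    if head == b"\x23\x00\x00\x00":
        return "UCS-4, little-endian"

    # UTF-16 with a Byte Order Mark; the last two bytes do not matter.
    if head.startswith(b"\xfe\xff"):
        return "UTF-16, big-endian"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16, little-endian"

    # UTF-16 without a Byte Order Mark: "#!" as two 16-bit units.
    if head == b"\x00\x23\x00\x21":
        return "UTF-16, big-endian, no BOM"
    if head == b"\x23\x00\x21\x00":
        return "UTF-16, little-endian, no BOM"

    # Anything else: treat as ASCII-compatible and leave the details
    # to the encoding pragma.
    return "ASCII-compatible (check the encoding pragma)"
```

For example, sniff_source_encoding("#!".encode("utf-16-le")) lands in
the "no BOM" little-endian branch, while an ordinary
b"#!/usr/bin/env python" falls through to the last line.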
> IMHO there should be a way of specifying the encoding of the source
> in the source
Definitely. Hopefully, that will go into 2.1.
Note that in 2.0, the default source encoding is "unknown". With
this encoding, "" string literals store 8-bit characters as is,
and u"" string literals treat 8-bit characters as ISO 8859-1.
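
To make that concrete, here is a small sketch in present-day Python 3
(not 2.0 syntax) of what the two rules amount to for a single 8-bit
character. The variable names are made up for illustration, and the
byte 0xE9 is just an arbitrary example:

```python
# One 8-bit character as it might appear inside a literal in the source file.
raw = bytes([0xE9])

# "" literal under the "unknown" encoding: the byte is kept as is.
plain_literal = raw

# u"" literal: the byte is read as ISO 8859-1, so it maps to the
# Unicode code point with the same numeric value (U+00E9).
unicode_literal = raw.decode("iso-8859-1")
```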
> and they should be only ASCII-compatible encodings.
Maybe, maybe not.
</F>
<!-- daily news from the python universe:
http://www.pythonware.com/daily/index.htm
-->