[I18n-sig] Re: Unicode debate
Fredrik Lundh <effbot@telia.com>
Sat, 29 Apr 2000 16:52:30 +0200
Paul Prescod wrote:
> > > I think that maybe an important point is getting lost here. I could be
> > > wrong, but it seems that all of this emphasis on encodings is misplaced.
> >
> > In practical applications that manipulate text, encodings creep up all
> > the time.
>
> I'm not saying that encodings are unimportant. I'm saying that they
> are *different* than what Fredrik was talking about. He was talking
> about a coherent logical model for characters and character strings
> based on the conventions of more modern languages and systems than
> C and Python.
note that the existing Python language reference describes this
model very clearly:
    [Sequences] represent finite ordered sets indexed
    by natural numbers.

    The built-in function len() returns the number of
    items of a sequence.

    When the length of a sequence is n, the index set
    contains the numbers 0, 1, ..., n-1.

    Item i of sequence a is selected by a[i].

    An object of an immutable sequence type cannot
    change once it is created.

    The items of a string are characters.

    There is no separate character type; a character is
    represented by a string of one item.

    Characters represent (at least) 8-bit bytes.

    The built-in functions chr() and ord() convert between
    characters and nonnegative integers representing the
    byte values.

    Bytes with the values 0-127 usually represent the
    corresponding ASCII values, but the interpretation of
    values is up to the program.

    The string data type is also used to represent arrays
    of bytes, e.g., to hold data read from a file.
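
The sequence model quoted above can be checked directly in the interpreter. This is only a minimal sketch, and it assumes a modern Python 3 rather than the 1.5/1.6 interpreters under discussion; the names s, items, codes and so on are made up for illustration.

```python
# Sketch of the quoted sequence model, assuming a modern Python 3 str.
s = "spam"

n = len(s)                         # len() returns the number of items
indexes = list(range(n))           # the index set 0, 1, ..., n-1
items = [s[i] for i in indexes]    # item i of sequence s is s[i]

# No separate character type: every item is itself a string of one item.
all_single = all(isinstance(c, str) and len(c) == 1 for c in items)

# chr() and ord() convert between characters and nonnegative integers.
codes = [ord(c) for c in items]
round_trip = "".join(chr(code) for code in codes)

# Immutable sequence: item assignment is rejected with a TypeError.
try:
    s[0] = "S"
    mutable = True
except TypeError:
    mutable = False
```

For this pure-ASCII string the integers fall in the 0-127 range, so the byte-value reading and the character-code reading give the same numbers.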
as I've pointed out before, I want this to apply to all kinds of
strings in 1.6. imo, the cleanest way to do this is to change
the last three sentences to:
    The built-in functions chr() and ord() convert between
    characters and nonnegative integers representing the
    character codes.

    Character codes usually represent the corresponding
    Unicode characters.

    The 8-bit string data type is also used to represent arrays
    of bytes, e.g., to hold data read from a file.
the encodings debate has nothing to do with this model.
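
To illustrate the distinction, here is a rough sketch of the proposed wording in action. It assumes Python 3, where str happens to behave much like the model described here and bytes plays the role of the 8-bit string. It is not a picture of what 1.6 would actually do, and the variable names are hypothetical.

```python
# Sketch only: Python 3 str as a sequence of characters with Unicode codes.
text = "na\u00efve \u20ac"          # n, a, i-diaeresis, v, e, space, euro sign

char_count = len(text)              # counts characters, not bytes
third = text[2]                     # a string of one item
codes = [ord(c) for c in text]      # character codes (Unicode code points)
rebuilt = "".join(chr(code) for code in codes)

# Encodings only enter when the text is turned into an array of bytes.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

utf8_len = len(utf8)                # differs from char_count
utf16_len = len(utf16)              # differs again

# Decoding either byte array gives back the same character sequence.
same_text = utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
```

The length, the indexing and the chr()/ord() round trip stay the same whichever encoding is later used to store the text. One difference from the 8-bit strings of the time: indexing a Python 3 bytes object yields an integer, not a one-item string.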
...
more later. gotta run.
</F>