[I18n-sig] Re: [Python-Dev] Unicode debate

Fredrik Lundh <effbot@telia.com>
Thu, 4 May 2000 10:07:45 +0200


Sin Hang Kin <kentsin@poboxes.com> wrote:
> > I don't see it as an axiom, but rather as a design decision you make to
> > keep your language simple. Along the lines of "all values are objects"
> > and (now) all integer values are representable with a single type. Are
> > you happy with this?
>
> No. A character is not just a character.
>
> Go to Google and run a search; the results page might be an example of
> mixed-encoding text:
>
> Search engines index pages in their natural encoding and present the result
> as is, so the search result page will contain whatever encodings are mixed
> in. If you see JIS, ISO 8859, Hebrew, Thai, UTF-8, Big-5, GB2312, EUC, and
> Shift-JIS, you would not be very surprised. So, if you argue that a character
> is a character is a character, how would you handle such a mixed-encoding
> text mess?

by converting the encoded data, character by character, into a
single known encoding, and doing the search in there?
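
a minimal sketch of that idea in Python, assuming each document arrives
with a known (declared) encoding. the sample data and the helper names
`to_text` and `search` are made up for illustration; this is not how any
particular search engine does it.

```python
# Hypothetical sample data: (raw bytes, declared encoding) pairs.
# Each document is stored in a different legacy or Unicode encoding.
DOCUMENTS = [
    ("café au lait".encode("latin-1"), "latin-1"),
    ("日本語のテキスト".encode("shift_jis"), "shift_jis"),
    ("中文 café".encode("utf-8"), "utf-8"),
]


def to_text(raw, encoding):
    """Decode encoded bytes into a Unicode string (the single known form)."""
    return raw.decode(encoding)


def search(documents, query):
    """Return the indexes of the documents whose decoded text contains query."""
    hits = []
    for index, (raw, encoding) in enumerate(documents):
        if query in to_text(raw, encoding):
            hits.append(index)
    return hits


# One query, written once as characters, matches documents that were
# stored as different byte sequences.
matches = search(DOCUMENTS, "café")
```

comparing raw bytes instead would miss one of those documents: the
Latin-1 bytes for "café" do not occur in the UTF-8 document, even though
both contain the same four characters.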

> No one can write an automatic conversion program for such text; only if
> you treat it as 8-bit bytes can you make use of it. Otherwise this is
> a mess.

do you really think the google engine repeats your search in every
possible encoding?  doesn't really sound like the most efficient way
to implement a search engine...

(if you still think that encodings have anything to do with the
"characters are characters" rule, see http://www.w3.org/TR/charmod )

</F>