[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)

Fredrik Lundh fredrik@pythonware.com
Fri, 7 Apr 2000 11:13:23 +0200


M.-A. Lemburg wrote:
> The UTF-8 assumption had to be made in order to get the two
> worlds to interoperate. We could have just as well chosen
> Latin-1, but then people currently using say a Russian
> encoding would get upset for the same reason.
>
> One way or another somebody is not going to like whatever
> we choose, I'm afraid... the simplest solution is to use
> Unicode for all strings which contain non-ASCII characters
> and then call .encode() as necessary.
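
for reference, the workflow described above would look something
like this (a minimal sketch in 1.6/2.0-era syntax, using utf-8
purely for illustration):

    text = u'caf\xe9'             # keep text as Unicode internally
    data = text.encode('utf-8')   # encode explicitly at the boundary
    back = unicode(data, 'utf-8') # decode explicitly on the way back
    assert back == text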

just a brief heads-up:

I've been playing with this a bit, and my current view is that
the unicode design is horridly broken when it comes to mixing
8-bit and 16-bit strings.  basically, if you pass a unicode
string to a function that slices and dices 8-bit strings, it
will probably not work.  and you will probably not understand
why.
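
a concrete example of what I mean (a minimal sketch; shout is
just a stand-in for any 8-bit string function, and the exact
error depends on the default codec, which is utf-8 in the
interpreter under discussion):

    def shout(s):
        # a typical function written with 8-bit strings in mind
        return s.upper() + '!'

    print shout('hello')    # fine: HELLO!
    print shout(u'hello')   # "works", but quietly returns unicode

    s = 'caf\xe9'           # four latin-1 bytes
    u = u' monde'
    try:
        # coercion decodes s with the default codec; '\xe9' is
        # not valid utf-8, so this raises instead of concatenating
        print s + u
    except UnicodeError:
        print 'mixing 8-bit and unicode strings failed'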

I'm working on a proposal that I think will make things simpler,
less magic, and far easier to understand.  to appear on sunday.

</F>