[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)

M.-A. Lemburg mal@lemburg.com
Fri, 07 Apr 2000 12:55:30 +0200


Fredrik Lundh wrote:
> 
> M.-A. Lemburg wrote:
> > The UTF-8 assumption had to be made in order to get the two
> > worlds to interoperate. We could have just as well chosen
> > Latin-1, but then people currently using say a Russian
> > encoding would get upset for the same reason.
> >
> > One way or another somebody is not going to like whatever
> > we choose, I'm afraid... the simplest solution is to use
> > Unicode for all strings which contain non-ASCII characters
> > and then call .encode() as necessary.
> 
> just a brief heads-up:
> 
> I've been playing with this a bit, and my current view is that
> the current unicode design is horridly broken when it comes
> to mixing 8-bit and 16-bit strings. 

Why "horribly" ? String and Unicode mix pretty well, IMHO.

The magic auto-conversion of Unicode to UTF-8 in C APIs
using "s" or "s#" does not always do what the user expects
(the API receives the UTF-8 encoded bytes rather than the
characters), but it's still better than not having Unicode
objects work with these APIs at all.
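
As a minimal sketch of where expectations can break, here is
the conversion written out explicitly with .encode("utf-8") in
present-day Python (the interpreter applies it implicitly when
a Unicode object hits an "s#" converter); the sample string is
only illustrative:

    # Performing by hand the UTF-8 conversion that "s"/"s#" applies implicitly.
    s = "Grüße"               # illustrative text containing non-ASCII characters
    data = s.encode("utf-8")  # what a C API reading "s#" would actually receive
    print(len(s))             # 5 -- characters in the Unicode string
    print(len(data))          # 7 -- bytes after conversion; the counts no longer agree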

> basically, if you pass a unicode string to a function slicing
> and dicing 8-bit strings, it will probably not work.  and you
> will probably not understand why.
> 
> I'm working on a proposal that I think will make things simpler
> and less magic, and far easier to understand.  to appear on
> sunday.
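
To make that failure mode concrete, here is a minimal sketch in
present-day Python, again with the implicit conversion written
out as .encode("utf-8"); the sample text is only illustrative:

    u = "naïve"                  # 5 characters; 'ï' encodes to two bytes in UTF-8
    data = u.encode("utf-8")     # 6 bytes -- what byte-oriented code gets to see
    print(u[:3])                 # 'naï'     -- slicing characters works as expected
    print(data[:3])              # b'na\xc3' -- slicing bytes splits the 'ï'
    try:
        data[:3].decode("utf-8")
    except UnicodeDecodeError as exc:
        print("no longer valid UTF-8:", exc)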

Looking forward to the proposal,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/