[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Sun, 6 Jan 2002 01:20:27 +0100


> Hmm, shouldn't StringObjects themselves carry an encoding field
> (defaulting to sys.encoding)? 

That approach was discussed during the design phase of the
Unicode API; Bill Janssen was the first to propose it, in response
to my talk

http://www.python.org/workshops/1997-10/proceedings/loewis.html

During the Unicode design, this idea came up from time to time, but it
always turned out that proposers could not give coherent semantics to
such tags. Just explain what happens if you add two strings that have
different encodings.
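To make the question concrete, here is a small Python 3 sketch. It uses plain byte strings to stand in for the hypothetical encoding-tagged string objects; the "tags" exist only in the variable names. Concatenating the bytes produces data that no single encoding decodes back into the intended text.

```python
# Both byte strings spell "café", but each one is (hypothetically) tagged
# with a different encoding. The tags are only in the variable names here.
latin1_part = "café".encode("latin-1")   # b"caf\xe9"
utf8_part = "café".encode("utf-8")       # b"caf\xc3\xa9"

# Plain concatenation: the result has no single correct encoding tag.
combined = latin1_part + utf8_part


def try_decode(data, encoding):
    """Decode data with the given encoding, reporting failure as a string."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


as_latin1 = try_decode(combined, "latin-1")  # succeeds, but second half is garbled
as_utf8 = try_decode(combined, "utf-8")      # fails on the Latin-1 byte 0xE9

print(as_latin1)  # cafécafÃ©
print(as_utf8)
```

Tagging the result as Latin-1 silently garbles the second half, and tagging it as UTF-8 rejects the first half; a real design would have to pick one of these outcomes (or re-encode one operand), and that choice is exactly what proposers could not pin down.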

> That would solve quite a few issues.

And introduce many new ones.

> > Making UTF-8 the default Python system encoding would have many other 
> > consequences -- and you'd probably lose a great deal of portability 
> > since UTF-8 conversion (nearly) always will succeed while ASCII can 
> > easily fail on other systems which use e.g. Latin-1 as native 
> > encoding.
> 
> What are your reasons for asserting this? 

If I understand this claim correctly, he means:

"Currently, if auto-conversion (to ASCII) succeeds, the result is
 likely correct. If the default encoding were UTF-8, conversion would
 succeed for all Unicode objects, but give incorrect results for many
 users, e.g. if they use Latin-1 on their terminal."
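A rough Python 3 illustration of the two failure modes follows. The function `auto_convert` is hypothetical: it stands in for the implicit Unicode-to-byte-string coercion, with the system default encoding passed in as a parameter.

```python
name = "Martin v. Löwis"


def auto_convert(text, default_encoding):
    # Hypothetical stand-in for implicit coercion using a
    # process-wide default encoding.
    return text.encode(default_encoding)


# With ASCII as the default, a non-ASCII name fails loudly.
try:
    auto_convert(name, "ascii")
    ascii_result = "converted"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"

# With UTF-8 as the default, conversion always succeeds ...
utf8_bytes = auto_convert(name, "utf-8")

# ... but a user whose terminal expects Latin-1 sees these bytes as:
shown_on_terminal = utf8_bytes.decode("latin-1")
print(ascii_result)       # UnicodeEncodeError
print(shown_on_terminal)  # Martin v. LÃ¶wis
```

The ASCII default produces an exception that a test run will surface; the UTF-8 default produces output that merely looks wrong.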

This has actually been a frequent problem since the introduction of
UTF-8: some applications display the bytes that make up a UTF-8 string
as if they were a Latin-1 string, rendering it completely unreadable
(although I can already recognize my name if I run into such an application).
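The garbling described above can be reproduced, and, since Latin-1 maps every byte to a character, undone, in a couple of lines of Python 3. This is only a sketch of what such an application effectively does, not code from any particular program.

```python
original = "Martin v. Löwis"

# What a misbehaving application shows: UTF-8 bytes read as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# Reverse the mistake: recover the raw bytes, then decode them correctly.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # Martin v. LÃ¶wis
print(repaired)  # Martin v. Löwis
```

The repair only works when the reader already knows which wrong interpretation was applied, which is why such errors are better caught at conversion time.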

This problem may go unnoticed during testing, whereas an exception
is likely to be noticed.

> If I read this correctly this would make Python compatible with the
> least common denominator of all platforms, while I think I would
> prefer it to allow access to all the niceties a platform gives.

It does no such thing. The application has full control over all
conversions if it performs them explicitly. Explicit is better than
implicit.
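As a minimal Python 3 sketch of what "explicit" means here: the application names the target encoding and the error policy at the point of conversion, instead of relying on a process-wide default. The helper `to_terminal_bytes` and its `terminal_encoding` parameter are hypothetical; a real application might obtain the encoding from something like `locale.getpreferredencoding()`.

```python
def to_terminal_bytes(text, terminal_encoding):
    # Hypothetical helper: the caller chooses the encoding, and
    # unencodable characters are replaced with "?" rather than raising.
    return text.encode(terminal_encoding, errors="replace")


name = "Martin v. Löwis"
latin1_bytes = to_terminal_bytes(name, "latin-1")  # full fidelity on a Latin-1 terminal
ascii_bytes = to_terminal_bytes(name, "ascii")     # lossy, but the loss is a visible "?"

print(latin1_bytes)  # b'Martin v. L\xf6wis'
print(ascii_bytes)   # b'Martin v. L?wis'
```

Whether to replace, ignore, or raise is then a decision the application makes per call, rather than one baked into the interpreter.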

Regards,
Martin