[I18n-sig] Passing unicode strings to file system calls

Bleyer, Michael MBleyer@DEFiNiENS.com
Fri, 19 Jul 2002 11:06:13 +0200


> For that, you need to give a definition of "correct". From 
> your description, I'd say that encoding the strings as 
> "utf-8" is also "correct" - it gives you byte strings that 
> identify the original file names.

True. But then UTF-8 is always "correct", because any Unicode string can be
converted to UTF-8.
I guess most OSes use some other encoding for display, though.
 
> - does your problem really have to do with file names? Or can it be
>   considered as independent of the problem of file names?
I guess it's not only file names but any system call.
 
> - would it help if, for each Unicode character, there was a list of
>   encodings that can represent that character?

Yup. If I have:
myUString = u'<someUnicodeCharsHere>'

I'd like a function that returns a list of legal encodings for that string,
e.g.
myLegalEncodingList = locale.getLegalEncodings(myUString)

The list would be something like
['cp1250', 'latin-1', 'utf8'].

A function that only works with a single Unicode character would be good
enough I guess.
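
To make the idea concrete, here is a rough sketch of what I mean, written
against a current Python 3. The name get_legal_encodings and the default
candidate list are made up for illustration; nothing like this exists in
the locale module. It simply tries each candidate codec and keeps the ones
that can encode the whole string:

```python
def get_legal_encodings(text, candidates=("ascii", "latin-1", "cp1250", "utf-8")):
    """Return the candidate encodings that can represent every character in text.

    Hypothetical helper: the candidate list is an assumption supplied by the
    caller, not something queried from the operating system.
    """
    legal = []
    for name in candidates:
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            # UnicodeEncodeError: some character has no mapping in this codec.
            # LookupError: this Python does not know a codec by that name.
            continue
        legal.append(name)
    return legal


print(get_legal_encodings(u"caf\xe9"))
print(get_legal_encodings(u"\u0141\xf3d\u017a"))
```

Passing a one-character string covers the single-character case. The
result is only as complete as the candidate list, so this checks a set of
encodings rather than discovering every possible one.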

Now, if I have a Unicode string, I would try to convert it to the system
default encoding first. If that doesn't work, I would like to give the
user some feedback and maybe a choice (from a list of legal encodings)
of which encoding to use instead.
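
Roughly, the flow I have in mind looks like the sketch below (current
Python 3 again). The names encode_with_default and can_encode are made up,
and using sys.getfilesystemencoding() as the "system default" is my
assumption; a different default may be more appropriate for calls that do
not involve file names:

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def encode_with_default(text, default=None, alternatives=("utf-8",)):
    """Try the default encoding first; on failure, list usable alternatives.

    Returns (encoded_bytes, []) on success, or (None, usable_alternatives)
    when the default encoding cannot represent the string.
    """
    if default is None:
        # Assumption: treat the file system encoding as the system default.
        default = sys.getfilesystemencoding()
    if can_encode(text, default):
        return text.encode(default), []
    return None, [name for name in alternatives if can_encode(text, name)]


data, choices = encode_with_default(
    u"\u0141\xf3d\u017a", "latin-1", ["cp1250", "cp1252", "utf-8"]
)
if data is None:
    # This is where the user would get feedback and be offered a choice.
    print("Cannot encode with latin-1; usable alternatives:", choices)
```

The caller decides how to present the alternatives; the function only
reports which ones would work for this particular string.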

Does that make sense?

Mike