[I18n-sig] Passing unicode strings to file system calls

Martin v. Loewis martin@v.loewis.de
18 Jul 2002 17:05:02 +0200


"Bleyer, Michael" <MBleyer@DEFiNiENS.com> writes:

> What I want to do, is create file names from a list that has strings in both
> encodings. The strings can be handled fine while in unicode, but as soon as
> I try to convert all of them to one encoding, half of the conversions will
> fail. I just want to convert them with the proper encoding and then pass the
> bytestring to the system function, I don't worry about whether it will
> _display_ right, just about whether the name is correct.

For that, you need to give a definition of "correct". From your
description, I'd say that encoding the strings as "utf-8" is also
"correct" - it gives you byte strings that identify the original file
names.
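
A minimal sketch of the "utf-8" suggestion, assuming a current Python 3
where str is Unicode text. The sample names and the helper
to_utf8_bytes are made up for illustration:

```python
# Hypothetical file names in two scripts that no single legacy
# 8-bit encoding covers together.
names = ["gr\u00fc\u00dfe.txt", "\u043f\u0440\u0438\u0432\u0435\u0442.txt"]

def to_utf8_bytes(name):
    """Encode a Unicode file name as UTF-8 bytes.

    UTF-8 can represent every Unicode character, so this does not
    fail for ordinary text (lone surrogates are the exception).
    """
    return name.encode("utf-8")

encoded = [to_utf8_bytes(n) for n in names]

# Decoding returns the original names, so each byte string
# identifies exactly one original file name.
roundtrip = [b.decode("utf-8") for b in encoded]
```

Whether the file system or other programs interpret those bytes as
UTF-8 is a separate question; the sketch only shows that the mapping
is lossless.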

> What I would like to have is some function that will tell me for a given
> Unicode string, a list of all the encodings that this string can be
> converted into (without having to try all available encodings in a brute
> force loop), because I do not know the proper encoding a priori.

I doubt that you can implement such a function without a "brute force"
algorithm of some kind.

> Anyway, if there isn't a direct interface/solution, what would you
> consider the best workaround for Python?

Use brute force.
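
A rough sketch of what such a loop might look like, written for a
current Python 3. The candidate list and the name usable_encodings are
assumptions for illustration; substitute whatever codecs matter on
your systems:

```python
# Candidate codecs to try. This list is an assumption; extend it
# with the encodings that can actually occur in your data.
CANDIDATES = ["ascii", "latin-1", "cp1251", "koi8-r", "shift_jis", "utf-8"]

def usable_encodings(text, candidates=CANDIDATES):
    """Return the candidate encodings that can encode `text` without error."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no representation in this codec.
            continue
        usable.append(name)
    return usable
```

For example, usable_encodings("gr\u00fc\u00dfe") should include
"latin-1" and "utf-8" but not "ascii". Each attempt is a single pass
over the string, so the loop is cheap for file-name-sized input.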

Perhaps I'm still not understanding your problem clearly. To
understand it better, can you please answer the following questions?

- does your problem really have to do with file names? Or can it be
  considered as independent of the problem of file names?

- would it help if, for each Unicode character, there was a list of
  encodings that can represent that character?
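
To make the second question concrete, here is a sketch of such a
per-character list in current Python 3. The helper names are
hypothetical, and it assumes simple table-based codecs, where a string
is encodable exactly when each of its characters is:

```python
def can_encode(ch, encoding):
    """Return True if the single character `ch` is representable in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def encodings_by_char(text, candidates):
    """Map each distinct character of `text` to the set of usable encodings."""
    return {ch: {e for e in candidates if can_encode(ch, e)}
            for ch in set(text)}

def common_encodings(text, candidates):
    """Intersect the per-character sets: encodings usable for the whole string."""
    common = set(candidates)
    for encodings in encodings_by_char(text, candidates).values():
        common &= encodings
    return common
```

Note that building the table is itself a brute-force pass over the
candidate encodings, just done per character instead of per string.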

Regards,
Martin