LANG, locale, unicode, setup.py and Debian packaging
"Martin v. Löwis"
martin at v.loewis.de
Sun Jan 13 07:03:40 EST 2008
>> I would advise against such a strategy. Instead, you should first
>> understand what the encodings of the file names actually *are*, on
>> a real system, and draw conclusions from that.
> I don't follow you here. File names *on* a real system are (for Linux)
> byte strings in potentially *any* encoding.
No. On a real system, nothing is potential, but everything is actual.
So on *your* system, today: what encoding are the filenames encoded in?
We are not talking about arbitrary files, right, but about font files?
What *actual* file names do these font files have?
On my system, all font files have ASCII-only file names, even if they
are for non-ASCII characters.
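
One way to answer that question for a given directory is to look at the raw bytes of the names. A minimal sketch, assuming Python 3, where passing a bytes path to os.listdir() returns the names as bytes; the helper name non_ascii_names and the font directory are made up for illustration:

```python
import os

def non_ascii_names(names):
    """Return the byte-string file names that contain any non-ASCII byte."""
    return [name for name in names if any(byte > 127 for byte in name)]

if __name__ == "__main__":
    # Hypothetical font directory; substitute the one you actually use.
    font_dir = b"/usr/share/fonts"
    if os.path.isdir(font_dir):
        for name in non_ascii_names(os.listdir(font_dir)):
            print(repr(name))
```

If this prints nothing, every name is plain ASCII and the encoding question does not arise for that directory.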
> os.listdir() may even
> fail to grok some of them. So, I will have a few elements in a list that are
> not unicode, I can't ask the O/S for any help and therefore I should be able
> to pass that byte string to a function as suggested in the article to at
> least take one last stab at identifying it.
It won't identify it. It will just give you *some* Unicode string.
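
To illustrate: the same bytes decode without error under several codecs, each giving a different string, and nothing in the bytes says which one was meant. A small sketch, with the two codecs picked arbitrarily as examples:

```python
raw = b"M\xd6gul"  # the same bytes, encoding unknown

as_latin1 = raw.decode("latin-1")  # 0xD6 read as LATIN CAPITAL LETTER O WITH DIAERESIS
as_cp1251 = raw.decode("cp1251")   # 0xD6 read as CYRILLIC CAPITAL LETTER TSE

# Both calls succeed, yet the resulting strings differ.
print(repr(as_latin1), repr(as_cp1251))
```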
> Or is that a waste of time because os.listdir() has already tried something
> similar (and prob. better)?
"better" is a difficult notion here. Is it better to produce some
result, possibly incorrect, or is it better to give up?
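
The two choices can be sketched as follows. The function names are made up for illustration, and UTF-8 is only an assumed default, not a claim about what os.listdir() does internally:

```python
raw = b"M\xd6gul"  # not valid UTF-8

def decode_or_give_up(data, encoding="utf-8"):
    """Strict decoding: return None rather than a possibly wrong string."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

def decode_somehow(data, encoding="utf-8"):
    """Lenient decoding: always returns a string; bad bytes become U+FFFD."""
    return data.decode(encoding, "replace")

print(decode_or_give_up(raw))
print(repr(decode_somehow(raw)))
```

Which of the two is "better" depends on whether the caller can do anything useful with a name that may be wrong.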
> I forgot to mention the command-line interface... I actually had trouble with
> that too. The user can start the app like this:
> fontypython /some/folder/
> or
> fontypython SomeFileName
> And that introduces input in some kind of encoding. I hope that
> locale.getpreferredencoding() will be the right one to handle that.
If the user has set up his machine correctly: yes.
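
A minimal sketch of that, assuming the argument arrives as a byte string (as the elements of sys.argv do in Python 2; Python 3 hands you already-decoded strings). The helper name decode_argument is hypothetical:

```python
import locale

def decode_argument(raw, encoding=None):
    """Decode a byte-string command-line argument using the locale's encoding."""
    if encoding is None:
        # Reflects the user's locale settings (LANG, LC_ALL, ...).
        encoding = locale.getpreferredencoding()
    # Raises UnicodeDecodeError if the bytes do not match the encoding,
    # which is what happens when the locale is set up incorrectly.
    return raw.decode(encoding)
```

For example, decode_argument(b"/some/folder/") returns the path as a Unicode string under any ASCII-compatible locale encoding.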
>> I see no problem with that:
>> >>> u"M\xd6gul".encode("ascii", "ignore")
>> 'Mgul'
>> >>> u"M\xd6gul".encode("ascii", "replace")
>> 'M?gul'
> Well, that was what I expected to see too. I must have been doing something
> stupid.
Most likely, you did not invoke .encode on a Unicode string.
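
In Python 2, calling .encode on a byte string makes the interpreter first decode it with the default (ASCII) codec, and that implicit step is what fails, before the error handler you passed is ever consulted. A sketch that spells out the two steps explicitly, so it also runs on Python 3, where the results of .encode are bytes objects:

```python
raw = b"M\xd6gul"  # a byte string, not a Unicode string

# Step 1: what Python 2 does implicitly before it can encode a byte string.
try:
    text = raw.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True  # 0xD6 is not ASCII, so the encode step is never reached

# Step 2: starting from a real Unicode string, the error handlers apply.
ignored = u"M\xd6gul".encode("ascii", "ignore")    # drops the non-ASCII character
replaced = u"M\xd6gul".encode("ascii", "replace")  # substitutes "?"

print(failed, ignored, replaced)
```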
Regards,
Martin