LANG, locale, unicode, setup.py and Debian packaging

"Martin v. Löwis" martin at v.loewis.de
Sun Jan 13 07:03:40 EST 2008


>> I would advise against such a strategy. Instead, you should first
>> understand what the encodings of the file names actually *are*, on
>> a real system, and draw conclusions from that.
> I don't follow you here. File names *on* a real system are
> (for Linux) byte strings of potentially *any* encoding.

No. On a real system, nothing is potential, but everything is actual.

So on *your* system, today: what encoding are the filenames encoded in?
We are not talking about arbitrary files, right, but about font files?
What *actual* file names do these font files have?

On my system, all font files have ASCII-only file names, even if they
are for non-ASCII characters.
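
One way to check this on a given machine is to list a directory as raw bytes and look for any byte above 0x7F. The following is a minimal sketch in Python 3 syntax; the helper names `has_non_ascii` and `non_ascii_names` are hypothetical, and the font directory shown in the usage note is only an assumed example path.

```python
import os

def has_non_ascii(raw_name):
    """Return True if the byte string contains any byte outside ASCII."""
    return any(byte > 0x7F for byte in raw_name)

def non_ascii_names(directory):
    """Return the raw (undecoded) file names in `directory` that are not pure ASCII."""
    # Passing a bytes path makes os.listdir return bytes names,
    # so no decoding is attempted at all.
    raw_names = os.listdir(os.fsencode(directory))
    return [name for name in raw_names if has_non_ascii(name)]
```

Calling `non_ascii_names("/usr/share/fonts")` (an assumed location) and getting an empty list would mean that, for that directory, the encoding question does not arise.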

> os.listdir() may even 
> fail to grok some of them. So, I will have a few elements in a list that are 
> not unicode, I can't ask the O/S for any help and therefore I should be able 
> to pass that byte string to a function as suggested in the article to at 
> least take one last stab at identifying it. 

It won't identify it. It will just give you *some* Unicode string.
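
A small sketch of why a fallback decode does not identify anything, in Python 3 syntax. It assumes, purely for illustration, that the name was stored as UTF-8 and that the fallback guess is Latin-1.

```python
original = "M\xd6gul"              # the name as its creator meant it
raw = original.encode("utf-8")     # the bytes actually stored on disk

# Latin-1 maps every possible byte to a code point, so this decode never fails...
guessed = raw.decode("latin-1")

# ...but the result is merely *some* string, not the original one.
print(guessed == original)          # False
print(len(original), len(guessed))  # 5 6
```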

> Or is that a waste of time because os.listdir() has already tried something
> similar (and probably better)?

"better" is a difficult notion here. Is it better to produce some
result, possibly incorrect, or is it better to give up?
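
The two options can be put side by side. This is a sketch in Python 3 syntax with hypothetical helper names; the sample bytes are assumed to be Latin-1 data wrongly treated as UTF-8.

```python
def decode_or_give_up(raw, encoding):
    """Strict decoding: return the text, or None if the bytes do not fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

def decode_lossy(raw, encoding):
    """Always produce a result; undecodable bytes become U+FFFD."""
    return raw.decode(encoding, "replace")

raw = b"M\xd6gul"   # Latin-1 bytes, not valid UTF-8

print(decode_or_give_up(raw, "utf-8"))    # None
print(ascii(decode_lossy(raw, "utf-8")))  # 'M\ufffdgul'
```

Neither function recovers the intended name; the first reports the failure, the second hides it.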

> I forgot to mention the command-line interface... I actually had trouble with 
> that too. The user can start the app like this:
> fontypython /some/folder/
> or
> fontypython SomeFileName
> And that introduces input in some kind of encoding. I hope that 
> locale.getpreferredencoding() will be the right one to handle that.

If the user has set up his machine correctly: yes.
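
A minimal sketch of that approach, in Python 3 syntax. The function name `decode_argument` and its optional `encoding` parameter are hypothetical. Note that in Python 3 `sys.argv` already arrives as `str`, so the sketch applies only where the argument is still a byte string, as it is in Python 2.

```python
import locale

def decode_argument(raw_arg, encoding=None):
    """Decode a byte-string command-line argument using the user's locale."""
    if encoding is None:
        # Reflects LANG / LC_* settings; only as correct as the user's setup.
        encoding = locale.getpreferredencoding()
    return raw_arg.decode(encoding)

# An explicit encoding is passed here only to keep the example deterministic.
print(decode_argument(b"M\xc3\x96gul", "utf-8") == "M\xd6gul")  # True
```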

>> I see no problem with that:
>> >>> u"M\xd6gul".encode("ascii","ignore")
>> 'Mgul'
>> >>> u"M\xd6gul".encode("ascii","replace")
>> 'M?gul'
> Well, that was what I expected to see too. I must have been doing something 
> stupid.

Most likely, you did not invoke .encode on a Unicode string.
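
For comparison, a sketch of the same calls in Python 3 syntax, where the distinction is explicit. In Python 2, calling .encode on a byte string first decodes it implicitly as ASCII, which raises UnicodeDecodeError for non-ASCII bytes before the error handler is ever consulted; in Python 3, byte strings have no .encode method at all.

```python
text = "M\xd6gul"    # a Unicode string
raw = b"M\xd6gul"    # a byte string

ignored = text.encode("ascii", "ignore")
replaced = text.encode("ascii", "replace")

print(ignored)                 # b'Mgul'
print(replaced)                # b'M?gul'
print(hasattr(raw, "encode"))  # False
```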

Regards,
Martin


