LANG, locale, unicode, setup.py and Debian packaging

Donn donn.ingle at gmail.com
Sun Jan 13 07:27:54 EST 2008


> So on *your* system, today: what encoding are the filenames encoded in?
> We are not talking about arbitrary files, right, but about font files?
> What *actual* file names do these font files have?
>
> On my system, all font files have ASCII-only file names, even if they
> are for non-ASCII characters.
I guess I'm confused by that. I can ls them, so they appear and their 
characters are displayed. I can open and cat them, so the OS can access 
them, but I don't know whether their names stay within ASCII or are drawn 
from a larger set such as Unicode. I have certainly seen Japanese characters 
in filenames on my system, and those can't be ASCII.
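One way to answer the "are they ASCII?" question directly is to look at the raw names rather than decoded ones. This is a minimal sketch that assumes modern Python 3 on a POSIX system, not the Python 2 of this thread. Passing a bytes path to os.listdir() returns the undecoded names, which can then be tested with bytes.isascii() (Python 3.7+). The function name classify_names is hypothetical.

```python
import os


def classify_names(directory):
    """Split the raw (bytes) entries of directory into ASCII-only and non-ASCII names."""
    ascii_names, other_names = [], []
    # A bytes path makes os.listdir return the names exactly as stored, undecoded.
    for raw in os.listdir(os.fsencode(directory)):
        if raw.isascii():
            ascii_names.append(raw)
        else:
            other_names.append(raw)
    return sorted(ascii_names), sorted(other_names)


if __name__ == "__main__":
    plain, non_ascii = classify_names(".")
    print(len(plain), "ASCII-only names")
    for raw in non_ascii:
        print(repr(raw))  # escaped form, so odd bytes are visible and typeable
```

Printing repr() of the bytes also gives an escaped spelling of each name that can be typed back into Python, which sidesteps the copy/paste problem for awkward names.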

You see, I have a large collection of fonts going back over 10 years. Many 
came from Usenet years ago, so their filenames are mangled all to hell.

I can't always *type* some of their names and have to use copy/paste to, for 
example, ls one of them.

Again, I'm working from ignorance (my own): I assume filenames from other 
countries will use character sets that I have never seen (and never will). 
But I have to cover them somehow.

> >  Or is that a waste of time because os.listdir() has already tried
> > something similar (and prob. better)?
> "better" is a difficult notion here. Is it better to produce some
> result, possibly incorrect, or is it better to give up?
I think I see. Combined with your previous advice, my plan is this: I will 
keep the byte strings alongside the unicode strings, and where I can't decode 
a name to unicode, I will keep a version decoded with 'ignore' or 'replace' 
for display. I will still have the byte string, though, and will access the 
file with that anyway.
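The plan above might look roughly like this. It is a minimal sketch assuming Python 3 on POSIX, with UTF-8 as the guessed encoding; FontEntry, list_fonts and read_font are hypothetical names. The raw bytes name is used for every file operation, and the decoded string is only for showing to the user. With errors="replace" undecodable bytes become U+FFFD, whereas errors="ignore" would silently drop them.

```python
import os
from typing import NamedTuple


class FontEntry(NamedTuple):
    raw: bytes    # exact on-disk name; always used to open the file
    display: str  # best-effort text, only for showing to the user


def list_fonts(directory, encoding="utf-8"):
    entries = []
    for raw in os.listdir(os.fsencode(directory)):
        # 'replace' substitutes U+FFFD for bytes that don't decode, instead of raising.
        display = raw.decode(encoding, errors="replace")
        entries.append(FontEntry(raw, display))
    return sorted(entries)


def read_font(directory, entry):
    # Open via the raw bytes path, so access never depends on decoding succeeding.
    with open(os.path.join(os.fsencode(directory), entry.raw), "rb") as f:
        return f.read()
```

Keeping both fields in one record means a lossy display string can never be mistaken for the real path.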

> If the user has set up his machine correctly: yes.
Meaning, I assume, primarily the LANG variable?
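For what it's worth, LANG is only the last fallback. Under POSIX precedence a non-empty LC_ALL overrides everything, then LC_CTYPE, then LANG. The hypothetical helper below (a sketch assuming Python 3) reports which variable would govern the character encoding, then prints what Python actually chose. Python 3.7+ may also coerce a plain C locale to UTF-8 (PEP 538, PEP 540), so the reported encoding can differ from what the variables alone suggest.

```python
import locale
import os
import sys


def effective_ctype_source(env=None):
    """Return (variable, value) that POSIX precedence would use for LC_CTYPE."""
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # empty values count as unset
            return name, value
    return None, "C"  # nothing set: fall back to the C/POSIX locale


if __name__ == "__main__":
    print("governed by:", effective_ctype_source())
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("preferred encoding:", locale.getpreferredencoding(False))
```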

\d


