LANG, locale, unicode, setup.py and Debian packaging

"Martin v. Löwis" martin at v.loewis.de
Sun Jan 13 15:43:28 EST 2008


> Now, I want to open that file from Python, and I create a path with 
> os.path.join() and an os.listdir() which results in this byte string:
> paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
> 
> I *think* that the situation is impossible because the system cannot resolve 
> the correct filename (due the locale being ANSI and the filename being other) 
> but I am not 100% sure.

Not at all. The string you pass is a *byte* string, not a character
string. You may think that the first letter of "home" in it is an aitch,
but that's just your interpretation; it really is the byte 104.

The operating system does not interpret the file names as characters
at all, with the exception of treating byte 47 as the path separator
(typically interpreted by people as "slash") and not allowing byte 0
inside a name.
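To make the byte view concrete, here is a small Python 3 sketch (in Python 3 a byte string is written with a b'' prefix, and a plain '' literal is a character string). Indexing a bytes object yields integers, and splitting on the separator byte 47 is what gives the path its structure:

```python
# The path from the question, as raw bytes (Python 3 spelling).
path = b'/home/donn/.fontypython/M\xc3\x96gul.pog'

# Indexing bytes yields integers, not characters.
print(path[1])      # the "h" of "home" is just the byte 104
print(ord('/'))     # the path separator is the byte 47

# Splitting on byte 47 recovers the path components, still as bytes.
components = path.split(bytes([47]))
print(components)
```

Note that the last component, b'M\xc3\x96gul.pog', stays an opaque sequence of bytes; nothing here needs to know it happens to be UTF-8.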

Your locale only becomes relevant when displaying file names, when
something has to choose which glyphs to use.
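A Python 3 sketch of that display step: the same name bytes turn into different characters depending on which encoding is used to decode them, while the file itself is untouched. UTF-8 and Latin-1 are just two example encodings chosen for illustration.

```python
name = b'M\xc3\x96gul.pog'

# As UTF-8, the two bytes C3 96 form a single character, U+00D6 (an O with diaeresis).
as_utf8 = name.decode('utf-8')
# As Latin-1, every byte maps to its own character, so C3 96 become two characters.
as_latin1 = name.decode('latin-1')

# ascii() keeps the output printable whatever the terminal's encoding is.
print(ascii(as_utf8), len(as_utf8))
print(ascii(as_latin1), len(as_latin1))
```

Only the interpretation differs; encoding the UTF-8 reading back gives exactly the original bytes.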

> So, I have been trying combinations of open:
> 1. f = codecs.open( paf, "r", "utf8" )
> I had hopes for this one.
> 2. f = codecs.open( paf, "r", locale.getpreferredencoding())
> 3. f = open( paf, "r")

Now you are mixing two important concepts - the *contents*
of the file with the *name* of the file. These are entirely
independent, and the file name may be in one encoding and
the file contents in another, or the file contents may not
represent character data at all.
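A Python 3 sketch of that independence, using a temporary directory as a stand-in for ~/.fontypython: the file name is the UTF-8 byte string from the question, while the contents are not text at all but two packed integers (an arbitrary choice for illustration; the real .pog format is not assumed).

```python
import os
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dir_bytes = os.fsencode(tmp)
    # The *name*: UTF-8 encoded bytes, as in the question.
    path = os.path.join(dir_bytes, b'M\xc3\x96gul.pog')

    # The *contents*: two little-endian 32-bit integers, no character data.
    with open(path, 'wb') as f:
        f.write(struct.pack('<ii', 1, 2))

    names = os.listdir(dir_bytes)        # a bytes directory gives bytes names
    with open(path, 'rb') as f:
        numbers = struct.unpack('<ii', f.read())

print(names, numbers)
```

The name has an encoding (by convention), the contents have none, and neither affects the other.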

All three of these APIs try to get at the *contents* of the
file, by opening it.

The name is already a byte string (as a character string,
it would have started with u'...'), so there is no need
to encode it. What the content of a .pog file is, I don't
know, so I can't tell you what encoding it is in.
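When the content encoding is unknown, one option is to open the file in binary mode, which never decodes anything, and make decoding a separate, explicit step. A Python 3 sketch, where the file contents (Latin-1 text) are a hypothetical stand-in for a .pog file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(os.fsencode(tmp), b'M\xc3\x96gul.pog')
    with open(path, 'wb') as f:
        f.write(b'\xd6l')  # hypothetical contents: Latin-1 text, not UTF-8

    # Binary mode: no decoding happens, so no UnicodeDecodeError can occur here.
    with open(path, 'rb') as f:
        data = f.read()

# Decoding is separate, and only here does the *content* encoding matter.
try:
    data.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = data.decode('latin-1')  # succeeds, since the bytes really are Latin-1
print(utf8_ok, ascii(text))
```

The file opens fine despite its non-ASCII name; a UnicodeDecodeError would come from decoding the contents with the wrong encoding, never from the name.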

> But none will open it - all get a UnicodeDecodeError. This aligns with my 
> suspicions, but I wanted to bounce it off you to be sure.

Option three should have worked if paf were a string, but
in your message it is a *list* of strings. So try

  f = open(paf[0], "r")

where paf[0] should be '/home/donn/.fontypython/M\xc3\x96gul.pog',
as paf is ['/home/donn/.fontypython/M\xc3\x96gul.pog']
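Since os.listdir() returns a list, a Python 3 sketch of the usual pattern is to join each entry to its directory and pass the paths to open() one at a time. The temporary directory and the file it contains are hypothetical setup so the example runs anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.fsencode(tmp)
    # Hypothetical setup: create one .pog file so there is something to list.
    with open(os.path.join(folder, b'M\xc3\x96gul.pog'), 'wb') as f:
        f.write(b'example')

    # Build a list of full paths, like paf in the question.
    paf = [os.path.join(folder, name) for name in os.listdir(folder)
           if name.endswith(b'.pog')]

    # open() takes one path, so loop over (or index into) the list.
    contents = []
    for p in paf:
        with open(p, 'rb') as f:
            contents.append(f.read())

print(len(paf), contents)
```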

Still, I doubt that you *really* got a UnicodeDecodeError
for option three: I get

  TypeError: coercing to Unicode: need string or buffer, list found
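Python 3 rejects the list in the same way, although the wording of the message differs from the Python 2 one above. A small sketch that captures the exception instead of letting it propagate:

```python
paf = [b'/home/donn/.fontypython/M\xc3\x96gul.pog']

try:
    open(paf, 'r')
except TypeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```

Either way the failure is a TypeError about the argument's type, raised before any file name or file contents are looked at.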

Can you please type

  paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
  f = open(paf, "r")

at the interactive prompt, and report the *complete* shell output?

> Also, this codecs.open(filename, "r", <encoding>) function:
> 1. Does it imply that the filename will be opened (with the name as it's 
> type : i.e. bytestring or unicode ) and written *into* as <encoding> 
> 2. Imply that filename will be encoded via <encoding> and written into as 
> <encoding>
> It's fuzzy, how is the filename handled?

See above. The encoding in codecs.open has no effect at all on
the file name; it only determines how the file *contents* are
encoded and decoded.
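This is easy to check in Python 3, whose built-in open() takes the same kind of encoding argument as codecs.open (and codecs.open passes the file name straight through to the built-in open). A sketch that writes the same text under the same byte-string name with three example encodings:

```python
import os
import tempfile

name = b'M\xc3\x96gul.pog'
text = '\u00d6l'  # the two characters O-with-diaeresis and "l"

with tempfile.TemporaryDirectory() as tmp:
    folder = os.fsencode(tmp)
    path = os.path.join(folder, name)
    results = {}
    for enc in ('utf-8', 'latin-1', 'utf-16-le'):
        # The encoding argument only controls how `text` becomes bytes on disk.
        with open(path, 'w', encoding=enc) as f:
            f.write(text)
        with open(path, 'rb') as f:
            raw = f.read()
        # The directory entry keeps the same name bytes every time.
        results[enc] = (os.listdir(folder), raw)

for enc, (names, raw) in results.items():
    print(enc, names, raw)
```

The stored bytes change with each encoding, while the listed name is identical in all three cases.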

Regards,
Martin
