Encoding of file names

"Martin v. Löwis" martin at v.loewis.de
Thu Dec 8 17:45:06 EST 2005


utabintarbo wrote:
> Fredrik, you are a God! Thank You^3. I am unworthy </ass-kiss-mode>
> 
> I believe that may do the trick. Here is the results of running your
> code:

For all those who followed this thread, here is some more explanation:

Apparently, utabintarbo managed to get U+2592 (MEDIUM SHADE, a filled
50% grayish square) and U+2524 (BOX DRAWINGS LIGHT VERTICAL AND LEFT,
a vertical line in the middle, plus a line from that going left) into
a file name. How he managed to do that, I can only guess: most likely,
the Samba installation assumes that the file system encoding on
the Solaris box is some IBM code page (say, CP 437 or CP 850). If so,
the byte on disk for U+2524 would be \xb4. Where this came from, I have
to guess further: perhaps it is ACUTE ACCENT from ISO-8859-*.
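To see how a single byte can become a box-drawing character, one can decode \xb4 under the code pages guessed above and under ISO-8859-1. This is a minimal Python 3 sketch; the helper `decode_candidates` is hypothetical, and the byte value is the one conjectured in this post, not something read from the actual disk.

```python
import unicodedata

def decode_candidates(raw: bytes, encodings):
    """Decode `raw` under each encoding and return the Unicode names of the result."""
    result = {}
    for enc in encodings:
        text = raw.decode(enc)
        result[enc] = [unicodedata.name(ch) for ch in text]
    return result

# The byte conjectured to be stored on the Solaris disk.
for enc, names in decode_candidates(b"\xb4", ["cp437", "cp850", "latin-1"]).items():
    print(enc, names)
# cp437 and cp850 give BOX DRAWINGS LIGHT VERTICAL AND LEFT; latin-1 gives ACUTE ACCENT.

# For comparison, MEDIUM SHADE is a different byte (\xb1) in CP 437.
print(unicodedata.name("\u2592"), "\u2592".encode("cp437"))
```

The same byte yields a different character depending on which side's encoding is assumed, which is exactly the kind of mismatch suspected here.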

Anyway, when he uses listdir() to get the contents of the directory,
Windows applies the CP_ACP encoding (known as "mbcs" in Python).
For reasons unknown to me, the US and several European versions
of XP map these characters to \xa6, BROKEN BAR (I can somewhat see that
as meaningful for U+2524, but not for U+2592).
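Python's own cp1252 codec does not perform Windows' "best fit" substitution; it simply refuses characters it cannot represent. The sketch below imitates the lossy conversion described above with a hypothetical table, `BEST_FIT`, built only from the mapping reported in this post (not from Microsoft's actual best-fit data). The file name `report┤.txt` is made up for illustration.

```python
# Hypothetical best-fit table: only the substitution reported in the post.
BEST_FIT = {"\u2524": "\xa6", "\u2592": "\xa6"}

def to_ansi_bytes(name: str) -> bytes:
    """Roughly imitate a lossy Unicode -> code page 1252 file name conversion."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Not representable in cp1252: use the best-fit guess, else '?'.
            out += BEST_FIT.get(ch, "?").encode("cp1252")
    return bytes(out)

print(to_ansi_bytes("report\u2524.txt"))  # b'report\xa6.txt'
```

Both box-drawing characters collapse onto the same byte, so the original name cannot be recovered from the result.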

So when he then calls isfile() on that file name, \xa6 is mapped back
to U+00A6 (BROKEN BAR), and no file with that name exists on the Samba side.
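The failed lookup can be modelled without a Samba share at all. In this Python 3 sketch the set `server_names` is a hypothetical stand-in for the directory as the server reports it, and `report┤.txt` is a made-up name; only the \xa6 to U+00A6 step comes from the post.

```python
# Hypothetical stand-in for the share: the name as the server knows it.
server_names = {"report\u2524.txt"}

# Byte name as a byte-string listdir() would return it in this scenario (assumed).
listed = b"report\xa6.txt"

# Converting the byte name back with the ANSI code page yields BROKEN BAR...
round_tripped = listed.decode("cp1252")
print(round_tripped == "report\u00a6.txt")  # True
print(round_tripped in server_names)        # False: the server has no such file

# ...whereas the original Unicode name, passed through unchanged, is found.
print("report\u2524.txt" in server_names)   # True
```

This is why working with Unicode file names end to end avoids the symptom, even though it does not fix the underlying encoding mismatch.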

So while using Unicode file names is the solution here, the underlying
problem lies elsewhere: most likely a misconfiguration of the Samba
server, which assumes one encoding for the file names on disk while
the AIX application uses a different one.

Regards,
Martin
