Files with Japanese filenames under Windows
Guenter Radestock
guenter.radestock at sapportals.com
Mon Jan 7 09:37:09 EST 2002
Hello,
we found a number of problems concerning the handling of Japanese filenames
on the Windows platform. Can you please tell me where to find more information
on these issues, or what I can do myself to fix some of them?
1. In Python 2.2, you can open() or os.listdir() files or directories
and specify their names as unicode objects instead of strings.
Unfortunately, this works only if the unicode object you specify is
representable in the codepage determined by the system's "regional
options". Once you change the "regional options" from Japanese to
English, unicode filenames that were previously accessible become
inaccessible.
As far as I understand, this is because Python internally converts
the unicode filename to the system codepage (determined by the
"regional options"). On Western (US, European) systems, the system
codepage is some 8-bit codepage that cannot represent the Japanese
name, so the name gets corrupted and the file cannot be accessed.
The real problem is that files with Japanese filenames (which may
reside on some remote server) cannot be accessed from Python on a
Windows computer with a non-Japanese "regional options" setting.
As far as I know, to solve this either the corresponding locale
setting has to be changed (which, to my knowledge, the Python APIs
for locale do not allow), or the special
Windoze-Wide-String-Unicode-What-So-Ever APIs have to be used to
access (open or listdir) the Unicode filename.
I don't know how to use locale, and I can't find any documentation
that explains it in a way I can understand (any recommendations?).
So I am thinking of implementing the second option, either as an
extension for my own purposes or maybe as a patch to Python itself.
Maybe somebody else has solved this already?
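To illustrate the conversion described in item 1, here is a minimal sketch
in present-day Python 3 syntax (not Python 2.2). It assumes cp932 as the
Japanese system codepage and cp1252 as a Western one. The helper names
representable() and lossy_roundtrip() are made up for this illustration and
do not show what Python does internally.

```python
# Directory name taken from the traceback in item 2 of this message.
name = '\u30c6\u30b9\u30c8\u7528\u30d5\u30a9\u30eb\u30c0'


def representable(text, codepage):
    """Return True if text survives a round trip through codepage."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeError:
        return False


def lossy_roundtrip(text, codepage):
    """Convert text to codepage and back, replacing unencodable characters."""
    return text.encode(codepage, 'replace').decode(codepage)


# Assumption: cp932 stands in for the Japanese setting, cp1252 for the
# English one.
japanese_ok = representable(name, 'cp932')
western_ok = representable(name, 'cp1252')
corrupted = lossy_roundtrip(name, 'cp1252')
```

With these assumptions the name survives under cp932, while under cp1252
every character is replaced by "?", so the resulting name no longer matches
any file on disk.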
2. When trying to access a file with a unicode name that gets corrupted
during internal conversion, Python 2.2 produces a misleading and
incorrect error message (I tried to report this for the beta, but
somehow my report got lost):
>>> open(u'\u30c6\u30b9\u30c8\u7528\u30d5\u30a9\u30eb\u30c0\\test1en.doc')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in ?
    open(u'\u30c6\u30b9\u30c8\u7528\u30d5\u30a9\u30eb\u30c0\\test1en.doc')
IOError: invalid argument: r
>>> open(u'\u30c6\u30b9\u30c8\u7528\u30d5\u30a9\u30eb\u30c0\\test1en.doc', 'rb')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in ?
    open(u'\u30c6\u30b9\u30c8\u7528\u30d5\u30a9\u30eb\u30c0\\test1en.doc', 'rb')
IOError: invalid argument: rb
(The problem is with the filename and has nothing to do with the
optional second argument to open(), which is what the message reports.)
3. If I os.listdir() a directory, I get a list of plain strings, which on
a Japanese system are encoded in sjis (Shift-JIS), the standard encoding
there. If I want to use these names for anything other than immediately
opening the files on the same system, a unicode representation of
the names would be better. To get this, I have to query for the
filename encoding, but I don't know how to do this in a portable
way (one that also works under Unix).
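One possible way to do the decoding in item 3 is sketched below in
present-day Python 3 syntax. It relies on sys.getfilesystemencoding() and
os.fsencode(), which come from later Python releases and are not available
in Python 2.2. The helper name listdir_unicode() is hypothetical.

```python
import os
import sys


def listdir_unicode(path, encoding=None):
    """List path as byte names and decode them to text.

    If no encoding is given, fall back to what the interpreter reports
    as the file system encoding (an assumption that may not hold for
    names created under a different setting).
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    names = []
    # Passing bytes to os.listdir() makes it return byte names.
    for raw in os.listdir(os.fsencode(path)):
        # Undecodable bytes become U+FFFD instead of raising an error.
        names.append(raw.decode(encoding, 'replace'))
    return names
```

In Python 3, os.listdir() already returns text names when it is given a
text path, so a helper like this is mainly useful when the names arrive as
bytes from somewhere else.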
4. os.path.split() does not always give the correct result on a
Japanese system, because in the sjis encoding that is standard there,
the path separator "\" may appear as the second byte of a multibyte
character. First converting to unicode and then using os.path.split()
gives the right result, but again I have to get the system codepage
for filenames to do this.
Path arithmetic that involves os.listdir, os.path.split and
os.path.join may fail miserably on a Japanese system if troublesome
filenames are used (and, of course, in reality such names are used;
I will be glad to provide examples if anyone needs them).
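The split problem in item 4 can be reproduced with a small sketch, written
in present-day Python 3 syntax with the ntpath module (the Windows flavour
of os.path, importable on any platform). It assumes cp932 as the Japanese
codepage. In cp932 the byte 0x5C, which is also the backslash separator,
can be the second byte of a two-byte character; U+8868 is one such
character. The file name and the helper split_via_unicode() are made up
for this illustration.

```python
import ntpath


def split_via_unicode(raw_path, encoding):
    """Decode a byte path, split it as text, and re-encode both halves."""
    head, tail = ntpath.split(raw_path.decode(encoding))
    return head.encode(encoding), tail.encode(encoding)


# Hypothetical path: directory "dir" containing a file named U+8868 ".txt".
text_path = 'dir\\\u8868.txt'
raw_path = text_path.encode('cp932')  # U+8868 becomes bytes 0x95 0x5C

# Byte-wise split: the 0x5C inside the character is taken for a separator.
naive = ntpath.split(raw_path)

# Split on the decoded text instead, then go back to bytes.
safe = split_via_unicode(raw_path, 'cp932')
```

Here the byte-wise split cuts the character in half, leaving a directory
part that ends in a stray 0x95 byte and a file part of just ".txt", while
the split on decoded text keeps the file name intact.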
Thanks in advance for any help.
- Guenter