[Python-Dev] fun with unicode, part 1

Guido van Rossum guido@python.org
Thu, 27 Apr 2000 11:23:50 -0400


> >>> filename = u"gröt"
> 
> >>> file = open(filename, "w")
> >>> file.close()
> 
> >>> import glob
> >>> print glob.glob("gr*")
> ['gr\303\266t']
> 
> >>> print glob.glob(u"gr*")
> [u'gr\366t']
> 
> >>> import os
> >>> os.system("dir gr*")
> ...
> GRÇôT                    0  01-02-03  12.34 grÇôt
>          1 fil(es)              0 byte
>          0 dir         12 345 678 byte free
> 
> hmm.

I presume that Fredrik's gripe is that the filename has been converted
to UTF-8, while the encoding used by Windows to display his directory
listing is Latin-1.  (Not Microsoft's own 8-bit character set???)
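
As an illustration of the mismatch, here is a minimal sketch in
present-day Python 3 (not the Python used in the quoted session).  It
shows the two representations that appear in the glob output above, and
what the UTF-8 bytes look like when read back as Latin-1.  The variable
names are made up for the example.

```python
# The name from the quoted session, written with an escape so the
# example does not depend on the encoding of this file.
filename = "gr\u00f6t"

utf8_bytes = filename.encode("utf-8")      # two bytes for the o-umlaut
latin1_bytes = filename.encode("latin-1")  # one byte for the o-umlaut

print(utf8_bytes)    # b'gr\xc3\xb6t'  (0xc3 0xb6 is octal 303 266)
print(latin1_bytes)  # b'gr\xf6t'      (0xf6 is octal 366)

# Reading the UTF-8 bytes with a one-byte-per-character encoding turns
# the single character into two unrelated ones.
misread = utf8_bytes.decode("latin-1")
print(len(filename), len(misread))  # 4 5
```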

I'd like to solve this problem, but I have some questions: what *IS*
the encoding used for filenames on Windows?  This may differ per
Windows version; perhaps it can differ per drive letter?  Or per
application or per thread?  On Windows NT, filenames are supposed to
be Unicode.  (I suppose also on Windows 2000?)  How do I open a file
with a given Unicode string for its name, in a C program?  I suppose
there's a Win32 API call for that which has a Unicode variant.

On Windows 95/98, the Unicode variants of the Win32 API calls don't
exist.  So what is the poor Python runtime to do there?

Can Japanese people use Japanese characters in filenames on Windows
95/98?  Let's assume they can.  Since the filesystem isn't Unicode
aware, the filenames must be encoded.  Which encoding is used?  Let's
assume they use Microsoft's multibyte encoding.  If they put such a
file on a floppy and ship it to Linköping, what will Fredrik see as
the filename?  (I.e., is the encoding fixed by the disk volume, or by
the operating system?)
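
To make the floppy question concrete, here is a small Python 3 sketch
of what would happen if the bytes of a name were stored as-is and the
reading system applied its own encoding.  It assumes cp932 (Python's
codec name for Microsoft's Japanese multibyte code page) on the writing
side and cp1252 on the reading side.  Whether Windows 95/98 actually
behaves this way is exactly the open question, and the sample name is
invented.

```python
# Hypothetical Japanese filename, written with escapes.
name = "\u65e5\u672c\u8a9e.txt"

# Assumption: the filesystem stores the raw multibyte bytes.
stored = name.encode("cp932")

# Read back on a system using the same code page: round-trips.
same_codepage = stored.decode("cp932")

# Read back on a system using a Western European code page; bytes
# with no cp1252 meaning are replaced rather than raising an error.
other_codepage = stored.decode("cp1252", errors="replace")

print(same_codepage == name)   # True
print(other_codepage == name)  # False
```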

Once we have a few answers here, we can solve the problem.  Note that
sometimes we'll have to refuse a Unicode filename because there's no
mapping for some of the characters it contains in the filename
encoding used.  Question: how does Fredrik create a file with a Euro
character (u'\u20ac') in its name?
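
The refusal case can be sketched in Python 3 as follows.
encode_filename is a hypothetical helper, not an existing API, and the
choice of Latin-1 and cp1252 as candidate filename encodings is an
assumption made only for illustration.

```python
def encode_filename(name, encoding):
    """Return name as bytes in encoding, or None if it has no mapping.

    Hypothetical helper: a real runtime might raise an error instead
    of returning None.
    """
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


euro_name = "price\u20ac.txt"  # invented name containing the Euro sign

print(encode_filename("gr\u00f6t", "latin-1"))  # b'gr\xf6t'
print(encode_filename(euro_name, "latin-1"))    # None
print(encode_filename(euro_name, "cp1252"))     # b'price\x80.txt'
```

Latin-1 has no Euro sign, so the name would have to be refused there,
while cp1252 maps it to the single byte 0x80.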

--Guido van Rossum (home page: http://www.python.org/~guido/)