[Python-Dev] PEP 277 (unicode filenames): please review

Jack Jansen Jack.Jansen@cwi.nl
Tue, 13 Aug 2002 13:32:33 +0200


I was going to suggest that if we return mixed sets of unicode/string
values from listdir() we could also do the same thing for platforms
where FileSystemDefaultEncoding is utf-8, such as MacOSX.

But as usual with unicode, when I actually try this it doesn't work, and
I don't understand why not. Why is unicode always something that seems
so simple and logical until you actually try it??!?!?

Here's a transcript of my Python session. The terminal has been set to
render in latin-1. The directory contains one file, "frör"
(fr-o-umlaut-r).
sap!jack- python
Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
[GCC 2.95.2 19991024 (release)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.listdir('.')
['fro\xcc\x88r']
 >>> utf8name = os.listdir('.')[0]
 >>> unicodename = utf8name.decode('utf-8')
 >>> unicodename
u'fro\u0308r'
 >>> print unicodename.encode('latin-1')
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
 >>>

Sigh. \u0308 is not in the range(256), but the whole point of
encode('latin-1') is to make it so, isn't it? And o-umlaut definitely
has a latin-1 encoding. I tried the same with macroman instead of
latin-1 (just to make sure this wasn't a bug in the latin-1 encoder),
but still no go.

What am I doing wrong?
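
One likely explanation, sketched here in present-day Python 3 rather than the 2.3a0 session above (so the syntax and exception name are assumptions relative to the transcript): the filename comes back in decomposed form, a plain "o" followed by the combining diaeresis U+0308, and the latin-1 codec maps code points one at a time without composing them. Normalizing to NFC with the standard unicodedata module first should give the single code point U+00F6, which latin-1 can encode.

```python
import unicodedata

# Decomposed name as returned by listdir() in the session above:
# 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = b"fro\xcc\x88r".decode("utf-8")

try:
    decomposed.encode("latin-1")
    failed = False
except UnicodeEncodeError:
    # The codec handles each code point separately; U+0308 has no latin-1 byte.
    failed = True

# NFC composes 'o' + U+0308 into the single code point U+00F6 (o-umlaut).
composed = unicodedata.normalize("NFC", decomposed)
latin1_bytes = composed.encode("latin-1")
```

Whether listdir() itself ought to normalize is a separate design question; this only shows why the encode call fails on the raw name.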
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -