[Python-Dev] Python-3.0, unicode, and os.environ

Steven D'Aprano steve at pearwood.info
Sat Dec 6 03:06:40 CET 2008


On Sat, 6 Dec 2008 11:48:27 am Nick Coghlan wrote:
> Toshio Kuratomi wrote:
> > Nick Coghlan wrote:
...
> >> Why? Most programs won't be able to do anything with it. And if
> >> the program *can* do something with it... that's what the bytes
> >> version of the APIs are for.
> >
> > Nonsense.  A program can do tons of things with a non-decodable
> > filename.  Where it's limited is non-decodable filedata.
>
> You can't display a non-decodable filename to the user, hence the
> user will have no idea what they're working on. Non-filesystem
> related apps have no business trying to deal with insane filenames.

I don't agree. Putting my user's hat on, I know what I would expect: the 
app should display *some* name, it doesn't matter exactly what, so long 
as:

* it's as close as possible to the "real" name; 

* it is unique in that directory (doesn't shadow another file); and

* it's enough to identify the file so I can read/save/delete/rename the 
file.
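
Those three requirements can be sketched in a few lines of Python. This
is only an illustration, not anything the standard library provides:
display_names is a hypothetical helper, the UTF-8 default is an
assumption, and the " (2)" suffix is just one possible way to keep the
labels unique.

```python
def display_names(raw_names, encoding="utf-8"):
    """Map printable, unique labels to the raw (bytes) file names.

    Hypothetical helper: the labels are for showing to the user only;
    the raw bytes remain the real identity of each file.
    """
    shown = {}  # label -> raw name
    for raw in sorted(raw_names):
        # Undecodable bytes become U+FFFD, so the label stays as close
        # to the real name as the encoding allows.
        base = raw.decode(encoding, errors="replace")
        label = base
        n = 1
        # Two different raw names may decode to the same text; add a
        # numeric suffix so that no label shadows another file.
        while label in shown:
            n += 1
            label = "{} ({})".format(base, n)
        shown[label] = raw
    return shown
```

Because the mapping goes from label back to raw name, whatever the user
picks can still be read, saved, deleted or renamed by its real name.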

I think there are analogous situations: long-time Windows users will be
used to seeing files listed as "longfilename.txt" in some applications
and "longfi~1.txt" in others. Under POSIX, file names can contain
unprintable control characters, and the shell will print them at least
three ways, depending on context. E.g. for a file whose name contains a
formfeed, I get one of ?, \f or ^L in bash.
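
For illustration, here is a rough Python sketch of those three
renderings. It imitates the visible result only and makes no claim about
how bash or the other tools implement it; render and the style names are
made up for this example.

```python
# C-style escapes for the control characters that have a short name.
C_ESCAPES = {"\a": "\\a", "\b": "\\b", "\t": "\\t", "\n": "\\n",
             "\v": "\\v", "\f": "\\f", "\r": "\\r"}

def render(name, style):
    """Return name with ASCII control characters made visible.

    style is "question" (?), "escape" (\\f) or "caret" (^L).
    """
    out = []
    for ch in name:
        code = ord(ch)
        if code >= 0x20 and code != 0x7F:
            out.append(ch)              # not an ASCII control character
        elif style == "question":
            out.append("?")
        elif style == "escape":
            out.append(C_ESCAPES.get(ch, "\\x{:02x}".format(code)))
        elif style == "caret":
            # Caret notation flips bit 6: 0x0C (formfeed) becomes "L".
            out.append("^" + chr(code ^ 0x40))
        else:
            raise ValueError("unknown style: {!r}".format(style))
    return "".join(out)
```

All three outputs are different labels for the same underlying name.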

Applications can deal with such weird file names. KDE's file manager
(Konqueror) and file selection dialog both show the character as a
small square, presumably the font's missing-character glyph, and KDE
apps can open and save the file. Still speaking as a user, I think it 
is quite reasonable to expect applications to deal with undisplayable 
filenames: displaying the name and opening the file are orthogonal 
concepts, although I accept that command-line interfaces will have 
difficulty with file names that can't be typed by the user!
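
To make the "orthogonal" point concrete, here is a minimal sketch,
assuming a POSIX-style system where file names are byte strings.
list_entries and read_entry are hypothetical names, and decoding as
UTF-8 with errors="replace" is just one choice for the label.

```python
import os

def list_entries(directory):
    """Return (label, raw_path) pairs for the entries in directory.

    The label is only for display; the raw bytes path is what gets
    passed back to the operating system.
    """
    raw_dir = os.fsencode(directory)
    entries = []
    for raw_name in os.listdir(raw_dir):  # bytes in, bytes out
        label = raw_name.decode("utf-8", errors="replace")
        entries.append((label, os.path.join(raw_dir, raw_name)))
    return entries

def read_entry(entry):
    """Open the file by its raw path, however the label was rendered."""
    _label, raw_path = entry
    with open(raw_path, "rb") as f:
        return f.read()
```

The label may be lossy, but nothing ever has to be converted back from
it, so opening the file never depends on how the name was displayed.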

I appreciate that broken Unicode is more difficult to deal with than
unprintable control characters, but the basic principle is the same.


-- 
Steven

