[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Tue Sep 30 23:34:37 CEST 2008


2008/9/30 Glenn Linderman <v+python at g.nevcal.com>:

> So the problem is that a Unicode file system interface can't deal with
> non-UTF-8 byte streams as file names.
>
> So it seems there are four suggested approaches, all of which have aspects
> that are inconvenient.

Let's not forget what happens when a non-UTF-8 filename is read from
or written to a file, assuming the filename is stored in the file
verbatim (an approach that already breaks for filenames containing
newlines and the like).
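
As a rough illustration of that point, here is a minimal Python sketch of
one way to store filenames in a file without losing anything. It assumes
the names are handled as raw bytes and separated by NUL, the one byte that
cannot occur in a POSIX filename. The helper names save_names and
load_names are hypothetical and not taken from any library.

```python
def save_names(path, names):
    """Write an iterable of bytes filenames to path, NUL-terminated."""
    with open(path, "wb") as f:
        for name in names:
            if b"\0" in name:
                # NUL cannot appear in a POSIX filename, so this is a caller bug.
                raise ValueError("filename contains a NUL byte")
            f.write(name + b"\0")


def load_names(path):
    """Read back the list of bytes filenames written by save_names."""
    with open(path, "rb") as f:
        data = f.read()
    # Every name is NUL-terminated, so the final split element is always empty.
    return data.split(b"\0")[:-1]
```

With this layout, names such as b'abc\xffz' and b'two\nlines' round-trip
unchanged, whereas a newline-separated text file would split the second
name in two and could not decode the first as UTF-8.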

> 4) Use of bytes APIs on FS interfaces.  This seems to be the "solution"
> adopted by Posix that creates the "problem" encountered by Unicode-native
> applications.  It is cumbersome to deal with within applications that
> attempt to display the names.  What do Posix-style "open file" dialog boxes
> do in this case?

http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html#g-filename-display-name

I used to observe three different ways to display such filenames
within gedit (including %xx and \xx escapes), but now it is
consistent, probably because it switched to using the above function
everywhere:
$ touch $'abc\xffz'
$ gedit
The Open dialog shows:
   abc�z (invalid encoding)
When the file is open, the window title and the tab title show:
   abc�z
and the same appears in the recent file list.

It has a bug: it appends " (invalid encoding)" even if the filename
contains a correctly encoded U+FFFD character. Nautilus has the same
behavior and the same bug, because this is a design flaw in that
function: it gives the caller no way to tell whether the conversion
was successful.
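
To make the ambiguity concrete, here is a small Python analogue (not the
GLib implementation). It assumes filenames are expected to be UTF-8, and
the function name display_name is hypothetical. Unlike a function that
returns only the display string, it also returns a flag saying whether
the bytes decoded cleanly.

```python
def display_name(raw):
    """Return (text, ok) for a bytes filename.

    ok is False when raw was not valid UTF-8 and U+FFFD was substituted.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Invalid sequences are replaced with U+FFFD for display only.
        return raw.decode("utf-8", errors="replace"), False


invalid = b"abc\xffz"                       # not valid UTF-8
valid = "abc\ufffdz".encode("utf-8")        # a correctly encoded U+FFFD

text_invalid, ok_invalid = display_name(invalid)
text_valid, ok_valid = display_name(valid)
```

Both calls produce the same display text, 'abc\ufffdz', so a caller that
sees only the string cannot distinguish them. Only the flag differs
(False for the invalid bytes, True for the valid ones), and that is the
information needed to decide whether to append " (invalid encoding)".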

A filename containing a newline is sometimes displayed on two lines,
and sometimes with the U+000A character drawn from a fallback font
(the hex character number in a box).

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

