[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 25 18:33:17 CEST 2009


> I see two main user-oriented use cases for the resulting Unicode
> strings this PEP will produce on all systems: displaying a list of
> filenames for the user to select from (an open file dialog), and
> allowing a user to edit or supply a filename (a save dialog or a
> rename control).

There are more, in particular the case "user passes a file name
on the command line", and "web server passes URL in environment
variable".

> It's clear what this PEP provides for the former. On well-behaved
> systems where a simpler filesystemencoding approach would work, the
> results are identical; the user can select filenames that are what he
> expects to see on both Unix and Windows. On less well-behaved systems,
> some characters may appear as junk in the middle of the name (or would
> they be invisible?)

Depends on the rendering. Try "print u'\udc00'" in your terminal to see
what happens; for me, it renders the glyph for "replacement character".
In GUI applications, you often see white boxes (rectangles).
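
To make the escaping concrete, here is a minimal sketch. It assumes
Python 3.1 or later, where the handler this PEP describes is available
under the name "surrogateescape" (this thread calls it python-escape).
The file name and the choice of UTF-8 are made up for illustration.

```python
# A file name containing one byte (0xE9) that is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with the escape handler maps the bad byte to a lone
# surrogate (U+DCE9) instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

# For display, the lone surrogate cannot be encoded strictly, so a
# program has to pick a fallback; "replace" substitutes "?" here.
shown = name.encode("utf-8", "replace").decode("utf-8")
```

What the user actually sees for the escaped character still depends
on the terminal or GUI toolkit, as described above.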

> What I don't find clear is what the risks are for the latter. On the
> less well behaved system, a user may well attempt to use this python
> application to fix filenames. Can we estimate a likelihood that edits
> to the names would result in a Unicode string that can no longer be
> encoded with the python-escape? Will a new name fully provided by a
> user on his keyboard (ignoring copy and paste) almost always safely
> encode?

That very much depends on the system setup, and your impression is
right that the PEP doesn't address it - it only deals with cases
where you get random unsupported bytes; getting random unsupported
characters from the user is not considered.

If the user has the locale set up in a way that matches his keyboard,
it should all work fine - and will already, even without the PEP.
If the user enters a character that doesn't directly map to a
good file name, you get an exception, and have to tell the user
to pick a different filename.
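
As a rough sketch of that check (assuming Python 3.1 or later and the
"surrogateescape" handler name; can_encode is a hypothetical helper,
not something the PEP defines), an application could test a
user-supplied name before trying to use it:

```python
def can_encode(name, encoding):
    """Return True if name can be encoded in the given encoding.

    Hypothetical helper. Escaped bytes (lone surrogates U+DC80..U+DCFF)
    are accepted; any other unsupported character is rejected.
    """
    try:
        name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


# With an ASCII file system encoding, an accented name is rejected,
# while the same name is fine under UTF-8.
ascii_ok = can_encode("r\u00e9sum\u00e9.txt", "ascii")
utf8_ok = can_encode("r\u00e9sum\u00e9.txt", "utf-8")
```

In a real program the encoding would normally come from
sys.getfilesystemencoding() rather than being passed in by hand.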

Notice that it may fail at several layers:
- it may be that characters entered are not supported in what
  Python chooses as the file system encoding.
- it may be that the characters are not supported by the file
  system, e.g. leading spaces in Win32.
- it may be that the file cannot be renamed because the target
  name already exists.
In all these cases, the application has to ask the user to
reconsider; for at least the last case, it should be prepared
to do that, anyway (there is also the case where renaming fails
because of lack of permissions; in that case, picking a different
file name won't help).
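
The layers above could be handled roughly as follows. This is only a
sketch under assumptions: Python 3.3 or later (for PermissionError),
a hypothetical try_rename helper, and made-up reason strings. The
existence check is not atomic, so another process can still create
the target between the check and the rename.

```python
import os


def try_rename(src, new_name, encoding="utf-8"):
    """Rename src to new_name in the same directory.

    Hypothetical helper: returns None on success, otherwise a short
    reason string the caller can use when asking the user again.
    """
    # Layer 1: characters not supported by the file system encoding.
    try:
        new_name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return "unencodable"

    dst = os.path.join(os.path.dirname(src), new_name)

    # Layer 3: target already exists (os.rename may silently
    # overwrite it on POSIX, so check explicitly).
    if os.path.lexists(dst):
        return "exists"

    try:
        os.rename(src, dst)
    except PermissionError:
        # Picking a different name will not help here.
        return "permission"
    except OSError:
        # Layer 2: the file system itself rejected the name.
        return "rejected"
    return None
```

For every result except "permission", the application would go back
to the user and ask for a different name.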

Regards,
Martin


