[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

Jim Jewett jimjjewett at gmail.com
Fri Oct 3 19:35:31 CEST 2008


On Wed, Oct 1, 2008 at 10:36 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> The average user does not even /know/ what a charset is.

Because for the average user, there is no need.

Part of the HTML5 standard is how to guess at charsets, and when to
automatically use a superset instead of the declared encoding.  For
most of the US and Europe, the guesses are good enough.

For the languages and countries where multiple charsets are in common
use, and the guesses are often wrong, browser vendors say that the
commands for changing the charset are well known and frequently used.

> If a filename can't be exactly represented with a valid Unicode
> sequence, all applications wanting to access that file are impacted
> in the same way,

Not really.

Some utilities never really need to display the filename; they just
need to be able to manage the file.
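
A minimal sketch of such a utility, assuming an os module whose
functions accept and return bytes when given a bytes path (as Python 3's
os.listdir and os.rename do).  The helper name add_suffix_to_files is
made up for illustration.  The filename is never decoded, so it does not
matter whether it is valid in any charset.

```python
import os


def add_suffix_to_files(directory: bytes, suffix: bytes) -> list:
    """Rename every regular file in `directory`, appending `suffix`.

    Hypothetical helper: names stay as raw bytes from start to finish,
    so nothing here depends on the filesystem encoding.
    """
    renamed = []
    # A bytes argument makes os.listdir return bytes names.
    for name in sorted(os.listdir(directory)):
        src = os.path.join(directory, name)
        if os.path.isfile(src):
            dst = src + suffix
            os.rename(src, dst)
            renamed.append(dst)
    return renamed
```

Nothing in this loop could show the names to a user reliably, but it
does not need to.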

Many applications need to display a file chooser, but may never need
to actually open problematic files, and may not need an accurate or
complete representation.  (Consider "Progra~1" on Windows.)


> This sounds very much like a Python-level (or at least stdlib-level)
> problem to me.

The stdlib should provide a way of dealing with raw bytes.  Beyond
that, the needs get too specialized.  (And that way of dealing with
raw bytes *might* just be documenting the Latin-1 hack.)
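
A rough sketch of the Latin-1 hack, as I understand it: Latin-1 maps
each of the 256 byte values to the code point with the same number, so
decoding arbitrary bytes never fails and encoding the result gives the
original bytes back.  The function names and the sample filename below
are made up for illustration.

```python
def bytes_to_text(raw: bytes) -> str:
    """Carry arbitrary bytes inside a str, one code point per byte."""
    return raw.decode("latin-1")


def text_to_bytes(text: str) -> bytes:
    """Recover the original bytes from a str built by bytes_to_text."""
    return text.encode("latin-1")


# Hypothetical name that is not valid UTF-8.
raw_name = b"caf\xe9_\xff\x80.txt"
as_text = bytes_to_text(raw_name)

# The round trip is lossless, whatever the real charset was.
assert text_to_bytes(as_text) == raw_name
```

The str is only a carrier here.  If the real charset was not Latin-1,
displaying it shows the wrong characters, and text_to_bytes raises
UnicodeEncodeError for any str containing code points above U+00FF.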

> Are you suggesting that the solution to the filename problem is to
> prompt the user and ask them for a different encoding?

For some applications, yes.

-jJ

