[Python-Dev] Python-3.0, unicode, and os.environ

Sat Dec 6 18:00:58 CET 2008

On Fri, Dec 5, 2008 at 10:18 PM, Bugbee, Larry <larry.bugbee at boeing.com> wrote:
> There has been some discussion here that users should use the str or
> byte function variant based on what is relevant to their system, for
> example when getting a list of file names or opening a file.  That
> thought process really doesn't do much for those of us that write code
> that needs to run on any platform type, without alteration or the
> addition of complex if-statements and/or exceptions.
>
> Whatever the resolution here, and those of you addressing this thorny
> issue have my admiration, the solution should be such that it gives
> consistent behavior regardless of platform type and doesn't require the
> programmer to know of all the minute details of each possible target
> platform.

My prediction is that it won't ever be possible to completely hide
this difference between platforms. The platforms differ fundamentally
in how they see filenames. An elaborate abstraction can certainly be
created that smooths out most of the differences, but at some point
useful functionality will have to be lost in order to maintain strict
platform independence. This is the fate of most platform-independence
abstractions, by the way. For example, there are many elaborate
packages for platform-independent I/O, but they generally don't
provide access to all functionality that is available on a platform.
Where they do, the application is once again placed in the position of
having to use complex if-statements and/or exceptions.

Consider just this example. Many programs have a need to ask their
user for a filename to be created by the program. On systems where
filenames are raw byte strings, do you want to provide the user with a
way to specify an arbitrary byte string? (That is, in addition to the
normal case of entering a text string that will be transformed into a
filename using some encoding.) Your choices are not to support byte
strings that aren't valid in the current encoding, or to add a UI
element to select an encoding, or to add a UI element to enter raw
bytes. An abstraction package is likely to support only the first
option (this is what Java does, BTW), but this is not acceptable to
all applications.
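
As a rough illustration of those three choices, here is a minimal
sketch. The helper names filename_from_text and filename_from_hex are
hypothetical (they are not part of any library), and hex input is only
one assumed way a UI might accept raw bytes.

```python
import sys


def filename_from_text(text, encoding=None):
    """Turn user-entered text into a byte-string filename.

    With encoding=None this is the "text only" choice, using the
    filesystem's default encoding. Passing an explicit encoding models
    the "UI element to select an encoding" choice.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Raises UnicodeEncodeError if the text cannot be represented.
    return text.encode(encoding)


def filename_from_hex(hex_digits):
    """Model the "enter raw bytes" choice: the user types hex digits."""
    return bytes.fromhex(hex_digits)


# The same byte string reached two ways; it is not valid UTF-8, so a
# text-only UI on a UTF-8 system could never produce it.
latin1_name = filename_from_text("caf\u00e9", "latin-1")
raw_name = filename_from_hex("636166e9")
```

Only the first choice can be offered without the user knowing anything
about encodings, which is presumably why abstraction layers settle on it.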

> That may not be possible for a while, so interim solutions should be
> such that it minimizes later pain.  If that means hiding "implementation
> details" behind a new function, so be it.  Then, at least, the body of
> one's app is not burdened with this problem later when conditions
> change.

I believe the problem's severity is actually overstated. The interim
solution with the least amount of pain that will work for almost all
apps is to treat filenames as text strings encoded in some default
encoding, and ignore filenames that aren't valid encodings of any text
string. Yes, it is possible that you'll find that you can't completely
remove or traverse certain directory trees. But that's a fact of life
anyway (filesystems have many hidden failure modes), so you're better
off dealing with *that* possibility than worrying over the issue of
undecodable filenames.
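
A minimal sketch of that interim approach, assuming the application
picks one encoding (UTF-8 here, purely as an example) and works from
the byte-string form of os.listdir(). The helper names are made up
for illustration.

```python
import os


def decode_or_skip(raw_names, encoding="utf-8"):
    """Decode byte-string filenames, ignoring any that do not decode."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Not valid text in the chosen encoding: treat it as if
            # the file were not there.
            continue
    return names


def list_text_names(directory, encoding="utf-8"):
    """List a directory as text, skipping undecodable entries."""
    # A bytes argument makes os.listdir() return bytes filenames.
    return decode_or_skip(os.listdir(os.fsencode(directory)), encoding)


sample = [b"ok.txt", b"caf\xe9", b"caf\xc3\xa9"]
visible = decode_or_skip(sample)  # b"caf\xe9" is not valid UTF-8
```

The skipped entries are exactly the ones that can leave a directory
tree impossible to fully traverse or remove, which the application has
to tolerate as it would any other filesystem failure.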

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)