[Python-Dev] Python-3.0, unicode, and os.environ

Isaac Morland ijmorlan at uwaterloo.ca
Thu Dec 11 14:58:51 CET 2008


On Thu, 11 Dec 2008, Ulrich Eckhardt wrote:

> On Thursday 11 December 2008, Steve Holden wrote:
>> Ulrich Eckhardt wrote:
>> Seems to me this just threatens to add to the confusion.
>>
>> If you know what your filesystem produces, you can take the appropriate
>> action to convert it into a type that makes sense to the user. If you
>> don't, then at least if you have the string in its bytes form you can
>                                       ^^^^^^^^^^^^^^^^^^^
>
> There are operating systems that don't use bytes to represent a file path,
> namely all the MS Windows variants. Even worse, when you use a byte string
> there, it typically means that you want to use the obsolete encoding that is
> based on codepages.
>
> Why can we not preserve the representation of a path as it is? Why do we
> _have_ to convert it to anything at all, without even knowing if this
> conversion is needed? I just want to do something to a file's content, why
> does its path have to be converted to something and then be converted back in
> order for the system to digest it?
>
>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
>
> You receive from readdir() and pass it to stat(), simple as that. No
> conversions from the native representation needed. If you need a textual
> representation, then you have to convert it and you have to do so explicitly
> according to whatever logic your application requires.

Not only would this address the issue with the local filesystem, it would 
also provide a principled way to deal with remote filesystems.  For 
example, an FTP interface library for Python could use this type to 
return paths of the sort actually supported by the raw FTP protocol.
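
A rough sketch of what such a type might look like in present-day Python
follows. NativePath, to_text and list_native are made-up names for
illustration, not an existing API, and the __fspath__ hook the sketch relies
on was only added in Python 3.6, long after this message. It also assumes
that bytes are the native form on POSIX and str elsewhere.

```python
import os
import tempfile


class NativePath:
    """Hypothetical opaque path: holds whatever the OS handed back."""

    def __init__(self, raw):
        self._raw = raw  # bytes or str, stored without conversion

    def __fspath__(self):
        # os.stat(), open() and friends call this to get the raw form back.
        return self._raw

    def to_text(self, encoding, errors="strict"):
        # The only place a conversion happens, and the caller picks the rules.
        if isinstance(self._raw, bytes):
            return self._raw.decode(encoding, errors)
        return self._raw


def list_native(directory):
    # Assumption: bytes are the native form on POSIX, str everywhere else.
    if os.name == "posix":
        directory = os.fsencode(directory)
    else:
        directory = os.fsdecode(directory)
    return [NativePath(os.path.join(directory, name))
            for name in os.listdir(directory)]


# Demonstration: list a directory and stat each entry with no decoding.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as handle:
        handle.write("hello")
    entries = list_native(tmp)
    sizes = [os.stat(entry).st_size for entry in entries]
    # Text is produced only on request, with an explicitly chosen encoding.
    names = [entry.to_text("utf-8") for entry in entries]
```

The listing goes straight back into os.stat() untouched; text appears only
when to_text() is called with an encoding the application chose.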

Thinking of "the" filesystem is actually a misconception - always 
referring to "a" filesystem opens up all sorts of possibilities.  There is 
a lot of coding to do to allow this, but allowing programs to work with 
paths and files in the local filesystem, remote filesystems, and 
filesystems constructed from others (e.g., by expanding symlinks, changing 
the root similar to chroot, or encoding/decoding pathnames) would open 
up lots of possibilities, including better test environments.
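
To make the "a" filesystem idea concrete, here is a minimal sketch in which
a filesystem is an ordinary object and a second filesystem is built from the
first by moving its root. LocalFS, RootedFS and total_size are hypothetical
names, and the sketch makes no attempt to stop ".." or symlinks from
escaping the new root.

```python
import os
import tempfile


class LocalFS:
    """Hypothetical: the local filesystem as one filesystem object among many."""

    def listdir(self, path):
        return sorted(os.listdir(path))

    def read(self, path):
        with open(path, "rb") as handle:
            return handle.read()


class RootedFS:
    """Hypothetical: a filesystem built from another by moving its root."""

    def __init__(self, inner, root):
        self.inner = inner
        self.root = root

    def _translate(self, path):
        # Sketch only: no defence against ".." or symlinks leaving the root.
        return os.path.join(self.root, path.lstrip("/"))

    def listdir(self, path):
        return self.inner.listdir(self._translate(path))

    def read(self, path):
        return self.inner.read(self._translate(path))


def total_size(fs, directory):
    # Works against any object offering listdir() and read().
    return sum(len(fs.read(directory.rstrip("/") + "/" + name))
               for name in fs.listdir(directory))


# Demonstration: a throwaway directory serves as the root of a test filesystem.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "wb") as handle:
        handle.write(b"12345")
    with open(os.path.join(tmp, "b.txt"), "wb") as handle:
        handle.write(b"678")
    sandbox = RootedFS(LocalFS(), tmp)
    listing = sandbox.listdir("/")
    size = total_size(sandbox, "/")
```

Because total_size() only talks to the object it is given, the same code
could be pointed at a remote or synthetic filesystem, which is where the
better test environments would come from.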

This is an interesting case of separating byte strings from character 
strings.  As long as the two are conflated, everything appears simple. 
But when they are separated, not only are there two types where before 
there was only one, it turns out that which type is correct in some 
circumstances depends on the platform.  Also, many objects which are byte 
strings at the protocol level are usually or always meant to be character 
strings of some sort, but how to translate them simply cannot be nailed 
down once and for all.
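
A small illustration of why the translation cannot be fixed in advance: the
same four bytes are a different name, or no valid name at all, depending on
which encoding is assumed. The byte string is an invented example.

```python
raw = b"caf\xe9"  # four bytes, as a protocol or readdir() might deliver them

as_latin1 = raw.decode("latin-1")   # 0xE9 is "é" in Latin-1
as_cp437 = raw.decode("cp437")      # the same byte is "Θ" in code page 437

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False              # 0xE9 starts a sequence that never finishes
```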

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

