[Python-Dev] Bytes path support

Isaac Morland ijmorlan at uwaterloo.ca
Fri Aug 22 01:06:55 CEST 2014


On Thu, 21 Aug 2014, Chris Barker wrote:

> so they are "just byte strings", oh, except that you can't have a null, and
> the "slash" had better be code 47 (and vice versa). How is that different
> than "bytes-in-some-arbitrary-encoding-where-at-least-
> the-slash-character-is-ascii-compatible"?

Actually, slash doesn't need to be code 47.  But no matter what code 47 
means outside of the context of a filename, it is the path arc separator 
byte (not character).
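
To make that concrete, here is a minimal Python sketch. The helper name
split_arcs is made up for illustration, and UTF-16-LE is chosen only
because it is a convenient encoding in which byte 47 shows up inside a
character that is not a slash.

```python
SEPARATOR = b"\x2f"  # byte 47, whatever character that is in your encoding


def split_arcs(raw_path):
    """Split a raw POSIX path into arcs on byte 0x2F, dropping empty arcs."""
    return [arc for arc in raw_path.split(SEPARATOR) if arc]


# In UTF-16-LE the single character U+012F encodes to the bytes 2F 01,
# so a separator byte appears inside the encoding of that name.
name = "\u012f".encode("utf-16-le")
print(split_arcs(b"/tmp/" + name))
```

The split never looks at characters at all, only at the byte value.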

In fact, this isn't even entirely academic.  On a Mac OS X machine, go
into Finder and try to create a directory called ":".  You'll get an error
saying 'The name “:” can’t be used.'  Now create a directory called "/".
No problem, which raises the question of what is going on at the
filesystem level.

Answer:

$ ls -al
total 0
drwxr-xr-x   3 ijmorlan  staff   102 21 Aug 18:57 ./
drwxr-xr-x+ 80 ijmorlan  staff  2720 21 Aug 18:57 ../
drwxr-xr-x   2 ijmorlan  staff    68 21 Aug 18:57 :/

And of course in the shell one would remove the directory with this:

rm -rf :

not:

rm -rf /

So in effect the file system path arc encoding on Mac OS X is UTF-8 
*except* that : is outlawed and / is encoded as \x3A rather than the usual 
\x2F.  Of course, the path arc separator byte (not character) remains \x2F 
as always.
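
As a rough sketch of that mapping in Python: the function names below are
hypothetical, and the code models only the two rules just described (":"
rejected, "/" stored as \x3A), not anything else the file system may do
to a name.

```python
def finder_name_to_arc(name):
    """Map a GUI-level name to the bytes of one path arc (assumed rules)."""
    if ":" in name:
        raise ValueError("The name %r can't be used." % name)
    # A GUI-level "/" is stored as byte 0x3A; the rest is plain UTF-8.
    return name.replace("/", ":").encode("utf-8")


def arc_to_finder_name(arc):
    """Inverse mapping: decode as UTF-8, then show byte 0x3A as "/"."""
    return arc.decode("utf-8").replace(":", "/")


print(finder_name_to_arc("/"))    # the directory from the listing above
print(arc_to_finder_name(b"a:b"))
```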

Just for fun, there are contexts in which one can give a full path at the 
GUI level, where : is used as the path separator.  This is for historical 
reasons and presumably is the reason for the above-noted behaviour.

I think the real tension here is between the POSIX level, where filenames
are byte strings (excluding \x00, which is reserved for string
termination) in which \x2F has a special interpretation, and absolutely
every application ever written, in every language, which wants filenames
to be character strings.
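
One way to bridge the two views in Python 3 is the surrogateescape error
handler (PEP 383), which lets arbitrary bytes survive a round trip through
a character string.  The sketch below hard-codes UTF-8 as an assumption
instead of asking for the real filesystem encoding, and the helper names
are invented for the example.

```python
def name_to_text(raw):
    """Decode filename bytes; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text):
    """Encode back to the original bytes, reversing the escaping."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"           # Latin-1 bytes, not valid UTF-8
text = name_to_text(raw)       # a str the application can pass around
assert text_to_name(text) == raw
```

The application gets its character string, and the byte string underneath
is still recoverable, even though the escaped part is not displayable text.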

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

