[Python-Dev] a suggestion ... Re: PEP 383 (again)

Thomas Breuel tmbdev at gmail.com
Thu Apr 30 16:42:45 CEST 2009


>
> What's an analogous failure? Or, rather, why would a failure analogous
> to the one I got when using System.IO.DirectoryInfo ever exist in
> Python?


Mono.Unix uses an encoder and a decoder that know about special quoting
rules.  System.IO uses a different encoder and decoder because it's a
reimplementation of a Microsoft library and the Mono developers chose not to
implement Mono.Unix quoting rules in it.  There is nothing technical
preventing System.IO from using the Mono.Unix codec; it's just that the
developers didn't want to change the behavior of an ECMA and Microsoft
library.

The analogous phenomenon will exist in Python with PEP 383.  Let's say I
have a C library with wide character interfaces and I pass it a unicode
string from Python.(*)  That C library now turns that unicode string into
UTF-8 for writing to disk using its internal UTF-8 converter, which knows
nothing about utf-8b and so cannot map half surrogates back to the original
bytes.  The result is that the file can be opened using Python's "open", but
it can't be opened using the other library.  There simply is no way you can
guarantee that all libraries turn unicode strings into pathnames using
utf-8b.   I'm not
arguing about whether that's good or bad anymore, since it's obvious that
the only proposal acceptable to Guido uses some form of non-standard
encoding / quoting.
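
To make the mismatch concrete, here is a minimal sketch in Python.  It
assumes utf-8b is available as the "surrogateescape" error handler (the name
under which PEP 383 shipped in Python 3.1), and it uses a strict UTF-8 encode
as a stand-in for the C library's internal converter.  The file name is made
up for illustration.

```python
# Bytes as they might sit on disk: Latin-1 encoded, so not valid UTF-8.
raw_name = b"caf\xe9.txt"  # made-up example name

# utf-8b style decoding: the undecodable byte 0xE9 becomes the half
# surrogate U+DCE9 in the resulting str.
py_name = raw_name.decode("utf-8", "surrogateescape")

# Python can map the string back to the original bytes, so its own
# open() would find the file.
round_trip = py_name.encode("utf-8", "surrogateescape")

# Stand-in for the other library's strict UTF-8 converter (assumption:
# it rejects lone surrogates, as a conforming converter would).
try:
    py_name.encode("utf-8")
    strict_converter_accepts = True
except UnicodeEncodeError:
    strict_converter_accepts = False

print(repr(py_name), round_trip == raw_name, strict_converter_accepts)
```

A real library might substitute a replacement character instead of failing,
but either way it does not arrive at the bytes that are on disk.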

I'm simply pointing out that the failure you observed with System.IO has
nothing to do with which quoting convention you choose, but results from the
fact that the developers of System.IO are not using the same encoder/decoder
as Mono.Unix (in that case, by choice).

So, I don't see any reason to prefer your half surrogate quoting to the Mono
U+0000-based quoting.  Both seem to achieve the same goal with respect to
round tripping file names, displaying them, etc., but Mono quoting actually
results in valid unicode strings.  It works because null is the one
character that's not legal in a UNIX path name.
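
For comparison, a rough sketch of U+0000-based quoting in Python.  This is
not Mono's codec; it assumes a scheme in which each undecodable byte becomes
U+0000 followed by the code point with the same value as the byte.  The
function names are hypothetical, and "surrogateescape" is used only as a
convenient way to locate the undecodable bytes.

```python
def nul_quote_decode(raw):
    """Decode path bytes; quote each undecodable byte as U+0000 + chr(byte)."""
    if b"\x00" in raw:
        raise ValueError("NUL is not legal in a UNIX path name")
    text = raw.decode("utf-8", "surrogateescape")
    pieces = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Escaped byte: re-quote it with a leading NUL instead.
            pieces.append("\x00" + chr(code - 0xDC00))
        else:
            pieces.append(ch)
    return "".join(pieces)


def nul_quote_encode(text):
    """Inverse of nul_quote_decode."""
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == "\x00":
            quoted = next(chars, None)
            if quoted is None or ord(quoted) > 0xFF:
                raise ValueError("dangling or malformed NUL escape")
            out.append(ord(quoted))
        else:
            out += ch.encode("utf-8")
    return bytes(out)


raw_name = b"caf\xe9.txt"  # made-up example name, not valid UTF-8
quoted_name = nul_quote_decode(raw_name)

# Unlike a string with half surrogates, this one passes a strict encoder.
strict_bytes = quoted_name.encode("utf-8")
restored = nul_quote_encode(quoted_name)
```

The quoted string round-trips to the original bytes and still encodes under
strict UTF-8, because it contains no half surrogates.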

So, why do you prefer half surrogate coding to U+0000 quoting?

Tom

(*) There's actually a second, subtle issue.  PEP 383 intends utf-8b only to
be used for file names.  But that means that I might have to bind the first
argument to TIFFOpen with utf-8b conversion, while I might have to bind
other arguments with utf-8 conversion.
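
A sketch of what that per-argument choice could look like in a binding
layer, written in Python.  The helper names and the converter table are
hypothetical and no real TIFF library is called; "surrogateescape" again
stands in for utf-8b.

```python
def encode_path_arg(name):
    # File names: utf-8b style, so half surrogates map back to raw bytes.
    return name.encode("utf-8", "surrogateescape")


def encode_text_arg(text):
    # Every other string argument: plain strict UTF-8.
    return text.encode("utf-8")


# Hypothetical binding table: one converter per argument position of a
# two-argument open call (file name first, then an ordinary string).
OPEN_ARG_CONVERTERS = (encode_path_arg, encode_text_arg)


def convert_args(converters, args):
    """Apply the converter for each position to the matching argument."""
    return tuple(conv(arg) for conv, arg in zip(converters, args))


c_args = convert_args(OPEN_ARG_CONVERTERS, ("caf\udce9.tif", "r"))
```

The point is only that the binding author has to know, argument by
argument, which strings are file names.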