[Python-ideas] Fix default encodings on Windows

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Wed Aug 17 22:32:30 EDT 2016


eryk sun writes:
 > On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull
 > <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
 > > BTW, why "surrogate pairs"?  Does Windows validate surrogates to
 > > ensure they come in pairs, but not necessarily in the right order (or
 > > perhaps sometimes they resolve to non-characters such as U+1FFFF)?
 > 
 > Microsoft's filesystems remain compatible with UCS2

So it's not just invalid surrogate *pairs*, it's invalid surrogates of
all kinds.  This means that it's theoretically possible (though I
gather that it's unlikely in the extreme) for a real Windows filename
to be indistinguishable from one generated by Python's surrogateescape
handler.
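
To make the collision concrete, here is a minimal sketch.  The byte
string and the variable names are made up for illustration, and the
"genuine" name is only a hypothetical filename that a UCS2-tolerant
filesystem might hand back; I haven't observed one in the wild.

```python
# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN, so a non-UTF-8 name decodes to a str that
# contains an unpaired low surrogate.
raw = b"caf\xe9"                      # Latin-1 bytes, not valid UTF-8
escaped = raw.decode("utf-8", "surrogateescape")

# Hypothetical: a name stored on disk that really does contain the
# 16-bit code unit 0xDCE9 with no matching high surrogate.
genuine = "caf" + chr(0xDCE9)

# The two str objects compare equal; nothing records which origin
# the lone surrogate had.
collides = (escaped == genuine)

# Encoding with surrogateescape turns the surrogate back into the byte
# 0xE9, while surrogatepass keeps it as a 16-bit code unit.
as_bytes = genuine.encode("utf-8", "surrogateescape")
as_utf16 = genuine.encode("utf-16-le", "surrogatepass")

# Without either handler, the lone surrogate is rejected outright.
try:
    genuine.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the same str can stand for either "the byte 0xE9" or "the code unit
0xDCE9", depending only on which error handler is applied on the way
out.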

What happens when Python's directory manipulation functions on Windows
encounter such a filename?  Do they try to write it to the disk
directory?  Do they succeed?  Does that depend on surrogateescape?

Is there a reason in practice to allow surrogateescape at all on names
in Windows filesystems, at least when using the *W API?  You mention
non-Microsoft filesystems; are they common enough to matter?

I admit that as we converge on sanity (UTF-8 for text/* content, some
kind of Unicode for filesystem names) none of this is very likely to
matter, but I'm a worrywart....

Steve

