[Python-ideas] Fix default encodings on Windows
Stephen J. Turnbull
turnbull.stephen.fw at u.tsukuba.ac.jp
Wed Aug 17 22:32:30 EDT 2016
eryk sun writes:
> On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull
> <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> > BTW, why "surrogate pairs"? Does Windows validate surrogates to
> > ensure they come in pairs, but not necessarily in the right order (or
> > perhaps sometimes they resolve to non-characters such as U+1FFFF)?
>
> Microsoft's filesystems remain compatible with UCS2
So it's not just invalid surrogate *pairs*, it's invalid surrogates of
all kinds. This means that it's theoretically possible (though I
gather that it's unlikely in the extreme) for a real Windows filename
to be indistinguishable from one generated by Python's surrogateescape
handler.
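
To make the collision concrete, here is a minimal sketch. It assumes a
filesystem that hands back a lone low surrogate in the range
U+DC80..U+DCFF; the name "caf\udcff" is made up for illustration and
was not observed on a real disk.

```python
# surrogateescape maps each undecodable byte 0xNN (0x80..0xFF) to the
# lone surrogate U+DCNN, so the stray byte 0xFF becomes U+DCFF.
escaped = b"caf\xff".decode("utf-8", "surrogateescape")

# Hypothetical filename as a UCS-2-tolerant filesystem might return it:
# it really does contain the code unit U+DCFF.
native = "caf\udcff"

# The two strings compare equal; nothing in the str records its origin.
same = escaped == native

# Encoding with surrogateescape turns U+DCFF back into the raw byte
# 0xFF, which is wrong for the "native" name.
roundtrip = native.encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate instead.
try:
    native.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(same, roundtrip, strict_ok)
```

This only shows the ambiguity at the str level; it says nothing about
what the os functions actually do with such a name on Windows.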
What happens when Python's directory manipulation functions on Windows
encounter such a filename? Do they try to write it to the disk
directory? Do they succeed? Does that depend on surrogateescape?
Is there a reason in practice to allow surrogateescape at all on names
in Windows filesystems, at least when using the *W API? You mention
non-Microsoft filesystems; are they common enough to matter?
I admit that as we converge on sanity (UTF-8 for text/* content, some
kind of Unicode for filesystem names) none of this is very likely to
matter, but I'm a worrywart....
Steve