[Python-Dev] PEP 383 (again)

"Martin v. Löwis" martin at v.loewis.de
Wed Apr 29 07:45:08 CEST 2009


> The wide APIs use UTF-16.  UTF-16 suffers from the same problem as
> UTF-8: not all sequences of words are valid UTF-16 sequences.  In
> particular, sequences containing isolated (unpaired) surrogates are not
> well-formed according to the Unicode standard.  Therefore, the existence
> of a wide character API function does not guarantee that the wide
> character strings it returns can be converted into valid Unicode
> strings.  And, in fact, Windows Vista happily creates files with
> malformed UTF-16 encodings, and os.listdir() happily returns them.

Whatever. What does that have to do with PEP 383? Your claim was
that PEP 383 may have unfortunate effects on Windows, and I'm telling
you that it won't, because the behavior of Python on Windows won't
change at all. So whatever the problem - it's there already, and the
PEP is not going to change it.

I personally don't see a problem here - *of course* os.listdir will
report invalid UTF-16 encodings, if that's what is stored on disk.
It doesn't matter whether the file names are valid with respect to some
specification. What matters is that you can access all the files.
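
To make the point concrete, here is a minimal sketch of a name with an
unpaired surrogate. It assumes a current Python 3 (3.4 or later, where
the "surrogatepass" error handler also works for UTF-16), not the
interpreter discussed in this thread; the file name and the helper
is_well_formed() are made up for illustration.

```python
# Hypothetical file name containing one unpaired high surrogate (U+D800),
# similar to what a malformed on-disk name could look like.
name = "bad\ud800name.txt"


def is_well_formed(s):
    """Return True if s encodes under strict UTF-16 (no unpaired surrogates)."""
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


# The str object holds the lone surrogate without complaint; only the
# strict encoder rejects it. "surrogatepass" writes it out as a raw
# 16-bit code unit, and decoding with the same handler restores the name.
raw = name.encode("utf-16-le", "surrogatepass")
round_tripped = raw.decode("utf-16-le", "surrogatepass")
```

The name is not well-formed, yet it survives a round trip unchanged,
which is all that is needed to keep referring to the file.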

Regards,
Martin

