[Python-Dev] unicode_string future, str -> basestring, fix or feature

Stephen J. Turnbull stephen at xemacs.org
Tue Mar 4 14:23:52 CET 2014


>>>>> Guido van Rossum writes:

 > Given that the claim "Python 2 doesn't support Unicode filenames"
 > is factually incorrect (in Python 2.7, most filesystem calls in
 > fact do support Unicode, at least on some platforms),

I don't understand what "support Unicode" means.  Just that

    with open(u"\u4e00", "w") as f: f.write("works!\n")

does what is expected[1] if the user knows what he is doing (i.e., has
set a locale whose encoding is a Unicode UTF or one of the Asian
encodings)?
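
A minimal sketch of what "works" depends on, assuming the name is
encoded with whatever sys.getfilesystemencoding() reports (on POSIX
that is derived from the locale).  The helper can_name_file and the
list of candidate encodings are illustrative only; nothing is written
to disk.

```python
import sys

NAME = u"\u4e00"  # the single character used in the example above


def can_name_file(name, encoding):
    """Return True if `name` can be encoded strictly in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A Unicode UTF and two Asian encodings, versus two that cannot
# represent the character at all.
for enc in ("utf-8", "euc_jp", "big5", "ascii", "latin-1"):
    print("%-8s %s" % (enc, can_name_file(NAME, enc)))

# What this particular interpreter would use for file names.
print("this interpreter: %s" % sys.getfilesystemencoding())
```

Under the last two encodings the name cannot be represented, so the
one-liner would be expected to fail rather than create the file.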

 > I think individual functions in the os module that are found
 > lacking should be considered bugs, and if someone goes through 
 > the effort to supply an otherwise acceptable fix, we shouldn't
 > reject it on the basis that we don't want to consider supporting
 > Unicode filenames.

As above, does "acceptable fix" mean taking whatever the current value
is for the file system name encoding, using that to encode and decode
unicode objects to/from str, and raising a UnicodeError if that doesn't
work?
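
That definition can be sketched as a pair of strict helpers.  This is
only an illustration under stated assumptions: fs_encode and fs_decode
are hypothetical names, not functions in the os module, and falling
back to sys.getdefaultencoding() when no file system encoding is
reported is a choice made here for the sketch.

```python
import sys


def _fs_encoding():
    # getfilesystemencoding() may return None on some Python 2 builds;
    # the fallback to the default encoding is an assumption of this sketch.
    return sys.getfilesystemencoding() or sys.getdefaultencoding()


def fs_encode(name, encoding=None):
    """Text -> byte string, strictly; raises UnicodeError on failure."""
    if isinstance(name, bytes):
        return name  # already a byte string, pass through untouched
    return name.encode(encoding or _fs_encoding())


def fs_decode(raw, encoding=None):
    """Byte string -> text, strictly; raises UnicodeError on failure."""
    if not isinstance(raw, bytes):
        return raw  # already text, pass through untouched
    return raw.decode(encoding or _fs_encoding())


# Round trip with an explicit encoding so the result does not depend
# on the locale of the machine running the example.
encoded = fs_encode(u"\u4e00", "utf-8")
assert fs_decode(encoded, "utf-8") == u"\u4e00"
```

Nothing beyond that is attempted: no guessing of encodings, no lossy
error handlers, and no normalization of the name.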

I think it's important to define this somewhat carefully, because this
is an area that has a strong tendency toward "mission creep".  Given
that the builtin open "works" by the above definition, I guess it's
reasonable to accept such patches.

Footnotes: 
[1] It writes the line "works!\n" to a file whose name consists of the
single Chinese character for "one".



