[Python-Dev] PEP 383 (again)

"Martin v. Löwis" martin at v.loewis.de
Tue Apr 28 22:04:12 CEST 2009


> Your proposal says that utf-8b would be used for file systems, but then
> you also say that it might be used for command line arguments and
> environment variables.  So, which specific APIs will it be used with on
> Windows and on POSIX systems?

On Windows, the Wide APIs are already used throughout the code base,
e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
specific API for a specific functionality, please read the source code.

> Or will utf-8b simply not be available
> on Windows at all?

It will be available, but it won't be used automatically for
anything.

> What happens if I create a Python version of tar,
> utf-8b strings slip in there, and I try to use them on Windows?

No need to create it - the tarfile module is already there. By
"in there", do you mean on the file system, or in the tarfile?

> You also assume that all Windows file system functions strictly conform
> to UTF-16 in practice (not just on paper).  Have you verified that?

No, I don't assume that. I assume only that every function is
available in a wide-character version, and I have verified that they are.

> What's the situation on Windows CE?

I can't see how this question is relevant to the PEP. The PEP says this:

# On Windows, Python uses the wide character APIs to access
# character-oriented APIs, allowing direct conversion of the
# environmental data to Python str objects.

This is what it already does, and this is what it will continue to do.

> Another question on Linux: what happens when I decode a file system path
> with utf-8b and then pass the resulting unicode string to Gnome?  To
> Qt?

You probably get mojibake or an error; I didn't try.
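
A minimal sketch of where the error would come from, assuming the
handler that this thread calls utf-8b is spelled "surrogateescape" (the
name it has in current Python 3; this message does not say so). The
file name is made up for illustration. An undecodable byte turns into a
lone surrogate, which a strict UTF-8 encoder at a library boundary
refuses, while the same handler restores the original bytes:

```python
# Hypothetical file name: Latin-1 encoded, so not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")


def strict_encode_fails(text):
    """Return True if a strict UTF-8 encoder rejects the string."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


# ascii() is used because printing a lone surrogate may itself fail.
print(ascii(name))
print(strict_encode_fails(name))
# Encoding with the same handler restores the original bytes.
print(name.encode("utf-8", "surrogateescape") == raw)
```

What Gnome or Qt actually do with such a string depends on how the
binding converts it; the sketch only shows the Python side.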

> To windows.forms?  To Java?

How do you do that, on Linux?

> To a unicode regular expression library?

You mean SRE? SRE will match the code points as individual characters
of class Cs. You should have been able to find that out for yourself.
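
This can be checked with a short sketch, again assuming the handler is
reachable as "surrogateescape" (its name in current Python 3, not
taken from this message) and using a made-up file name:

```python
import re
import unicodedata

# Hypothetical name containing one byte that is not valid UTF-8.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

# The escaped byte is a single code point in general category Cs.
escaped = name[3]
category = unicodedata.category(escaped)

# "." matches the lone surrogate like any other single character.
match = re.fullmatch(r"caf(.)\.txt", name)

# A character class over U+DC80..U+DCFF finds the escaped bytes.
cs_chars = re.findall("[\udc80-\udcff]", name)

print(category, match is not None, len(cs_chars))
```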

> To wprintf?

Depends on the wprintf implementation.

> AFAIK, the behavior of most libraries is
> undefined for the kinds of unicode strings you construct, and it may be
> undefined in a bad way (crash, buffer overflow, whatever).

Indeed so. This is intentional. If you can crash Python that way,
this PEP makes nothing worse - you can then *already* crash Python
in that way.

Regards,
Martin

