[Python-Dev] My work on Python3 and non-ascii paths is done
Victor Stinner
victor.stinner at haypocalc.com
Fri Oct 22 14:01:44 CEST 2010
On Thursday 21 October 2010 at 21:14:55, Toshio Kuratomi wrote:
> > That's exactly what I was looking for! Thanks. I think you've learned a
> > huge amount of good information that's difficult to find, so writing it
> > up in a more permanent and easy to find location will really help future
> > Python developers!
>
> One further thing I'd be interested in is if you could document any best
> practices from this experience. Things like, "surrogateescape is a
> good/bad default in these cases",
I advise using PEP 383 (surrogateescape) when the *native* data type is
bytes. Some examples:
- filenames on UNIX/BSD
- environment variables on UNIX/BSD
- well, most data sent to/received from the system on UNIX/BSD :-)
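To illustrate the point above, here is a minimal sketch of the round trip
that surrogateescape makes possible. The filename and the choice of UTF-8 as
the encoding are assumptions made for the example, not something taken from a
real system.

```python
# Hypothetical filename: Latin-1 encoded bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: the undecodable byte 0xE9
# is mapped to the lone surrogate U+DCE9 instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly,
# so the str value can still be handed back to the system unchanged.
restored = name.encode("utf-8", errors="surrogateescape")
```

Here name is 'caf\udce9.txt' and restored is equal to raw, so no
information is lost when the native bytes pass through str.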
For network protocols, I don't know. It looks like the new email modules will
offer two API levels: a low-level one (native type) using bytes, and a
high-level one using str (unicode). I don't know whether the high-level API
uses PEP 383.
PEP 383 can be used to avoid UnicodeDecodeError. But sometimes it's better to
raise an error to warn the user that the encoding is incorrect or the input
data is invalid (well, at least not valid according to the encoding).
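A small sketch of the two choices, assuming UTF-8 as the expected encoding.
The function names decode_strict and decode_lenient and the sample input are
hypothetical, chosen only for this illustration.

```python
def decode_strict(payload):
    """Decode as UTF-8 and report invalid input instead of hiding it."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Surface the problem so the user can fix the encoding or the data.
        raise ValueError("input is not valid UTF-8: %s" % exc) from exc


def decode_lenient(payload):
    """Decode as UTF-8, keeping undecodable bytes as lone surrogates."""
    return payload.decode("utf-8", errors="surrogateescape")


# Hypothetical input containing two bytes that are invalid in UTF-8.
data = b"\xff\xfeinvalid"

lenient = decode_lenient(data)

try:
    decode_strict(data)
except ValueError:
    strict_failed = True
else:
    strict_failed = False
```

The lenient variant suits data that must be passed back to the system
unchanged; the strict variant suits input that the user is expected to be
able to correct.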
I don't follow strict rules. Each problem is different. E.g. it looks like
not everybody agrees on using PEP 383 for host/domain names (issue #9377; I
didn't read the whole issue, just a few lines).
> When is parallel functions for bytes and str better than a single
> polymorphic function?
If the output type cannot be determined from the types of the inputs, it's
better to have two functions.
Examples:
- 2 functions: os.getcwd() / os.getcwdb()
- polymorphic: os.path.*()
But you should never accept mixed types, e.g. os.path.join(b'bytes', 'unicode')
has to raise a TypeError.
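The three cases can be checked with the standard library as follows. This is
only an illustrative sketch; the path components are made up and the
variable names are hypothetical.

```python
import os

# Two functions: there is no input to infer the type from, so the caller
# chooses the output type by choosing the function.
cwd_text = os.getcwd()     # returns str
cwd_bytes = os.getcwdb()   # returns bytes

# Polymorphic: the output type follows the type of the inputs.
joined_text = os.path.join("dir", "file")
joined_bytes = os.path.join(b"dir", b"file")

# Mixed types are rejected rather than silently converted.
try:
    os.path.join(b"bytes", "unicode")
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False
```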
--
Victor Stinner
http://www.haypocalc.com/