[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Nick Coghlan ncoghlan at gmail.com
Wed Apr 13 09:51:02 EDT 2016


On 13 April 2016 at 02:15, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE strings" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on POSIX) converts that to a
> string using surrogateescape, gets back strings from the OS, and encodes
> them back to bytes, again using surrogateescape?

On POSIX, if you pass bytes to the os module, it will pass bytes to
the underlying system API, and then pass bytes back to your
application.
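
A minimal sketch of that difference, assuming a POSIX system whose
filesystem accepts arbitrary bytes in file names (most Linux filesystems
do; some others reject names that are not valid UTF-8). The helper name
list_both_ways and the file name are made up for illustration:

```python
import os
import tempfile


def list_both_ways(raw_name: bytes):
    """Create a file named raw_name in a scratch directory and list it twice.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Create the file through the bytes API so the name reaches the OS untouched.
        with open(os.path.join(tmp_bytes, raw_name), "wb"):
            pass
        as_bytes = os.listdir(tmp_bytes)  # bytes in -> bytes out
        as_str = os.listdir(tmp)          # str in -> str out, possibly surrogate-escaped
    return as_bytes, as_str


# b"\xe9" is "e acute" in Latin-1 and is not valid UTF-8 on its own.
as_bytes, as_str = list_both_ways(b"caf\xe9.txt")
print(as_bytes)
print(as_str)
```

What the str listing looks like depends on the filesystem encoding in
effect, but os.fsencode() on each str name should give back the
original bytes either way.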

Potentially surrogate-escaped (SE) strings only come back when you pass
str and the operating system data isn't properly encoded according to
the nominal filesystem encoding. They round-trip cleanly to other
operating system APIs, but can indeed be a problem if they escape to
other parts of your program (hence ideas like
http://bugs.python.org/issue18814#msg251694 and the preceding
discussion in that issue).
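
To make the "escape" problem concrete, here is a rough sketch that uses
an explicit UTF-8 decode in place of the real filesystem encoding, so it
behaves the same everywhere. The has_surrogate_escapes helper is
hypothetical and is not taken from the linked issue; it only shows one
way such strings could be detected before they leave the OS-facing part
of a program:

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Roughly what the os module does for str APIs on POSIX: undecodable
# bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
name = raw.decode("utf-8", errors="surrogateescape")

# Round trip back to the OS representation is lossless.
assert name.encode("utf-8", errors="surrogateescape") == raw

# A strict encode (as used when writing a UTF-8 file or sending text
# over a socket) rejects the lone surrogate.
leaked = False
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    leaked = True


def has_surrogate_escapes(text: str) -> bool:
    """Hypothetical check: does text contain surrogateescape code points?"""
    return any("\udc80" <= ch <= "\udcff" for ch in text)


print(leaked, has_surrogate_escapes(name))
```

Checking at the boundary like this is only one option; encoding back
with os.fsencode() and handling the data as bytes from then on is
another.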

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

