[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Brett Cannon brett at python.org
Mon Apr 11 17:43:01 EDT 2016


On Mon, 11 Apr 2016 at 14:11 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/11/2016 01:42 PM, Victor Stinner wrote:
> > 2016-04-11 21:00 GMT+02:00 Brett Cannon:
>
> >> I'm -0 on allowing __fspath__ to return bytes, but we can see what
> >> others think.
> >
> > With the PEP 383, a bytes filename can be stored as str using the
> > surrogateescape error handler. So DirEntry can convert a bytes path to
> > str using os.fsdecode().
>
> I am far from a Unicode expert, but if I understand this correctly, you
> are proposing that DirEntry.__whatever__ can always return a str using
> the surrogateescape (SE) method.
>
> However, before this SE string can be used, it would need to be
> converted back to bytes, and with the same SE method, yes?  And this has
> already been implemented in the stdlib?
>
> So my concern in such a case is what happens if we pass this SE string
> somewhere else: a UTF-8 file, or over a socket, or into a database?
> Does this have issues that we wouldn't face if we just used bytes?
>
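Here is a minimal sketch of the round trip Victor describes and Ethan asks about. It calls the UTF-8 codec directly with the surrogateescape handler rather than going through os.fsdecode()/os.fsencode(), so it behaves the same on every platform. On POSIX with a UTF-8 filesystem encoding, those two stdlib functions apply the same handler. The file name is a made-up example.

```python
# Raw bytes that are not valid UTF-8 (0xff never appears in UTF-8).
raw = b"report-\xff.txt"

# Decode as os.fsdecode() does on POSIX with a UTF-8 filesystem encoding:
# each undecodable byte b becomes the lone surrogate U+DC00 + b.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))

# Encoding with the same handler restores the original bytes exactly,
# which is what os.fsencode() relies on.
restored = name.encode("utf-8", "surrogateescape")
assert restored == raw
```

The round trip is only lossless when both directions use surrogateescape. That is why it matters where such a string travels next.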

This is my worry as well, and it is why I have not proposed this kind of
universal normalization of bytes paths using os.fsdecode() with
surrogateescape. Doing this at the system boundary and documenting it as
such, as PEP 383 proposed, makes more sense: the expectations are more
controlled and there is a clear input boundary.