[Python-Dev] pathlib - current status of discussions

Nick Coghlan ncoghlan at gmail.com
Tue Apr 12 04:56:44 EDT 2016


On 12 April 2016 at 15:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.
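
To make the failure mode described in the quote concrete: mixing str and bytes path components raises a TypeError where the two meet, which may be far from where the bytes first entered. The sketch below shows the "convert at the boundary" practice instead. It assumes Python 3.6+, where os.fsdecode accepts str, bytes, or os.PathLike. `to_text_path` is a hypothetical helper name used only for illustration, not a stdlib API.

```python
import os
import pathlib

def to_text_path(path) -> str:
    """Hypothetical boundary helper: normalize str, bytes, or os.PathLike to str."""
    # os.fsdecode returns str unchanged, calls __fspath__ on path-like objects,
    # and decodes bytes with the filesystem encoding and error handler.
    return os.fsdecode(path)

# Without boundary conversion, a bytes path leaks inward and fails later on.
try:
    os.path.join("logs", b"today.txt")
except TypeError as exc:
    print("mixed path types:", exc)

# With conversion at the boundary, internal code only ever sees str.
name = to_text_path(b"today.txt")
print(os.path.join("logs", name))
print(to_text_path(pathlib.PurePath("logs", "today.txt")))
```

The design choice is to pay the decoding cost once, at the edge, so every function further in can assume text.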

One possible way to address this concern would be to let the
underlying protocol return either str or bytes (since boundary code
frequently needs to handle the paths-are-bytes assumption in POSIX),
but offer an "os.fspathname" API that rejected bytes output from
os.fspath. That is, it would be equivalent to:

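    import os

    def fspathname(path):
        # Delegate to the lower-level protocol, which may return str or bytes
        name = os.fspath(path)
        # Reject bytes so callers of this API only ever see text
        if not isinstance(name, str):
            raise TypeError("Expected str for pathname, not {}".format(type(name)))
        return name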

That way folks that wanted the clean "must be str" signature could use
os.fspathname, while those that wanted to accept either could use the
lower level os.fspath.
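
A minimal sketch of how the two layers would behave, assuming Python 3.6+ (where os.fspath exists, per PEP 519). Since os.fspathname is only proposed here, the sketch defines `fspathname` as a plain module-level function, and `BytesPath` is a hypothetical path-like class whose __fspath__ returns bytes.

```python
import os

class BytesPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw

def fspathname(path) -> str:
    """Stricter wrapper over os.fspath that rejects bytes results."""
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError("Expected str for pathname, not {}".format(type(name)))
    return name

p = BytesPath(b"/tmp/data.bin")
print(os.fspath(p))  # low-level protocol: the bytes result passes through

try:
    fspathname(p)  # strict API: the same object is rejected
except TypeError as exc:
    print(exc)

print(fspathname("/tmp/data.txt"))  # str input is accepted unchanged
```

Code that must cope with raw POSIX paths calls os.fspath; code that wants a text-only signature calls the stricter wrapper and gets the error at the call site rather than deep inside the application.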

The ambiguity in question here is inherent in the differences between
the way POSIX and Windows work, so there are limits to how far we can
go in hiding it without making things worse rather than better.
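
One concrete face of that platform difference, as a hedged sketch assuming Python 3.6+: on POSIX a filename is an arbitrary byte string, so os.fsdecode uses the surrogateescape error handler to carry undecodable bytes inside a str, and os.fsencode restores them exactly. On Windows the native file APIs are text-based and sys.getfilesystemencodeerrors() reports a different handler, so the undecodable case below only runs on POSIX. The helper name `roundtrip` is illustrative.

```python
import os
import sys

def roundtrip(raw: bytes) -> bytes:
    """Illustrative helper: decode a raw filename to str, then encode it back."""
    text = os.fsdecode(raw)   # bytes -> str with the filesystem encoding/error handler
    return os.fsencode(text)  # str -> bytes with the same encoding/error handler

print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())

# A well-formed UTF-8 name survives the round trip.
print(roundtrip("caf\u00e9.txt".encode("utf-8")))

if os.name == "posix":
    # b"\xff" is never valid UTF-8. With a UTF-8 (or ASCII) filesystem encoding,
    # surrogateescape decodes it to a lone surrogate, and encoding maps it back.
    raw = b"report-\xff.txt"
    print(repr(os.fsdecode(raw)))
    print(roundtrip(raw) == raw)
```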

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

