[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Stephen J. Turnbull stephen at xemacs.org
Mon Apr 18 15:26:56 EDT 2016


Brett Cannon writes:

 > If we continue with the "str is an encoding of file paths",

It's not.  It's a representation, but not an encoding.  In Python 3,
encoding means a representation of a character string using bytes.
It's using "encoding" generically for "representation" that makes your
head hurt.

 > you can then build from "bytes is an encoding of str" to get a
 > pyramid of file path encodings: Path -> str -> bytes. I don't think
 > this is in any way a controversial view.

Perhaps not.  But it's not particularly useful. ;-)  Here's the
pyramid I think about:

                                 Path
                                /    \
                               /      \
                              V        V
                            str <-> bytes

That is, str and bytes are interchangeable *without* any knowledge of
paths, which are on a higher level of complexity and abstraction.
Although in pathlib, there's an assumption that paths are serialized
to str which is (implicitly) serialized to bytes when talking to the
OS, this is not necessarily true for other structured path classes, in
particular it is not true for DirEntry (which is a "enhanced
degenerate" path containing only one path segment but also other
useful information abot the filesystem object addressed)

I haven't looked at Antipathy, but I would guess from Ethan's
promotion of bytes paths and concern with efficiency that "bytes
antipaths" do *not* "go through" str to get to bytes, they already are
bytes (in the sense of class inheritance).

 > But that's when I realized that adding __fspath__ support to os.fsdecode()
 > and os.fsencode(), they become more coercion functions rather than
 > encoding/decoding functions. It also means that os.fspath() has a place
 > when you want to say "I only want to encode a file path to str" and avoid
 > the decode bit that os.fsdecode() would do

I don't understand what you're trying to say here.  fsdecode currently
does not promise to decode anything, because it's polymorphic,
accepting str and bytes.  fsdecode and fsencode already *are* coercion
functions.

It's this kind of semantic confusion and broken nomenclature that is
*why* I dislike these polymorphic functions and objects so much.  It
is impossible to reason correctly about them.  We're stuck with
invoking "practicality" and muddling through.  And the names mislead
even experienced Pythonistas.

Steve



More information about the Python-Dev mailing list