[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Ethan Furman ethan at stoneleaf.us
Sat Apr 9 12:41:01 EDT 2016


On 04/09/2016 12:48 AM, Nick Coghlan wrote:

 > Considering the helper function usage, here's some examples in
 > combination with os.fsencode and os.fsdecode:
 >
 >   # Status quo for binary/text path conversions
 >   text_path = os.fsdecode(bytes_path)
 >   bytes_path = os.fsencode(text_path)
 >
 >   # Getting a text path from an arbitrary object
 >   text_path = os.fspath(obj) # This doesn't scream "returns text!"
 >   text_path = os.fspathname(obj) # This does
 >
 >   # Getting a binary path from an arbitrary object
 >   bytes_path = os.fsencode(os.fspath(obj))
 >   bytes_path = os.fsencode(os.fspathname(obj))
 >
 > I'm starting to think the semantic nudge from the "name" suffix when
 > reading the code is worth the extra four characters when writing it
 > (keeping in mind that the whole point of this exercise is that most
 > folks *won't* be writing explicit conversions - the stdlib will handle
 > it on their behalf).
 >
 > I also think the more explicit name helps answer some of the type
 > signature questions that have arisen:
 >
 > 1. Does os.fspathname return rich Path objects? No, it returns names
 > as str objects
 > 2. Will file descriptors pass through os.fspathname? No, as they're
 > not names, they're numeric descriptors.
 > 3. Will bytes-like objects pass through os.fspathname? No, as they're
 > not names, they're encodings of names

This worries me.

I know the primary purpose of this change is to enable pathlib and os 
and the rest of the stdlib to work together, but consider . . .

If adding a new attribute/method was as far as we went, new code (stdlib 
or otherwise) would look like:

   if isinstance(a_path_thingy, bytes):
       # because os can accept bytes
       pass
   elif isinstance(a_path_thingy, str):
       # but it's usually text
       pass
   elif hasattr(a_path_thingy, '__fspath__'):
       a_path_thingy = a_path_thingy.__fspath__()
   else:
       raise TypeError('not a valid path')
   # do something with the path

If we add os.fspath(), but don't allow bytes to be returned from it, our 
above example looks more like:

   if isinstance(a_path_thingy, bytes):
       # because os can accept bytes
       pass
   else:
       a_path_thingy = os.fspath(a_path_thingy)
   # do something with the path

Yes, it's better -- but it still requires a pre-check before calling 
os.fspath().

It is my contention that this is better:

   a_path_thingy = os.fspath(a_path_thingy)

This raises two issues:

1) Part of the stdlib is the new os.scandir() function, which can
    work with, and return, both bytes and text -- if __fspath__ can
    only hold text, DirEntry will not get the __fspath__ method added,
    and the pre-check boilerplate code will flourish;

2) pathlib.Path accepts bytes -- so what happens when a byte-derived
    Path is passed to os.fspath()?  Is a TypeError raised?  Do we
    guess and auto-convert with fsdecode()?
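
To illustrate the first issue, here is a small sketch showing that 
os.scandir() hands back DirEntry paths of the same type it was given. 
The helper name scandir_path_types and the file name example.txt are 
made up for this example; only os.scandir, os.fsencode and tempfile 
are real stdlib calls.

```python
import os
import tempfile


def scandir_path_types(directory):
    """Hypothetical helper: report the type of DirEntry.path for
    text input and for bytes input."""
    text_entries = list(os.scandir(directory))
    bytes_entries = list(os.scandir(os.fsencode(directory)))
    return type(text_entries[0].path), type(bytes_entries[0].path)


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that scandir has an entry to yield.
    open(os.path.join(tmp, 'example.txt'), 'w').close()
    text_type, bytes_type = scandir_path_types(tmp)
    print(text_type, bytes_type)
```

A text-only __fspath__ would have nothing sensible to return for the 
second kind of entry.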

I think the best answer is to:

- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text
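
As a rough sketch of those two points (not an actual implementation), 
a helper along these lines would let callers drop the isinstance 
pre-check entirely. The names fspath_sketch and DemoPath are 
hypothetical and exist only for this example.

```python
def fspath_sketch(path):
    """Hypothetical stand-in for os.fspath(): return str or bytes.

    str and bytes pass through unchanged; anything else must provide
    __fspath__, which may itself return either str or bytes.
    """
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Look the method up on the type, as is usual for protocols.
        fspath_method = type(path).__fspath__
    except AttributeError:
        raise TypeError('not a valid path: %s'
                        % type(path).__name__) from None
    result = fspath_method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError('__fspath__ must return str or bytes, not %s'
                        % type(result).__name__)
    return result


class DemoPath:
    """Hypothetical path-like object wrapping a str or bytes name."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        return self._name


text_result = fspath_sketch(DemoPath('spam.txt'))
bytes_result = fspath_sketch(DemoPath(b'spam.txt'))
```

With this shape, the calling code is the one-liner shown earlier, and 
anything that is neither str, bytes, nor path-like still gets a 
TypeError.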

--
~Ethan~

