[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Ethan Furman ethan at stoneleaf.us
Wed Apr 13 11:09:36 EDT 2016


On 04/13/2016 07:21 AM, Nick Coghlan wrote:
> On 14 April 2016 at 00:11, Paul Moore wrote:
>> On 13 April 2016 at 14:51, Nick Coghlan wrote:

>>> The potential SE-strings only come back when you pass str, and the
>>> operating system data isn't properly encoded according to the nominal
>>> filesystem encoding. They round trip nicely to other operating system
>>> APIs, but can indeed be a problem if they escape to other parts of
>>> your program
>>
>> If the operating system APIs handle SE-strings correctly, is it not
>> acceptable to require the fspath protocol to return strings, and then
>> places like DirEntry or Ethan's module, when they want to return
>> bytes, can just SE-encode the bytes and return those?
>>
>> Or will the fspath protocol be used at a low enough level that it's
>> *below* the point where SE-encoded strings are handled properly?
>
> I'd expect the main consumers to be os and os.path, and would honestly
> be surprised if we needed many explicit invocations above that layer,
> other than in pathlib itself.
>
> That's actually the main factor in my suggesting the two level API
> design - from a protocol consumer perspective, bytes-or-str is a
> natural fit for os and os.path, while str-only is a natural fit for
> pathlib.
>
> I also now believe it makes sense to postpone a final decision on this
> aspect of the design until after a draft implementation has been put
> together, as my and Ethan's assumption that os and os.path will be the
> main consumers is exactly that: an assumption. Putting the draft
> implementation together will let us know whether or not it's an
> accurate one.

Sounds reasonable.

However, there is still one choice that needs to be made:

- a single os.fspath() with an allow_bytes parameter
   (mostly True in os and os.path, mostly False everywhere
   else)

- a str-only os.fspathname() and a str/bytes os.fspath()

I'm partial to the first choice: the presence or absence of a second
argument at the call site tells you at a glance whether bytes might be
coming back. With two functions, one has to keep straight in one's head
which is str-only and which might allow bytes. (I'm not very good at
keeping similar-sounding functions separate. What's the difference
between shutil.copy and shutil.copy2? I have to look it up every time.)
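
To make the two choices concrete, here is a rough sketch of how each
might look. Nothing here exists in os today: the names fspath_opt1,
fspath_opt2, fspathname_opt2 and DemoPath are made up for illustration,
and the default of allow_bytes=False is my assumption based on "mostly
False everywhere else".

```python
def _resolve(path, allow_bytes):
    """Shared helper (hypothetical): unwrap __fspath__ and check the type."""
    if not isinstance(path, (str, bytes)):
        # Look the method up on the type, as is usual for protocol methods.
        method = getattr(type(path), "__fspath__", None)
        if method is None:
            raise TypeError("expected str, bytes, or an object with "
                            "__fspath__, not " + type(path).__name__)
        path = method(path)
    if isinstance(path, str):
        return path
    if isinstance(path, bytes) and allow_bytes:
        return path
    expected = "str or bytes" if allow_bytes else "str"
    raise TypeError("expected " + expected + ", got " + type(path).__name__)


# Choice 1: a single function; bytes are returned only on request.
def fspath_opt1(path, allow_bytes=False):
    return _resolve(path, allow_bytes)


# Choice 2: two functions with similar names and different guarantees.
def fspath_opt2(path):
    """May return str or bytes."""
    return _resolve(path, True)


def fspathname_opt2(path):
    """Returns str only."""
    return _resolve(path, False)


class DemoPath:
    """Stand-in path object, used only for this illustration."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw
```

With the first choice, fspath_opt1(p) versus fspath_opt1(p, True) shows
at the call site whether bytes can come back; with the second, the
reader has to remember which of the two names is the str-only one.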

--
~Ethan~

