[Python-Dev] pathlib - current status of discussions

Nick Coghlan ncoghlan at gmail.com
Fri Apr 15 03:11:35 EDT 2016


On 15 April 2016 at 00:01, Random832 <random832 at fastmail.com> wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely compasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> >>> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Bytes paths on WIndows are encoded as mbcs for use with the ASCII-only
Windows APIs, and hence don't support the full range of characters
that str does. The colloquial shorthand for that is "bytes paths don't
work properly on Windows" (the more strictly accurate description is
"bytes paths only work correctly on Windows if every code point in the
path can be encoded using the 'mbcs' codec").

Even on *nix, os.fsencode may fail outright if the system is
configured to use a non-universal encoding, while os.fsdecode may
pollute the resulting string with surrogate escaped characters.

Regardless of platform, if somebody hands you *mixed* bytes and str
data, the appropriate default reaction is to complain about it rather
than assume they meant one or the other. That complaint may take one
of two forms:

- for a high level, platform independent API, bytes should just be
rejected outright
- for a low level API with input type dependent behaviour, the input
should be rejected as ambiguous - the API doesn't know whether the str
behaviour or the bytes behaviour is the intended one

pathlib falls into the first category - it just rejects bytes as input
os.path.join falls into the second category - all str is fine, and all
bytes is fine, but mixing them fails

However, once somebody reaches for the coercion APIs (fsdecode and
fsencode), they're now *explicitly* telling the interpreter what they
want, since there's no ambiguity about the possible return types from
those functions.

In relation to Victor's comment about this being complex code to show
to a novice:

  os.path.join(*map(os.fsdecode, ("str", b"bytes")))

I agree, but also think that's a good reason for people to switch to
teaching novices pathlib rather than os.path, and letting them
discover the underlying libraries as required by the code and examples
they encounter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-Dev mailing list