[Python-Dev] file system path protocol PEP
Stephen J. Turnbull
stephen at xemacs.org
Sat May 14 02:56:30 EDT 2016
Chris Angelico writes:
> AFAICT, the compatibility layer would simply decode the bytes using
> surrogateescape handling, which should round-trip anything.
By design. See PEP 383. Or rather, the OP should; he has not done
his homework and is confused by his own FUD. This whole subthread is
really python-list territory.
Since a lot of people I respect seem uncertain about the facts, for
the record, let's lay out the (putative) issues remaining for
post-PEP-383 Python vs. str-y path objects.
(0) "Can't work with some POSIX (bytes) paths" is closed by PEP
383; forget it. Call os.fsdecode(bytespath) as soon as you get a
bytes path and os.fsencode(strpath) just before you need one, and
you're done. Surrogates embedded in strpath may need special
handling depending on the application (see (1)).
(1) str.encode(errors='strict') (the default) will blow up on embedded
surrogates. Yes, but that's a *good* thing if you're mixing str
derived from filesystem paths with other text. There's no way to
avoid it. If you're just passing it back to open(), it Just
Works, done.
(2) You're using bytes as text a la 2.x for "efficiency's" sake, and
you're worried that you might pass a str-y Path deep into bytes
territory and it will explode there.
I don't think there is any sympathy left for that use case on
Python dev channels. Define a clear boundary with well-defined
entry and exit gates, and convert there. Then you can get some
sleep. (How-to example: your "compatibility layer".)
(3) You're worried about inefficiency of decoding/encoding to the same
or trivially changed bytes (ie, you didn't need pathlib in the
first place, but you got it anyway) -- this especially matters for
2.7, but is significant for 3.x too, if you're using a bunch of
paths in a tight loop.
I don't have sympathy for that use case, but Brett and Guido do,
and Brett's PEP handles it by making __fspath__ polymorphic in the
usual os.path-y way, with Guido's modification.
This is always a tradeoff. If you know your JPEGs all have
extension '.JPG' and
    png_path = jpeg_path[:-4] + b'.png'
is readable enough for you, use that, not pathlib or Antipathy,
and you get your efficiency. (Doing jpeg_path.rindex(b'.') is left
as an exercise for the reader. Part (i): Is it really worth it?)
If you want the readability of a rich path library and the
efficiency of bytes, you *may* have the option of using Ethan's
Antipathy (or whatever).
If you can't use Antipathy, use bytes methods directly, or accept
that it isn't *that* inefficient and use pathlib. At this point,
I think this subcase is just FUD: no real examples were presented
where the efficiency hit of encoding/decoding gets in the way of
getting work done using pathlib.
If you need to stick to stdlib for some reason (eg, to use a
higher-level library that uses pathlib), live with the
"compatibility layer"'s inefficiency. Decoding and encoding are
actually rather low-cost operations at path lengths (PATH_MAX=256
was common, not so long ago!). Most high-level libraries will
impose a lot more overhead elsewhere, and calling into pathlib by
itself will add a certain amount of overhead as well.
(4) Lack of transparency/readability for "simple" operations. If
Antipathy is something you can use, I agree it's plausible that
avoiding a few os.fsdecode and os.fsencode calls would look nicer,
but this is really a style question.
My take: I think of paths as human-readable, so presenting them as
str (not bytes) is important to me, important enough that I
advocate that position to other developers. If you do the
conversion at the boundary between a bytes-y module and pathlib
("compatibility layer") I don't see how it affects readability of
the path manipulation code, while data marshaling at boundaries is
an expected fact of software development. YMMV.
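For the record, points (0) and (1) can be sketched in a few lines. This is only an illustration under stated assumptions: the filesystem encoding is taken to be UTF-8, and the helper names fsdecode_sketch and fsencode_sketch are hypothetical stand-ins. On POSIX, os.fsdecode() and os.fsencode() apply the same 'surrogateescape' handler using the real filesystem encoding.

```python
# Assumption: the filesystem encoding is UTF-8. The two helpers below are
# hypothetical; they mimic what os.fsdecode()/os.fsencode() do on POSIX.
FS_ENCODING = "utf-8"


def fsdecode_sketch(raw: bytes) -> str:
    """Decode a bytes path, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(FS_ENCODING, "surrogateescape")


def fsencode_sketch(text: str) -> bytes:
    """Encode a str path, turning smuggled surrogates back into raw bytes."""
    return text.encode(FS_ENCODING, "surrogateescape")


raw_name = b"photo-\xff.jpg"          # 0xff is never valid in UTF-8
str_name = fsdecode_sketch(raw_name)  # the bad byte becomes U+DCFF
round_tripped = fsencode_sketch(str_name)

# Point (1): a strict encode refuses the embedded surrogate, which is what
# you want if this str is about to be mixed with ordinary text.
try:
    str_name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here round_tripped equals raw_name byte for byte, while the strict encode raises UnicodeEncodeError instead of silently emitting garbage.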
(0) is thus a non-issue. (1) is not something that can be addressed
by general principles, let alone language design. (2)-(4) are all
real issues regardless of how I feel they should be resolved :-), but
they're all design trade-offs, not things that can completely block
you from getting some kinds of work done in your own style (eg, the
situation str-minded people were in before PEP 383).
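The "clear boundary with well-defined entry and exit gates" from (2) might look something like the following. It is a rough sketch, not a recommended design: legacy_list_images, entry_gate and exit_gate are hypothetical names, UTF-8 is assumed as the filesystem encoding, and PurePosixPath is used only so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def legacy_list_images() -> list:
    """Hypothetical bytes-only module that hands back raw POSIX paths."""
    return [b"/data/cat.JPG", b"/data/dog-\xff.JPG"]


def entry_gate(raw: bytes) -> PurePosixPath:
    """Convert once on the way in (assumes a UTF-8 filesystem encoding)."""
    return PurePosixPath(raw.decode("utf-8", "surrogateescape"))


def exit_gate(path: PurePosixPath) -> bytes:
    """Convert once on the way out, restoring any undecodable bytes."""
    return str(path).encode("utf-8", "surrogateescape")


# Inside the boundary, path manipulation is ordinary str-based pathlib code.
png_paths = [
    exit_gate(entry_gate(raw).with_suffix(".png"))
    for raw in legacy_list_images()
]
```

The undecodable byte in the second name passes through the str side untouched and comes back out as the same raw byte.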
Python 3 is an example of how language design can help alleviate
issues like (2), by discouraging that use case in various ways.
Brett's PEP is an example of how language design can help alleviate
issues like (3) and (4). In particular, it helps us to interface
pathlib to open() and friends in a very natural, readable way, without
explicit conversions that should be unnecessary by the nature of the
operation and its arguments. By contrast, the conversion of bytes to
str is important to do explicitly because they are different
representations of the same thing, and it's important that readers be
notified of that change of representation.
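To illustrate the kind of interfacing meant here, the following sketch uses the protocol in the form it shipped in Python 3.6, with os.fspath() and open() accepting any object that defines __fspath__. The ReportPath class is a hypothetical example, not anything from the PEP or from pathlib.

```python
import os
import tempfile


class ReportPath:
    """Hypothetical path-like object; only __fspath__ is required."""

    def __init__(self, directory: str, name: str) -> None:
        self._directory = directory
        self._name = name

    def __fspath__(self) -> str:
        return os.path.join(self._directory, self._name)


with tempfile.TemporaryDirectory() as tmp:
    report = ReportPath(tmp, "report.txt")

    # open() and os.path functions accept the object with no explicit
    # conversion.
    with open(report, "w") as fh:
        fh.write("ok")
    basename = os.path.basename(report)

    # os.fspath() gives the underlying representation when you do need it.
    with open(os.fspath(report)) as fh:
        contents = fh.read()

# str and bytes paths pass through os.fspath() unchanged.
passthrough = os.fspath(b"raw.bin")
```

Note that nothing here converts between bytes and str; that step stays explicit, via os.fsdecode() and os.fsencode().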
> Or am I wrong here somewhere?
Well, considering the length of this irrelevant-to-the-PEP subthread,
arguably you are feeding a successful troll. I hope that having
posted the above, in the future there will be *one*, *short* reply to
such questions:
Not a problem. Read PEP 383.
and the thread will end there.
Steve