pathlib

Mon Sep 30 10:21:42 EDT 2019

On 9/30/19 8:40 AM, Barry Scott wrote:
  >
  >
  >> On 30 Sep 2019, at 12:51, Dan Sommers
  >> <2QdxY4RzWzUUiLuE at potatochowder.com> wrote:
  >>
  >> On 9/30/19 4:28 AM, Barry Scott wrote:
  >>>> On 30 Sep 2019, at 05:40, DL Neil via Python-list
  >>>> <python-list at python.org> wrote:
  >>>> Should pathlib reflect changes it has made to the file-system?
  >>> I think it should not.
  >>> A Path() is the name of a file; it is not the file itself. Why
  >>> should it track changes in the file system for the name?
  >>
  >> I would have said the same thing, but the docs⁰ disagree:  a
  >> PurePath represents the name of (or the path to) a file, but a
  >> Path represents the actual file.
  >
  > I'm not seeing that wording in the python 3.7 pathlib documentation.
  > Can you quote the exact wording please?
  >
  > I do see this:
  >
  > "Pure path objects provide path-handling operations which don’t
  > actually access a filesystem."
  >
  > And:
  >
  > "Concrete paths are subclasses of the pure path classes. In addition
  > to operations provided by the latter, they also provide methods to
  > do system calls on path objects."

That's the wording I read.  I inferred that "path-handling operations
which don't actually access a filesystem" meant an object that didn't
necessarily represent an actual file, and that "provide methods to do
system calls on path objects" did indicate an actual file.  From the
existence of Path.read_bytes, I inferred that at least some Path objects
represent (and operate on) actual existing files.  I've been doing this
for a long time, and I may have read my expectations into those words.
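
A minimal sketch of the distinction between the two quoted sentences, assuming a hypothetical file name (report.txt) inside a throwaway temporary directory. It only illustrates which class has which methods; it does not settle what the docs "mean":

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure paths only manipulate the name; nothing here touches a filesystem,
# so it does not matter that /no/such/dir does not exist.
pure = PurePosixPath("/no/such/dir/report.txt")
pure_info = (pure.name, pure.suffix, str(pure.parent))

# Pure paths have no I/O methods at all.
pure_has_io = hasattr(pure, "read_bytes")

# Concrete paths add methods that make system calls.  A Path may name
# a file that does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    concrete = Path(tmp) / "report.txt"   # hypothetical file name
    existed_before = concrete.exists()
    concrete.write_bytes(b"hello")
    existed_after = concrete.exists()
```

Here pure_has_io is False, and the concrete path goes from not existing to existing only because write_bytes was called on it.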

  > There is no requirement that a Path() names a file that exists even.

Agreed.

  >> That said, why doesn't your argument apply to read and write?  I
  >> would certainly expect that writing to a path and then reading
  >> from that same path would return the newly written data.  If I
  >> squint funny, the Path object is tracking the operations on the
  >> file system.
  >
  > I do not expect that. Consider the time line:
  >
  > 1. with p.open('w') write data
  > 2. external process changes file on disk
  > 3. with p.open('r') read data
  >
  > How would (3) get the data written at (1) guaranteed?
  > It will lead to bugs to assume that.

I didn't say anything about a guarantee, or about an external process.
If I have a single process that writes data to a file and then reads
from that file, I would expect to read what I just wrote.  See the
documentation of Path.read_bytes and Path.write_bytes.  If I throw an
external process, or a networked file system, or multiple layers of
buffering and/or caching into the mix, then all such bets are off.
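
Both expectations can be sketched in a few lines, assuming a local file system and a hypothetical file name (data.bin). The "external process" is only simulated here by a second Path object naming the same file, which is an assumption made for the sake of a self-contained example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "data.bin"   # hypothetical file name

    # Single process, nothing else touching the file:
    # reading returns what was just written.
    p.write_bytes(b"spam and eggs")
    read_back = p.read_bytes()

    # Stand-in for an external process: a second Path object naming
    # the same file overwrites it between the write and the read.
    p.write_bytes(b"first")
    Path(tmp, "data.bin").write_bytes(b"second")
    after_interference = p.read_bytes()
```

In the first case read_back equals the bytes just written; in the second, p.read_bytes() returns whatever is on disk at that moment, because p holds a name and not a copy of the data.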

I think you're making my point about expectations.  :-)

  > The path object is allowing system calls that need a file's path to
  > be called, that is all. Beyond that there is no relationship between
  > the pathlib.Path() objects and files.

The documentation of Path.read_bytes and Path.write_bytes says otherwise.

  >> I think I'm actually arguing against some long since made (and
  >> forgotten?) design decisions that can't be changed (dare I say
  >> fixed?) because changing them would break backwards
  >> compatibility.
  >>
  >> Yuck.  :-)  And I can absolutely see all sorts of different
  >> expectations not being met and having to be explained by saying
  >> "well, that's the way it works."
  >
  > I'd suggest that the design is reasonable, and if there is a
  > misunderstanding, that it's something the docs could address.

I'm not disagreeing.  I suspect that we've both worked on enough
different systems to know that not all OSes, file systems, libraries,
and versions and combinations thereof work the same way under all
circumstances (multiple threads, multiple processes, caching, buffering,
etc.).  It's the epitome of YMMV.

Rename is a particularly thorny case because renaming a file, at least
on a POSIX system, is an operation on the directory containing the file
rather than the file itself.
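
A rough illustration of the rename case, assuming a typical POSIX file system and hypothetical file names (old.txt, new.txt). The return value of Path.rename is deliberately ignored, since it differs between Python versions:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.txt"   # hypothetical names
    new = Path(tmp) / "new.txt"
    old.write_text("contents")
    inode_before = old.stat().st_ino

    old.rename(new)

    # The Path object is not updated: it still holds the old name,
    # which no longer refers to anything.
    old_name = old.name
    old_exists = old.exists()
    new_exists = new.exists()

    # The directory entry changed, not the file: on a typical POSIX
    # file system the inode number is the same before and after.
    same_file = new.stat().st_ino == inode_before
    listing = sorted(os.listdir(tmp))
```

After the rename, old still says "old.txt" but old.exists() is False, while the directory listing shows only new.txt. The file's contents and inode were never touched; only the directory's mapping from name to file changed.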