pathlib

Tue Oct 1 02:08:57 EDT 2019

On 1/10/19 3:21 AM, Dan Sommers wrote:
> On 9/30/19 8:40 AM, Barry Scott wrote:
>   >> On 30 Sep 2019, at 12:51, Dan Sommers <2QdxY4RzWzUUiLuE at potatochowder.com> wrote:
>   >> On 9/30/19 4:28 AM, Barry Scott wrote:
>   >>>> On 30 Sep 2019, at 05:40, DL Neil via Python-list <python-list at python.org> wrote:
>   >>>> Should pathlib reflect changes it has made to the file-system?
>   >>> I think it should not.
>   >>> A Path() is the name of a file; it is not the file itself. Why
>   >>> should it track changes in the file system for the name?
>   >>
>   >> I would have said the same thing, but the docs⁰ disagree:  a
>   >> PurePath represents the name of (or the path to) a file, but a
>   >> Path represents the actual file.
>   >
>   > I'm not seeing that wording in the python 3.7 pathlib documentation.
>   > Can you quote the exact wording please?
>   >
>   > I do see this:
>   >
>   > "Pure path objects provide path-handling operations which don’t
>   > actually access a filesystem."
>   >
>   > And:
>   >
>   > "Concrete paths are subclasses of the pure path classes. In addition
>   > to operations provided by the latter, they also provide methods to
>   > do system calls on path objects."
> 
> That's the wording I read.  I inferred that "path-handling operations
> which don't actually access a filesystem" meant an object that didn't
> necessarily represent an actual file, and that "provide methods to do
> system calls on path objects" did indicate an actual file.  From the
> existence of Path.read_bytes, I inferred that at least some Path objects
> represent (and operate on) actual existing files.  I've been doing this
> for a long time, and I may have read my expectations into those words.

+1 "Pure" cf "concrete".

The mixture makes it difficult to insist that a Path does not represent 
a file if (some) operations are included.
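
To make the "pure" cf "concrete" split tangible, here is a minimal
sketch (my own illustration, not taken from the docs; the path and file
names are made up). The pure object only manipulates the name, while
the concrete one can also touch the file-system:

```python
import tempfile
from pathlib import Path, PurePosixPath

# A pure path is only a name: nothing here touches the file-system,
# so the (hypothetical) directory does not need to exist.
pure = PurePosixPath("/no/such/dir/report.txt")
renamed = pure.with_suffix(".bak")          # string manipulation only
has_io = hasattr(pure, "read_bytes")        # pure paths offer no I/O methods

# A concrete path adds methods that make system calls.
with tempfile.TemporaryDirectory() as tmp:
    concrete = Path(tmp) / "report.txt"
    existed_before = concrete.exists()      # the name is valid, the file is absent
    concrete.write_bytes(b"data")           # now the file is created
    existed_after = concrete.exists()
```

The same Path object is a perfectly good name both before and after
the file exists, which is arguably where the two readings collide.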

>   > There is no requirement that a Path() even names a file that exists.
>
> Agreed.
> 
>   >> That said, why doesn't your argument apply to read and write?  I
>   >> would certainly expect that writing to a path and then reading
>   >> from that same path would return the newly written data.  If I
>   >> squint funny, the Path object is tracking the operations on the
>   >> file system.
>   >
>   > I do not expect that. Consider the time line:
>   >
>   > 1. with p.open('w') write data
>   > 2. external process changes file on disk
>   > 3. with p.open('r') read data
>   >
>   > How would (3) get the data written at (1) guaranteed?
>   > It will lead to bugs to assume that.
> 
> I didn't say anything about a guarantee, or about an external process.
> If I have a single process that writes data to a file and then reads
> from that file, I would expect to read what I just wrote.  See the
> documentation of Path.read_bytes and Path.write_bytes.  If I throw an
> external process, or a networked file system, or multiple layers of
> buffering and/or caching into the mix, then all such bets are off.
> 
> I think you're making my point about expectations.  :-)

+1
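
For the single-process case Dan describes, a small sketch (assuming a
local temporary directory, no other process touching the file, and a
made-up file name):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "example.bin"       # hypothetical file name
    written = p.write_bytes(b"hello")   # returns the number of bytes written
    read_back = p.read_bytes()          # re-opens the file by name and reads it
```

Here read_back matches what was written, but only because nothing else
changed the file between the two calls. The Path object holds no copy
of the data; each call goes back to the file-system by name.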

>   > The path object is allowing system calls that need a file's path to
>   > be called, that is all. Beyond that there is no relationship between
>   > the pathlib.Path() objects and files.
> 
> The documentation of Path.read_bytes and Path.write_bytes says otherwise.

+1

>   >> I think I'm actually arguing against some long since made (and
>   >> forgotten?) design decisions that can't be changed (dare I say
>   >> fixed?) because changing them would break backwards
>   >> compatibility.
>   >>
>   >> Yuck.  :-)  And I can absolutely see all sorts of different
>   >> expectations not being met and having to be explained by saying
>   >> "well, that's the way it works."
>   >
>   > I'd suggest that the design is reasonable, and if there is a
>   > misunderstanding, that's something the docs could address.
> 
> I'm not disagreeing.  I suspect that we've both worked on enough
> different systems to know that not all OSes, file systems, libraries,
> and versions and combinations thereof work the same way under all
> circumstances (multiple threads, multiple processes, caching, buffering,
> etc.).  It's the epitome of YMMV.
> 
> Rename is a particularly thorny case because renaming a file, at least
> on a POSIX system, is an operation on the directory containing the file
> rather than the file itself.
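
For reference, the rename behaviour behind the original question can be
sketched as follows (file names are made up, and the sketch
deliberately ignores whatever rename() returns):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.txt"         # hypothetical names
    new = Path(tmp) / "new.txt"
    old.write_text("x")

    old.rename(new)                     # changes the directory entry on disk

    name_after = old.name               # the Path object still holds the old name
    old_exists = old.exists()           # nothing is at the old name any more
    new_exists = new.exists()           # the file is now found under the new name
```

The object named old is not updated by the rename; it simply goes on
naming a file that is no longer there.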

Thank you @Dan for keeping the conversation going during my night-hours.
-- 
Regards =dn