pathlib

Tue Oct 1 01:03:39 EDT 2019

On 30/09/19 9:28 PM, Barry Scott wrote:
>> On 30 Sep 2019, at 05:40, DL Neil via Python-list <python-list at python.org> wrote:
>>
>> Should pathlib reflect changes it has made to the file-system?
> 
> I think it should not.

The term "concrete" is applied to Path(), PosixPath(), and WindowsPath() 
- whereas the others are differentiated with the prefix "Pure".

I take "concrete" to mean 'existing in reality or real experience'. 
Thus, I saw the Pure* entities as supporting abstract paths, but the 
concrete entities as representing (and reflecting) real-world (file 
system) entities.

Thus, there is no need for .exists() to be available in the Pure paths, 
but there is when utilising their concrete implementations.

NB .name however is inherited from PurePath()!
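To make that split concrete, here is a minimal sketch (the path "some/dir/data-file" is made up for illustration and need not exist). It only checks which attributes each class offers; it does not touch the file system:

```python
import pathlib

pure = pathlib.PurePosixPath("some/dir/data-file")   # hypothetical path
concrete = pathlib.Path("some/dir/data-file")        # same, as a concrete path

# .name is pure string manipulation, so both flavours offer it
print(pure.name)
print(concrete.name)

# .exists() has to ask the file system, so only the concrete class has it
pure_has_exists = hasattr(pure, "exists")
concrete_has_exists = hasattr(concrete, "exists")
print(pure_has_exists, concrete_has_exists)

# Path is a subclass of PurePath, which is where .name comes from
print(issubclass(pathlib.Path, pathlib.PurePath))
```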

> A Path() is the name of a file it is not the file itself. Why should it
> track changes in the file system for the name?

BUT... Path() does keep track of changes in the file system for other 
attributes! So, why not also name?

(remember that Python (in our code examples) does not know the file by 
its file-name (before OR after rename) but by the instance name, eg 
"save_file", below)

> Here is code to show why following the rename will break logic:
> 
> save_file = pathlib.Path('my-data.txt')
> 
> def save( data ):
> 	# backup file
> 	if save_file.exists():
> 		save_file.rename('my-data.backup')
> 
> 	# save data
> 	with save_file.open('w') as f:
> 		f.write( data )
> 
> while True:
> 	save( generate_data() )
> 	time.sleep( interval )
> 
> If the rename of the file changed the path the above code will fail.

That is one use-case, but in the use-case which led to the original 
post, the logic was:

iterate directory-entries,
	if fileNM follows an out-of-date naming-convention,
		rename the file,
		then process further
(logging observations along the way).
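For what it's worth, that loop can be sketched with pathlib as it behaves today, provided the new name is carried in its own Path. This is only a sketch under assumptions: the "old-"/"new-" prefixes stand in for the real naming-convention, modernise() is a name I have invented, and print() stands in for logging.

```python
import pathlib
import tempfile

def modernise(directory, old_prefix="old-", new_prefix="new-"):
    """Rename files carrying old_prefix; return the Paths to process further."""
    processed = []
    # sorted() takes a snapshot first, so renaming does not disturb the iteration
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and entry.name.startswith(old_prefix):
            target = entry.with_name(new_prefix + entry.name[len(old_prefix):])
            entry.rename(target)
            print("renamed", entry.name, "->", target.name)
            entry = target   # rebind by hand: the original Path still says old-...
        processed.append(entry)
    return processed

# try it in a throw-away directory
with tempfile.TemporaryDirectory() as tmp:
    d = pathlib.Path(tmp)
    (d / "old-report.txt").write_text("some text")
    (d / "new-summary.txt").write_text("other text")
    result = modernise(d)
    names = sorted(p.name for p in result)
    contents = (d / "new-report.txt").read_text()
    print(names)
```

The rebinding line is the whole point: the Path which performed the rename is left unchanged, so 'processing further' has to use the target.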

Here's a code-snippet illustrating both of the above points:

import pathlib
p = pathlib.Path( "data-file" )
p
# PosixPath('data-file')
p.stat()
# os.stat_result(... st_mtime=1569898467, st_ctime=1569898467)
# ... = excised results, leaving two data-points worth observing

with p.open("w") as f:
     f.write("new stuff")
# 9

p.stat()
# os.stat_result(... st_mtime=1569898572, st_ctime=1569898572)
# hey look, this reflects REAL and CHANGED data from the FS

# using input logic, cf previous example's output logic
p.rename( "new-name" )
with p.open( "r" ) as f: f.readline()
...
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib64/python3.7/pathlib.py", line 1193, in open
     opener=self._opener)
   File "/usr/lib64/python3.7/pathlib.py", line 1046, in _opener
     return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'data-file'

# hence we cannot follow the use-case

# why? Because .name == 'data-file' but the real-world file is (now)
# called "new-name".

NB I would not attempt to suggest that the logic of the 'write' use-case 
is any more, or any less, valid than that of the 'read' use-case; and 
I'm not arguing with you personally!

Looking at the PEP (428) didn't alleviate my confusion because:
(a) <<<
Why an object-oriented API
... form of filesystem handling abstraction
 >>>
(b) <<<
Immutability
Path objects are immutable, which makes them hashable and also prevents 
a class of programming errors.
 >>>
(c) <<<
Concrete paths API
In addition to the operations of the pure API, concrete paths provide 
additional methods which actually access the filesystem to query or 
mutate information.
 >>>

I liked the OOP aims in point (a) but it clearly says "filesystem".

The logic (mentioned above) of the inherent 'safety' of immutable 
objects - point (b) - is inescapable, but it is incompatible with the 
usage and behavior of a file system [a point also made by others, 
elsewhere].

Point (c) appears to suggest (as written above) that whereas the Pure 
API can be immutable and separate from any physical file system, the 
Concrete paths will perform actions and thus mutate with the real FS.
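A small experiment on points (b) and (c), as a sketch only: it runs in a temporary directory, and the file names are the ones used earlier in this message. It shows a concrete Path serving as a dict key, keeping its hash and .name across a rename, while .exists() goes back to the file system each time it is called:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    p = pathlib.Path(tmp) / "data-file"
    p.write_text("new stuff")

    seen = {p: "first sighting"}     # hashable, so usable as a dict key
    before = hash(p)

    p.rename(p.with_name("new-name"))

    # the object itself has not changed...
    unchanged = (hash(p) == before and p.name == "data-file")
    still_key = p in seen
    # ...but a fresh query of the file system gives a different answer
    exists_after = p.exists()
    print(unchanged, still_key, exists_after)
```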

Further to my (personal) point about use-cases and debate, if we follow 
the 'recommended' code snippet in the (PSL) manual, the logic is that 
there should be separate paths/Paths (whether for the two (physical) 
files or not) - their close relationship notwithstanding:

 >>> p = Path('foo')
 >>> p.open('w').write('some text')
9
 >>> target = Path('bar')
 >>> p.rename(target)
 >>> target.open().read()
'some text'

NB utilising two Path instances (cf allowing a string to 'intrude') will 
enrich both of the use-cases we have illustrated!

Also, if one subscribes to the rationale that it is better to represent 
paths/file-names with a semantic object, then the above makes good 
sense. (Historically there has been debate about whether paths are a 
sub-class of string (a path is a string, but a string may not be a path 
- indeed a string intended to be a path may not be legal within a file 
system) or if paths are collections (which can be seen in PurePath.parts 
and the special '/' operator).) I think this point surfaces in a later 
contribution to this thread.
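Both views of a path can be seen in a few lines. This is a sketch; the directory and file names are invented, and PurePosixPath is used so that nothing touches the file system:

```python
import os
import pathlib

# the '/' operator builds a path out of its components
p = pathlib.PurePosixPath("/home") / "dn" / "data-file"
print(p)

# the 'collection' view: a tuple of components
parts = p.parts
print(parts)

# the 'string' view: not a str sub-class, but convertible on request
is_str = isinstance(p, str)
as_text = os.fspath(p)
print(is_str, as_text)
```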

Rule for self: don't mix Path() classes with non-semantic string 
representations!

Sadly, help( ...Path ) does not match the (fine) manual's 
explanation/illustration, so perhaps RTFM applies...

-- 
Regards =dn