Behaviour of os.path.join

DL Neil PythonList at DancesWithMice.info
Tue May 26 15:56:54 EDT 2020


On 27/05/20 5:23 AM, BlindAnagram wrote:
> On 26/05/2020 16:59, Mats Wichmann wrote:
>> On 5/26/20 8:56 AM, BlindAnagram wrote:
>>> I came across an issue that I am wondering whether I should report as an
>>> issue.  If I have a directory, say:
>>>
>>>    base='C:\\Documents'
>>>
>>> and I use os.path.join() as follows:
>>>
>>>    join(base, '..\\..\\', 'build', '')
>>>
>>> I obtain as expected from the documentation:
>>>
>>> 'C:\\Documents\\..\\..\\build\\'
>>>
>>> But if I try to make the directory myself (as I tried first):
>>>
>>>    join(base, '..\\..\\', 'build', '\\')
>>>
>>> I obtain:
>>>
>>> 'C:\\'
>>>
>>> The documentation says that an absolute path in the parameter list for
>>> join will discard all previous parameters but '\\' is not an absolute path!
>>
>> But it is - an absolute path is one that starts with the pathname separator.
> 
> In a string giving a file path on Windows '\\' is recognised as a
> separator between directories and not as an indicator that what follows
> is an absolute path based on the drive letter (although it might, as you
> say, imply a drive context).

[some of this answer may appear 'obvious' to you. If so, please 
understand that this conversation has the side-benefit of assisting 
other readers to understand Python, and that I would not presume to 
'talk down' to you-personally]


Using the docs:

<<<
os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the 
concatenation of path and any members of *paths with exactly one 
directory separator (os.sep) following each non-empty part except the 
last, meaning that the result will only end in a separator if the last 
part is empty. If a component is an absolute path, all previous 
components are thrown away and joining continues from the absolute path 
component.

On Windows... [previously discussed]
 >>>
https://docs.python.org/3/library/os.path.html

Let's start with the word "intelligently". Some might assume this to
mean that it will distinguish between "separator between directories"
and "absolute path". However, what it means is that it will select
either the POSIX or the MS-Windows separator character(s), depending
upon whether the final application is running on your machine or mine!
It also means that it expects to handle the assembly of the parameters
into a single path (utilising the appropriate separator).
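
[A small sketch of the above, for illustration only. It assumes nothing
beyond the standard library: posixpath and ntpath are the two modules
that os.path resolves to, and importing them directly lets either
flavour be tried on any machine. The component names are made up.]

```python
import ntpath      # the MS-Windows flavour of os.path
import posixpath   # the POSIX flavour of os.path

# Hypothetical components, deliberately free of separator characters.
parts = ("projects", "demo", "build")

posix_result = posixpath.join(*parts)
windows_result = ntpath.join(*parts)

print(posix_result)    # projects/demo/build
print(windows_result)  # projects\demo\build
```

The same three arguments yield a different string per flavour; only the
separator chosen by the library changes.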

Please be advised that the pathlib library and pathlike interface were 
added quite recently, and largely because the os library is considered 
dated. Accordingly, please don't attempt to draw parallels or 'rules' by 
comparing the under-pinning philosophies of 'past' with 'future'.

Remember that Python does not define files, paths, directories 
(folders), and backing-store structures; and as observed, they differ 
between OpSys. The os and os.path libraries exist to help us (poor, 
long-suffering coders) to cope with the differences. Accordingly, in 
Python, we do not deal with the file system itself, but we code to an 
abstraction of a file system! Python's interpreter handles 'the real 
situation' at run-time. (thank you Python!)

Please review the os library
(https://docs.python.org/3/library/os.html). There (amongst other very
useful facilities) you will find the likes of os.sep (and various other
os.*seps, which illustrate how difficult it is to harmonise the
abstraction to cope with the various realities). Note also the warning
(which applies both to 'construction' and 'separation' of paths from
path-components).
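
[To see those constants for yourself, something like the following
should do. It is only a sketch: the first loop reports whatever system
runs it, so its output differs between your machine and mine, whilst
the dictionary (a name of my own invention) pins down both flavours.]

```python
import ntpath
import os
import posixpath

# Values for whichever system is running this script.
for name in ("sep", "altsep", "extsep", "pathsep", "curdir", "pardir"):
    print(f"os.{name:<8} = {getattr(os, name)!r}")

# The same kind of constants per flavour, regardless of the current system:
# (component separator, alternative separator, search-path separator).
separators = {
    "posix": (posixpath.sep, posixpath.altsep, posixpath.pathsep),
    "windows": (ntpath.sep, ntpath.altsep, ntpath.pathsep),
}
print(separators)
```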

Further reading? Because Python doesn't really define "path", let's turn
to https://en.wikipedia.org/wiki/Path_%28computing%29 - but keep a
headache remedy to-hand! This article explains such terms as "path",
"root", and "device" (the latter not existing in POSIX systems), across
a range of operating systems.


OK, after all that, back to the question:-

Please examine the 'signature' of -join():

	os.path.join(path, *paths)

notice that the arguments are path[s] - NOT file-names, NOT directories 
(folders), and NOT path-components. Remember also the word "intelligent".

The objective of the function is to create a legal AND OpSys-appropriate 
path, by joining other *path(s)* together. Accordingly, the function 
considers each parameter to be a path. A path commencing with the symbol
indicating the "root" is considered an "absolute path". A path
commencing with anything else (an ordinary character, etc) is
considered a "relative path".
[Apologies, in that experienced Pythonistas will find this 'stating the
obvious', but learners often do not find such differences immediately
apparent]

This may explain why the OP's use of, or interpretation of, arguments to 
the function, differs from that of the library.
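
[The OP's two calls can be reproduced on any machine by calling the
MS-Windows flavour, ntpath, directly. A sketch only: the variable names
other than base are mine, and the POSIX path at the end is made up to
show the same rule on the other flavour.]

```python
import ntpath
import posixpath

base = "C:\\Documents"

# The final argument is empty: the result merely gains a trailing separator.
with_empty_tail = ntpath.join(base, "..\\..\\", "build", "")

# The final argument starts at the root: join treats it as a fresh path and
# drops the earlier components, keeping only the drive.
with_separator_tail = ntpath.join(base, "..\\..\\", "build", "\\")

print(with_empty_tail)      # C:\Documents\..\..\build\
print(with_separator_tail)  # C:\

# The same rule on the POSIX flavour, with a hypothetical path.
posix_restart = posixpath.join("/home/user", "docs", "/etc")
print(posix_restart)        # /etc
```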


Why a subsequent parameter, interpreted as an absolute-path, should
cause all previous parameters to be 'thrown away' is a design decision
(documented, as quoted above) - and I can't explain that choice, except
to say that because some systems use the same character to represent
the "root" directory as they do for the path-component separator, there
are situations where the two could be confused - whether this happens
on MS-Windows (or not) is beside the point when dealing with the Python
file-system 'abstraction' functions!

IMHO: the best way to use -join() is not to mix its 'intelligence' with 
(OpSys-specific) string-literal separator characters of my own.
(even though I am (so much) smarter than it. Hah!)
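
[By way of illustration, a sketch of that habit: every argument is a
bare name or a constant supplied by the library, and the only way a
trailing separator is requested is with a final empty string. The
component names are hypothetical.]

```python
import ntpath
import posixpath

def build_dir(flavour):
    """Join separator-free components using the given os.path flavour."""
    return flavour.join("Documents", flavour.pardir, flavour.pardir, "build", "")

windows_path = build_dir(ntpath)
posix_path = build_dir(posixpath)

print(windows_path)  # Documents\..\..\build\
print(posix_path)    # Documents/../../build/
```

In everyday code one would simply call os.path.join and let Python pick
the flavour; both are passed in explicitly here only so that the two
results can be compared side by side.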


>> The concept of paths is ugly in Windows because of the drive letter - a
>> drive letter is not actually part of a path, it's an additional piece of
>> context.  If you leave out the drive letter, your path is relative or
>> absolute within the current drive letter; if you include it your path is
>> relative or absolute within the specified drive letter.  So Python has
>> behaved as documented here: the indicator for an absolute path has
>> discarded everything (except the drive letter, which is necessary to
>> maintain the context you provided) which came before it in the join.
> 
> This is not consistent with how other file management functions in
> os.path operate since they willingly accept '\\' as a directory separator.

Back to the idea of an 'abstraction'. Please realise that sometimes
libraries offer 'helper functions' or seek to be forgiving in the
argument-data they accept. However, this does not (necessarily) imply a
"rule". Another "implementation detail"?
(this time in your favor/to your liking)


>> If indeed you're seeking a path that is terminated by the separator
>> character, you need to do what you did in the first example - join an
>> empty string at the end (this is documented).  The terminating separator
>> _usually_ isn't needed.  Sadly, sometimes it appears to be...

Another 'implementation detail' which copes with 'edge cases'. This one 
has caught me too!
[so, not that 'intelligent' after all? (joke)]
-- 
Regards =dn

