Why exception from os.path.exists()?

Thu Jun 7 23:12:32 EDT 2018

Richard Damon <Richard at Damon-Family.org> writes:

> This does bring up an interesting point. Since the Unix file system
> really has file names that are collection of bytes instead of really
> being strings, and the Python API to it want to treat them as strings,
> then we have an issue that we are going to be stuck with problems with
> filenames.

I agree with the general statement “we are going to be stuck with
problems with filenames”; the world of filesystems is messy, which will
always cause problems.

With that said, I don't agree that “the Python API wants to treat
[file paths] as strings”. The ‘os.path’ documentation explicitly
promises to treat bytes as bytes, and text as text, in filesystem paths:

    Note: All of these functions accept either only bytes or only string
    objects as their parameters. The result is an object of the same
    type, if a path or file name is returned.

    <URL:https://docs.python.org/3/library/os.path.html>
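
As a quick illustration of that promise, here is a sketch of my own (not
taken from the docs). The path ‘/tmp/report.txt’ is a made-up example
and does not need to exist:

```python
import os.path

# Text in, text out.
text_name = os.path.basename("/tmp/report.txt")

# Bytes in, bytes out.
bytes_name = os.path.basename(b"/tmp/report.txt")

print(type(text_name).__name__, text_name)
print(type(bytes_name).__name__, bytes_name)

# Mixing the two types in one call is rejected.
mixed_error = None
try:
    os.path.join("/tmp", b"report.txt")
except TypeError as exc:
    mixed_error = exc
    print("TypeError:", exc)
```

So the API does not force a choice of text; it only insists that each
call uses one type consistently.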

There is a *preference* for text, it's true. The opening paragraph
includes this:

    Applications are encouraged to represent file names as (Unicode)
    character strings.

That is immediately followed by more specific advice that says when to
use bytes:

    Unfortunately, some file names may not be representable as strings
    on Unix, so applications that need to support arbitrary file names
    on Unix should use bytes objects to represent path names. Vice
    versa, using bytes objects cannot represent all file names on
    Windows (in the standard mbcs encoding), hence Windows applications
    should use string objects to access all files.

(IMO that needs a correction because, as already explored in this
thread, it's not Unix or Windows that makes the distinction there. It's
the specific *filesystem type* that records either bytes or text, and
that is true no matter which operating system happens to be reading the
filesystem.)
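
To make “not representable as strings” concrete, here is a small sketch
of my own. The byte string is an invented file name containing a Latin-1
‘é’, which is not valid UTF-8. The ‘surrogateescape’ error handler is,
as far as I know, the one ‘os.fsdecode’ applies on POSIX systems; I call
‘bytes.decode’ directly here so the result doesn't depend on the local
filesystem encoding:

```python
# Hypothetical file name: "café.txt" as written by a Latin-1 program.
raw_name = b"caf\xe9.txt"

# A strict UTF-8 decode fails, so there is no "clean" text form.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through as a lone
# surrogate code point, so the original bytes can be recovered.
text_name = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = text_name.encode("utf-8", errors="surrogateescape")

print(strict_ok)                 # False
print(ascii(text_name))          # 'caf\udce9.txt'
print(round_trip == raw_name)    # True
```

The text form survives a round trip, but it is not something you can
sensibly show to a user or write to a strict UTF-8 stream; an
application that must handle arbitrary names is better off keeping the
bytes.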

> Ultimately we have a fundamental limitation with trying to abstract out
> the format of filenames in the API, and we need a back door to allow us
> to define what encoding to use for filenames (and be able to detect that
> it doesn't work for a given file, and change it on the fly to try
> again), or we need an alternate API that lets us pass raw bytes as file
> names and the program needs to know how to handle the raw filename for
> that particular file system.

Yes, I agree that there is an unresolved problem: how to explicitly
declare the encoding for filesystem paths on ext4 and other filesystems
where byte strings are used for filesystem paths.
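
For what it's worth, the “try an encoding, detect that it doesn't work,
try another” approach can be approximated in application code today. A
rough sketch follows; the helper ‘decode_filename’ and its candidate
list are hypothetical, my own invention, and not anything the standard
library provides:

```python
def decode_filename(raw, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: returns (None, None) if no candidate works.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


# Invented example names, as a bytes-based directory listing
# (e.g. os.listdir(b".")) might return them.
utf8_result = decode_filename(b"caf\xc3\xa9.txt")
latin1_result = decode_filename(b"caf\xe9.txt")
no_result = decode_filename(b"caf\xe9.txt", candidates=("utf-8", "ascii"))

print(utf8_result)
print(latin1_result)
print(no_result)
```

Note that this only guesses. Latin-1 accepts every byte sequence, so it
has to come last, and a successful decode doesn't prove the file name
was written in that encoding. That is exactly why an explicit
declaration would be better than guessing.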

-- 
 \       “Give a man a fish, and you'll feed him for a day; give him a |
  `\    religion, and he'll starve to death while praying for a fish.” |
_o__)                                                       —Anonymous |
Ben Finney