[Python-Dev] casefolding in pathlib (PEP 428)

Ronald Oussoren ronaldoussoren at mac.com
Fri Apr 12 14:43:42 CEST 2013


On 12 Apr, 2013, at 10:39, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> 
>> 
>> Perhaps it would be best if the code never called lower() or upper()
>> (not even indirectly via os.path.normcase()). Then any case-folding
>> and path-normalization bugs are the responsibility of the application,
>> and we won't have to worry about how to fix the stdlib without
>> breaking backwards compatibility if we ever figure out how to fix this
>> (which I somehow doubt we ever will anyway :-).
> 
> Ok, I've taken a look at the code. Right now lower() is used for two
> purposes:
> 
> 1. comparisons (__eq__ and __ne__)
> 2. globbing and matching
> 
> While (1) could be dropped, for (2) I think we want glob("*.py") to find
> "SETUP.PY" under Windows. Anything else will probably be surprising to
> users of that platform.

Globbing necessarily accesses the filesystem and could in theory do the
right thing, except for the minor detail that there is no easy way to
determine whether the names in a particular folder are compared
case-sensitively or not.
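
The closest thing I know of is to probe the folder directly. A rough
sketch, assuming write access to the folder (dir_is_case_sensitive is a
made-up helper name, not something in the stdlib or in pathlib):

```python
import os
import tempfile


def dir_is_case_sensitive(directory):
    """Guess whether names in `directory` are compared case-sensitively.

    Hypothetical helper: it creates a temporary file, so it needs write
    access, and the answer only holds for this one directory.
    """
    # The mixed-case prefix guarantees that swapcase() changes the name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as f:
        head, tail = os.path.split(f.name)
        swapped = os.path.join(head, tail.swapcase())
        # If the case-swapped spelling also resolves to a file, the
        # filesystem is treating the two names as equal.
        return not os.path.exists(swapped)
```

That is clearly not something glob() can do behind your back: it touches
the disk, fails on read-only folders, and would have to be repeated for
every folder visited because mount points can differ.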

> 
>> - On Linux, paths are really bytes; on Windows (at least NTFS), they
>> are really (16-bit) Unicode; on Mac, they are UTF-8 in a specific
>> normal form (except on some external filesystems).
> 
> pathlib is just relying on Python 3's sane handling of unicode paths
> (thanks to PEP 383). Bytes paths are never used internally.

On OSX, at least, the kernel will normalize names for you (on HFS+, anyway),
and therefore two names that don't compare equal with '==' can refer to the
same file (for example the NFKD and NFKC forms of Löwe).
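
To make that concrete, here is a small sketch using only unicodedata from
the stdlib. The same_name() helper is hypothetical and only illustrates
what an application would have to do itself; it says nothing about what
the kernel does internally:

```python
import unicodedata

name = "L\u00f6we"  # "Löwe", written with an escape to avoid encoding issues

nfkc = unicodedata.normalize("NFKC", name)  # 'ö' as one code point, U+00F6
nfkd = unicodedata.normalize("NFKD", name)  # 'o' followed by U+0308

print(nfkc == nfkd)          # False
print(len(nfkc), len(nfkd))  # 4 5


def same_name(a, b, form="NFC"):
    """Compare two names after normalizing both (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_name(nfkc, nfkd))  # True
```

For this particular word the NFKD/NFKC results are the same strings as
the NFD/NFC ones, because it contains no compatibility characters.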

Isn't unicode fun :-)

Ronald

> 
>> - On Windows, short names are still supported, making the number of
>> ways to spell the path for any given file even larger.
> 
> They are still supported but I doubt they are still relied on (long
> filenames appeared in Windows 95!). I think in common situations we can
> ignore their existence. Specialized tools like Mercurial may have to
> know that they exist, in order to manage potential collisions (but
> Mercurial isn't really the target audience for pathlib, and I don't
> think they would be interested in such an abstraction).
> 
> Regards
> 
> Antoine.
> 
> 