[Python-Dev] casefolding in pathlib (PEP 428)

Antoine Pitrou solipsis at pitrou.net
Fri Apr 12 16:59:50 CEST 2013


On Fri, 12 Apr 2013 14:43:42 +0200,
Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> 
> On 12 Apr, 2013, at 10:39, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> 
> >> 
> >> Perhaps it would be best if the code never called lower() or
> >> upper() (not even indirectly via os.path.normcase()). Then any
> >> case-folding and path-normalization bugs are the responsibility of
> >> the application, and we won't have to worry about how to fix the
> >> stdlib without breaking backwards compatibility if we ever figure
> >> out how to fix this (which I somehow doubt we ever will anyway :-).
> > 
> > Ok, I've taken a look at the code. Right now lower() is used for two
> > purposes:
> > 
> > 1. comparisons (__eq__ and __ne__)
> > 2. globbing and matching
> > 
> > While (1) could be dropped, for (2) I think we want glob("*.py") to
> > find "SETUP.PY" under Windows. Anything else will probably be
> > surprising to users of that platform.
> 
> Globbing necessarily accesses the filesystem and could in theory do
> the right thing, except for the minor detail of there not being an
> easy way to determine whether the names in a particular folder are
> compared case-sensitively or not.

It's also much less efficient, since you have to stat() every potential
match. For example, when encountering "SETUP.PY", you would have to
stat() (or, rather, lstat()) both "setup.py" and "SETUP.PY" to check
whether they have the same st_ino.
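
A minimal sketch of that check, assuming a POSIX-like platform where
st_ino is meaningful. The helper name same_file_by_inode is hypothetical
and not part of pathlib; comparing st_dev as well as st_ino is an added
precaution, since inode numbers are only unique within one device.

```python
import os


def same_file_by_inode(path_a, path_b):
    """Return True if both paths exist and refer to the same inode.

    Hypothetical helper for illustration only. It uses lstat() so that
    symlinks are compared as links rather than being followed.
    """
    try:
        st_a = os.lstat(path_a)
        st_b = os.lstat(path_b)
    except OSError:
        # At least one of the two names does not exist (or is unreadable),
        # so they cannot be shown to be the same file.
        return False
    # Inode numbers are only unique per device, so compare both fields.
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

A glob for "*.py" that met "SETUP.PY" would need two lstat() calls per
candidate name with this approach, which is where the extra cost comes
from.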

> On OS X the kernel will normalize names for you, at least for HFS+,
> and therefore two names that don't compare equal with '==' can refer
> to the same file (for example the NFKD and NFKC forms of Löwe).

I don't think differently normalized filenames are as common on OS X as
differently cased filenames are on Windows, right?
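
To illustrate the normalization point with the "Löwe" example, here is a
small sketch using the stdlib unicodedata module. It only shows that the
two forms are different strings in Python; it makes no claim about which
form a given filesystem stores. The helper name same_name_normalized is
hypothetical.

```python
import unicodedata

# The same visible name in composed and decomposed form.
composed = unicodedata.normalize("NFKC", "L\u00f6we")
decomposed = unicodedata.normalize("NFKD", "L\u00f6we")


def same_name_normalized(a, b, form="NFKC"):
    """Compare two names after normalizing both to the same form.

    Hypothetical helper for illustration; the default form is an
    arbitrary choice, not something a filesystem guarantees.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Plain '==' sees two different strings of different lengths...
names_differ = composed != decomposed
# ...while comparing after normalization treats them as the same name.
names_match = same_name_normalized(composed, decomposed)
```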

Regards

Antoine.



