[Python-ideas] Add OS-dependent automatic glob support

Steven D'Aprano steve at pearwood.info
Mon Jan 5 01:57:30 CET 2015


On Sun, Jan 04, 2015 at 06:10:01PM -0500, random832 at fastmail.us wrote:
[...]
> Right now, none of this is done. If you pass *.txt on the command line
> to a python script, it will attempt to open a file called "*.txt".

The important thing is that shells behave differently. One way, 
perhaps not the best way, of getting close to platform-independent 
behaviour when it comes to globbing is to use the glob module.

On Windows, you will get Python's definition of globbing. On POSIX 
systems, the glob module will usually have nothing left to do, because 
the shell will most likely have already expanded the wildcards. (I 
don't know of any Unix shells which behave like Windows.) Windows users 
are unlikely to try Unix-shell-specific wildcards, because they have no 
expectation that such wildcards will work. Unix users will expect them 
to work, and they will, because the shell interprets the wildcards 
before Python sees them.
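
A minimal sketch of that approach, assuming a script that wants to expand 
any wildcard arguments the shell left untouched. The helper name 
expand_args is hypothetical; glob.glob is the only library call used.

```python
import glob


def expand_args(args):
    """Expand wildcard arguments the shell did not expand (hypothetical helper)."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # No match: keep the argument exactly as typed, which is what most
        # POSIX shells do by default with an unmatched pattern.
        expanded.extend(matches if matches else [arg])
    return expanded
```

A script would call expand_args(sys.argv[1:]) before opening any files. 
One caveat: on POSIX, a file name that the shell has already expanded and 
that itself contains "[", "*" or "?" would be treated as a pattern a 
second time.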


> Another separate but related issue is the fact that windows wildcards do
> not behave in the same way as python glob patterns.

I don't understand why you say this. As I understand it, there is no 
such thing as "Windows wildcards", as every application which wants to 
support wildcards has to implement its own. If you want to know what 
kinds of globbing wildcards an application supports, you have to read 
the application's documentation. (Or guess.)

I am not a Windows expert, so I may have been misinformed. Anyone care 
to comment?


> Bracketed character classes are not supported, 

Applications which use the Python glob module do support them. Other 
applications may support them, if they choose to implement them.


> and left bracket is a valid character in
> filenames unlike ? and *. 

And on POSIX systems, *all* wildcards are valid characters in file 
names. If you want to specify a file called literally "*.txt", or 
"spam[and eggs?].jpg", you have to escape the wildcards. Windows is no 
different.

The glob module supports escaping of wildcards, doesn't it? If so, we 
have no problem. If not, that's a bug, or at least an obvious and 
important piece of missing functionality.


> There are some subtleties around dots ("*.*"
> will match all filenames, even with no dot. "*." matches filenames
> without any dot.), they're case-insensitive (I think glob does handle
> this part, but not in the same way as the platform in some cases), and
> they can match the short-form alternate filenames [8 characters, dot, 3
> characters], so "*.htm" will typically match most files ending in
> ".html" as well as those ending in ".htm".

As I said, I don't believe that there is any standard Windows filename 
wildcard handling, so the behaviour you describe may apply to some 
applications but not all. Anyone like to comment?
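
For comparison, here is how Python's own matching treats those patterns. 
This only demonstrates the Python side; the Windows behaviour quoted above 
is not verified here. The file names are made up for illustration, and 
fnmatch.fnmatchcase is used so that the result does not depend on the 
platform's case rules.

```python
import fnmatch

# Made-up file names, chosen to exercise the cases in the quoted text.
names = ["README", "notes.txt", "index.htm", "index.html", "PHOTO.JPG"]


def matching(pattern):
    """Return the names that match `pattern`, always case-sensitively."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


print(matching("*.*"))    # ['notes.txt', 'index.htm', 'index.html', 'PHOTO.JPG']
print(matching("*."))     # []
print(matching("*.htm"))  # ['index.htm']
print(matching("*.jpg"))  # []
```

fnmatch.fnmatch(), unlike fnmatchcase(), first normalises both arguments 
with os.path.normcase(), so its case sensitivity follows the platform.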

 
> It might be useful to provide a way to make glob behave in the
> windows-specific way (using the platform-specific functions
> FindFirstFileEx and RtlIsNameInExpression on windows.)

This may be a good idea. Maybe glob needs to stay as it is, for 
backwards compatibility, and a new module osglob be created that aims to 
implement globbing according to the expected rules of the operating 
system. osglob could work like the os module and delegate to 
platform-specific modules posix_glob, windows_glob, osx_glob etc., with 
the current glob module remaining for applications which want to present 
more-or-less equivalent globbing behaviour on every platform.
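
A rough sketch of that delegation, purely to illustrate the shape of the 
idea. No osglob module exists; every name below is hypothetical, and both 
backends simply fall through to glob.glob where a real implementation 
would apply the platform's own rules.

```python
import glob
import os


def _posix_glob(pattern):
    # Assumption: Python's glob is close enough to POSIX shell globbing.
    return glob.glob(pattern)


def _windows_glob(pattern):
    # Placeholder only: a real backend would call the platform functions
    # mentioned in the quoted text instead of glob.glob.
    return glob.glob(pattern)


# Keyed by os.name, in the same spirit as the os module's own dispatch.
_BACKENDS = {"posix": _posix_glob, "nt": _windows_glob}


def osglob(pattern, platform=os.name):
    """Dispatch to the backend for `platform`, defaulting to glob.glob."""
    backend = _BACKENDS.get(platform, glob.glob)
    return backend(pattern)
```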


-- 
Steven

