[Python-ideas] Draft PEP: Automatic Globbing of Filenames in argparse on Windows

Kef Schecter furrykef at gmail.com
Fri Aug 14 11:32:53 CEST 2015


PEP: XXX
Title: Automatic Globbing of Filenames in argparse on Windows
Version: $Revision$
Last-Modified: $Date$
Author: Kef Schecter <furrykef at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 14-Aug-2015
Python-Version: 3.6
Post-History:


Abstract
========

This PEP proposes to add functionality to argparse to allow glob
(wildcard) expressions to be expanded automatically on Windows.


Motivation
==========

For many command-line tools, it is handy to be able to specify
wildcards in order to operate on more than one file at a time.  On
Unix-like systems, this is handled automatically by the shell.  On
Windows, however, the default shell (cmd.exe) does not expand
wildcards, nor does Microsoft's PowerShell.

Yet Windows users generally expect wildcards to work.  For example,
most built-in commands such as ``dir`` and ``type`` accept wildcard
arguments, and have done so since the early days of MS-DOS.

It is already possible for programmers to work around this issue, but
it is a bit cumbersome and it is easy to make the behavior almost,
but not quite, correct.  Moreover, since Python has a "batteries
included" philosophy, and this is a very common feature, it is the
author's opinion that the correct functionality should be available
out of the box.


How It Must Be Done Currently
=============================

::

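    import glob
    import platform

    # args is the namespace returned by ArgumentParser.parse_args()
    if platform.system() == 'Windows':
        filenames = []
        for filename in args.files:
            # Only glob names that contain wildcard characters
            if '*' in filename or '?' in filename or '[' in filename:
                filenames += glob.glob(filename)
            else:
                filenames.append(filename)
        args.files = filenames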


Why This Is a Problem
=====================

- Authors, especially those who use Unix-like systems, will usually
  not bother to add this code unless users specifically request it,
  and perhaps not even then.  How often have you seen this code in a
  program?

- It is easy to forget the platform check or not understand why it is
  necessary.  Automatically globbing filenames on a Unix-like system
  is wrong because the shell is supposed to handle it already; on such
  a system, if the program sees a name like ``*.txt``, then it means
  the user explicitly specified the name of a file that, improbable as
  it may seem, has an asterisk in its filename.  On Windows, filenames
  cannot contain the characters ``*`` and ``?``, so this concern
  largely does not apply there, even when using a Unix-like shell
  such as bash (but see `Square Brackets`_ below).

- It is easy to forget to check the string for wildcard characters
  before passing it to glob.glob.  If the user specifies a filename
  with no wildcards such as ``foo.txt``, and foo.txt does not exist,
  then glob.glob returns an empty list, silently dropping the name
  and giving the program no opportunity to print a message such as
  "No file named foo.txt".

- glob.glob may not be quite the right function to use.  See `Square
  Brackets`_ below.

- It is boilerplate code that is applicable to a large number of
  programs without change, which suggests it belongs in a library.
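
The third point above can be illustrated with a minimal sketch.  The
helper names ``expand_naive`` and ``expand_checked`` are hypothetical
and exist only for this illustration; they are not part of glob or
argparse::

    import glob
    import os
    import tempfile

    WILDCARD_CHARS = '*?['

    def expand_naive(names):
        """Pass every name to glob.glob without a wildcard check."""
        expanded = []
        for name in names:
            expanded += glob.glob(name)
        return expanded

    def expand_checked(names):
        """Glob only the names that contain a wildcard character."""
        expanded = []
        for name in names:
            if any(ch in name for ch in WILDCARD_CHARS):
                expanded += glob.glob(name)
            else:
                expanded.append(name)
        return expanded

    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, 'foo.txt')  # never created
        naive = expand_naive([missing])      # the name disappears
        checked = expand_checked([missing])  # the name is preserved

With the naive version the program never learns that the user asked
for foo.txt, whereas the checked version keeps the name so that a
later attempt to open it can fail with a useful message.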


Solution
========

Add a keyword argument to argparse.ArgumentParser.add_argument called
``glob``.  If it is true, argparse will automatically glob filenames
using code much like the boilerplate code given earlier in `How It
Must Be Done Currently`_.  This argument is only meaningful when nargs
is set to an appropriate value such as '+' or '*'.

The default value of this argument should be False.  This ensures
backward compatibility with existing programs that assume wildcards
are not expanded, such as a program that accepts a regex as an
argument.
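
As a rough illustration of the intended behavior, the expansion can be
approximated today with a custom argparse action.  This is only a
sketch under stated assumptions: ``GlobAction`` and
``expand_wildcards`` are hypothetical names, not part of argparse, and
the ``system`` parameter exists only so that the logic can be
exercised on any platform::

    import argparse
    import glob
    import platform

    def expand_wildcards(values, system=None):
        """Expand wildcard values, but only on Windows."""
        if system is None:
            system = platform.system()
        if system != 'Windows':
            # Elsewhere the shell has already expanded wildcards.
            return list(values)
        expanded = []
        for value in values:
            if any(ch in value for ch in '*?['):
                expanded += glob.glob(value)
            else:
                expanded.append(value)
        return expanded

    class GlobAction(argparse.Action):
        """Hypothetical stand-in for the proposed glob keyword."""

        def __call__(self, parser, namespace, values,
                     option_string=None):
            if isinstance(values, str):
                values = [values]
            setattr(namespace, self.dest, expand_wildcards(values))

    parser = argparse.ArgumentParser()
    parser.add_argument('files', nargs='+', action=GlobAction)
    args = parser.parse_args(['foo.txt', 'bar.txt'])

Under the proposal, the ``add_argument`` call would instead pass the
new keyword, and programs that omit it would see no change in
behavior.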

A possibly better behavior might be to make this argument default to
True (enabling the functionality automatically, without the
programmer needing to be aware of it) and only expand wildcard
arguments that are not provided in quotes, similar to how Unix-like
shells behave.  However, there appears to be no simple way to tell
whether an argument was supplied in quotes or not; the strings in
sys.argv have already had the quotes removed.


Square Brackets
===============

It has been noted above that the characters ``*`` and ``?`` will never
appear in filenames on Windows.  However, the characters ``[`` and
``]``, which glob.glob uses for wildcards, **can** be used in
filenames, and may not be especially uncommon.

There are three possible ways of handling this:

1. Use a version of glob.glob without the wildcard functionality that
   ``[`` and ``]`` provide.  This type of wildcard has never been
   standard for wildcard arguments to MS-DOS or Windows command-line
   programs.

2. Specify some kind of escaping mechanism; for example,
   ``\[foo\].txt`` would refer to a file that has ``[`` and ``]`` in
   its filename.  This may not be intuitive behavior for Windows
   users.

3. Keep glob.glob's standard functionality.  Programs using this
   feature will not be able to operate on files that have square
   brackets in their names.

Of these, the first should adhere best to the principle of least
surprise.  Windows users do not expect square brackets to form
wildcard expressions.  If they want such functionality, they will
probably already be using a shell such as bash that handles it for
them.
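
One possible way to sketch the first option, without writing a new
globbing routine, is to rewrite each ``[`` as the character class
``[[]``, which matches only a literal ``[``, before calling glob.glob.
The function name ``glob_literal_brackets`` is hypothetical, and the
sketch does not attempt to handle unusual cases such as brackets in a
drive or UNC share name::

    import glob

    def glob_literal_brackets(pattern):
        """Glob with * and ? as wildcards and brackets as literals."""
        # '[[]' is a character class containing only '[', so the
        # bracket loses its special meaning.  A ']' outside of a
        # character class is already treated literally.
        return glob.glob(pattern.replace('[', '[[]'))

With this helper, the pattern ``[foo].txt`` matches only a file with
exactly that name, whereas plain glob.glob would match ``f.txt`` and
``o.txt`` instead.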


Copyright
=========

This document has been placed in the public domain.



..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:

