python 2.7.12 on Linux behaving differently than on Windows

Steve D'Aprano steve+python at pearwood.info
Mon Dec 5 11:49:04 EST 2016


On Mon, 5 Dec 2016 10:42 pm, BartC wrote:

> I don't know what a shell is. To me, it's some sort of user interface to
> the OS. 

https://en.wikipedia.org/wiki/Unix_shell

You've never used command.com or cmd.exe? "The DOS prompt"? That's
(effectively) a shell.

Pedants may wish to explain exactly why the DOS prompt isn't a shell, but to
a first approximation I think it's close enough.

And yes, that's exactly what it is: it's a text-based user interface to the
OS. And as with any user interface, designers can make different choices,
which will vary in power and convenience. And in the Unix/Linux world, the
command shell is not just a text interface, it's a powerful command
interpreter and programming language.



> So if someone types: 
> 
>    > X A B C
> 
> You would expect X to be launched, and be given arguments A, B and C.

Would I? I don't think so.

Even the DOS prompt supports some level of globbing. It's been a while since
I've used the DOS prompt in anger, but I seem to recall being able to do
things like:

dir a*

to get a listing of all the files starting with "a". So *something* is
treating the * as a special character. In Linux and Unix, that's invariably
the shell, before the dir command even sees what you typed.
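You can watch this happen from Python. Below is a small sketch, assuming Python 3.7+ on a Linux/Unix box, where shell=True hands the command string to /bin/sh. The helper name child_args and the scratch file names are made up for illustration. The sketch runs a child Python that does nothing but print sys.argv[1:]. It does this once through the shell and once with an argument list that bypasses the shell:

```python
import os
import shlex
import subprocess
import sys
import tempfile

# The child process does nothing but print the arguments it received.
SHOW_ARGS = "import sys; print(sys.argv[1:])"


def child_args(tail, use_shell):
    """Run a child Python in a scratch directory containing a1.txt,
    a2.txt and b.txt, and return what it printed for sys.argv[1:]."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a1.txt", "a2.txt", "b.txt"):
            open(os.path.join(tmp, name), "w").close()
        if use_shell:
            # One command string: on POSIX, /bin/sh parses and globs it.
            cmd = " ".join([shlex.quote(sys.executable), "-c",
                            shlex.quote(SHOW_ARGS), tail])
            proc = subprocess.run(cmd, shell=True, cwd=tmp,
                                  capture_output=True, text=True)
        else:
            # An argument list goes straight to the program: no globbing.
            proc = subprocess.run([sys.executable, "-c", SHOW_ARGS, tail],
                                  cwd=tmp, capture_output=True, text=True)
        return proc.stdout.strip()


if os.name == "posix":
    print(child_args("a*", use_shell=True))    # ['a1.txt', 'a2.txt']
    print(child_args("'a*'", use_shell=True))  # ['a*']
    print(child_args("a*", use_shell=False))   # ['a*']
```

The child never sees the `*` in the first case, because the shell has already replaced it. As far as I know, neither cmd.exe nor python.exe expands the pattern on Windows, so there the child gets the literal a* every time.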

In DOS, it might be the dir command itself. The disadvantage of the DOS way
of doing this is that *every single command and application* has to
re-implement its own globbing, very possibly inconsistently. That's a lot
of duplicated work and re-inventing the wheel, and the user will never know
what 

    some_program a*

will do. Will it operate on all the files in the current directory that
start with "a"? Or will it try to operate on a single file called
literally "a*"? Which of course won't exist because * is a reserved
character on DOS/Windows file systems. You can't know ahead of time unless
you study the manual to see what metacharacters this specific command
supports.

The Unix way is far more consistent: applications typically don't have to
care about globbing, because the shell handles glob expansion, environment
variables, etc.

[Aside: if you take the big picture, the Unix/Linux way is probably LESS
consistent, because you can have any number of shells (sh, ash, bash, csh,
tcsh, dash, hush, zsh, and I expect many more). But in practice, there's
one lowest-common-denominator standard (sh) and one major de facto standard
(bash), and most of the shells are supersets of the original sh, so simple
things like wildcards behave in pretty similar ways.]

The downside of this is that if you don't want metacharacters expanded, you
have to tell the shell to ignore them. The easiest way is to escape them with
a backslash, or quote the string. But of course this being Unix, it's
completely configurable (using an obscure syntax that's different for every
shell):

http://stackoverflow.com/questions/11456403/stop-shell-wildcard-character-expansion
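Python's shlex module implements the POSIX tokenizing and quoting rules, though not globbing or variable expansion. That makes it a handy way to check what each escaping style actually hands to the program. A minimal sketch, assuming Python 3.3+ for shlex.quote:

```python
import shlex

# Three ways of telling a POSIX shell "this * is literal": a backslash,
# double quotes, single quotes.  After quote removal, the program gets
# the same plain string each time.
command = r"echo \*.txt " + '"*.txt" ' + "'*.txt'"
print(shlex.split(command))     # ['echo', '*.txt', '*.txt', '*.txt']

# Going the other way: quote a string so a shell will leave it alone.
pattern = "*.txt"
quoted = shlex.quote(pattern)
print(quoted)                   # '*.txt'
print(shlex.split("ls " + quoted))  # ['ls', '*.txt']
```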



> You wouldn't expect any of them to be expanded to some unknown number of
> arguments.

Actually yes I would. If they could be interpreted as file names with
globbing or environment variables, that's exactly what I would expect.

Even at the DOS prompt.

And I'd be completely gobsmacked if (say) the dir command understood the ?
metacharacter but the cp command didn't.


> In the same way that in code, you don't expect X(A,B,C) to be expanded
> to X(A,B0,B1,B2,B3,B4,B5,....., C) if B happens to specify a slice.

In Python? No, I wouldn't expect that. Python's not a shell, and the design
is different. In Python, you have to use the metacharacter * to expand a
single argument into multiple arguments.
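A quick illustration, assuming Python 3.5+ (PEP 448 is what allows a plain argument after *b in a call). The function x is just a stand-in for X above:

```python
def x(*args):
    # Return exactly the positional arguments that arrived.
    return args

b = ["B0", "B1", "B2"]
print(x("A", b, "C"))    # ('A', ['B0', 'B1', 'B2'], 'C')  one argument
print(x("A", *b, "C"))   # ('A', 'B0', 'B1', 'B2', 'C')    explicitly expanded
```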



>> Did you think it was a regular character like 'a' and 'z'?
> 
> If one of the parameters was a regular expression, would you expect it
> to be expanded to the entire possible set of inputs that match the
> expression?

No, because Unix shells use globs, not regexes. Just like the DOS prompt.
Globs are simpler and require less typing, something system administrators
appreciate because, unlike program source code, interactive commands are
written far more often than they are read, so brevity is valued.
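To make the glob/regex distinction concrete, here's a sketch using the standard fnmatch module, which implements shell-style matching, alongside re. It assumes Python 3.4+ for re.fullmatch, and the file names are invented:

```python
import fnmatch
import re

names = ["a.c", "ab.c", "b.c", "a.h"]

# Glob: "a*.c" means "starts with a, ends with .c".
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "a*.c")]

# The same test as a regex needs ".*" for "*" and an escaped dot.
regex_hits = [n for n in names if re.fullmatch(r"a.*\.c", n)]

print(glob_hits)   # ['a.c', 'ab.c']
print(regex_hits)  # ['a.c', 'ab.c']

# Read as a regex, the glob text means something else entirely:
# zero or more a's, then any one character, then c.
print(bool(re.fullmatch("a*.c", "xc")))   # True
print(fnmatch.fnmatchcase("xc", "a*.c"))  # False
```

fnmatch.translate() will show you the regex a glob turns into. The exact string it produces differs between Python versions.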

(That doesn't excuse the umount command though. Really, you couldn't include
the "n"?)

So yes, I would expect that if I said

dir a*

I would get a listing of all the files starting with "a", not just the
single file called literally "a*".



>> I think it boils down to what the user expects. Linux and Unix users tend
>> to be technically-minded folks who use the command line a lot and demand
>> powerful tools, and they expect that wildcards like * should be expanded.
> 
> Which is dumb. How does the shell know exactly what I want to do with
> *.* or f*.c or whatever? Perhaps I don't want the expanded list right 
> now (all the 3.4 million elements); 

Sure, no problem. Just escape the metacharacters so they aren't expanded,
just as in Python, if you want the literal two-character string backslash-n,
you write "\\n" to escape the backslash metacharacter.
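Python's glob module has its own escape hatch along the same lines. A small sketch, assuming Python 3.4+ for glob.escape(), which wraps each metacharacter in [...] so it only matches itself:

```python
import fnmatch
import glob

# In Python source, "\\n" is two characters: a backslash and an "n".
literal = "\\n"
print(len(literal))            # 2

# glob.escape turns a pattern into one that matches only the literal name.
escaped = glob.escape("*.*")
print(escaped)                 # [*].[*]

print(fnmatch.fnmatchcase("*.*", escaped))    # True: the file literally named *.*
print(fnmatch.fnmatchcase("a.txt", escaped))  # False: no wildcard expansion
```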


> perhaps I want to apply the same pattern to several directories; 

  ls {foo,bar,baz/foobar}/*.*


is equivalent to:

  ls foo/*.* bar/*.* baz/foobar/*.*


And if I want a file that is *literally* called star dot star from any of
those directories:

  ls {foo,bar,baz/foobar}/"*.*"
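Brace expansion is purely textual: the shell rewrites the word first and only then does globbing. Here's a simplified sketch in Python. expand_braces is a hypothetical helper that handles only non-nested {a,b,...} groups. It has no {1..5} ranges and no quoting, so it can't reproduce the "*.*" trick above:

```python
import re

# A {...} group with no inner braces and at least one comma.
BRACE_GROUP = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(word):
    """Expand the leftmost {a,b,...} group, then recurse on each result.

    Simplified: no nesting, no {1..5} ranges, no quoting or escapes.
    A group without a comma, like {x}, is left alone, as bash does.
    """
    m = BRACE_GROUP.search(word)
    if m is None:
        return [word]
    prefix, suffix = word[:m.start()], word[m.end():]
    results = []
    for alternative in m.group(1).split(","):
        results.extend(expand_braces(prefix + alternative + suffix))
    return results


print(expand_braces("{foo,bar,baz/foobar}/*.*"))
# ['foo/*.*', 'bar/*.*', 'baz/foobar/*.*']
```

Each resulting word would then go through normal glob expansion, which is why the two ls commands above are equivalent.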



> perhaps I'm passing it on to another 
> program; perhaps I'm going to be writing it as another script; perhaps I
> just want to print out the parameter list; perhaps I want to transform
> or validate the pattern first; maybe I need to verify an earlier
> parameter before applying the pattern or the way it's applied depends on
> earlier arguments...

Fine. Just escape the damn thing and do whatever you like to it.



> The input is a PATTERN; I want to process it, and apply it, myself.

When you double-click on a .doc file, and Windows launches Word and opens
the file for editing, do you rant and yell that you didn't want Word to
open the file, you wanted to process the file name yourself?

Probably not. You probably consider it perfectly normal and unobjectionable
that when you double click a file, Windows:

- locates the file association for that file;
- launches that application (if it's not already running);
- instructs the application to open the file;

(and whatever else needs to be done). That's what double clicking is *for*,
and if you don't like it, you shouldn't be using Windows.

Well, in the Unix world, the shells are designed for the benefit of system
administrators who are *mostly* dealing with file names. Regardless of
what you think of it, they *want* this behaviour. For more than forty years,
Unix system administrators have been extremely productive using this model,
and Microsoft has even brought the Linux bash shell to Windows.

The bottom line is this:

In your application, if you receive an argument *.*, what do you do with it?
You probably expand it yourself. How does the application know when not to
expand the wildcards? You need to support some sort of command-line switch
to turn it off, but that will affect the entire command line. So you need
some sort of escaping mechanism so that you can pass

myprogram *.* *.*

and have the first *.* expanded but not the second. (For example.)

Congratulations, you've just re-invented your own mini-shell.
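Here's roughly what that mini-shell looks like, as a sketch only. expand_args and the leading-backslash escape are conventions made up for this example, not anything DOS or Windows defines. To stay self-contained, it matches against a list of names rather than the real file system. Unlike the shell, fnmatch lets * match names that start with a dot:

```python
import fnmatch


def expand_args(args, filenames):
    """Hypothetical mini-shell: expand glob patterns against `filenames`.

    An argument starting with a backslash is taken literally (the
    backslash is stripped and nothing is expanded).  A pattern that
    matches nothing is passed through unchanged, like bash's default.
    """
    result = []
    for arg in args:
        if arg.startswith("\\"):
            result.append(arg[1:])  # escaped: pass through literally
            continue
        matches = sorted(fnmatch.filter(filenames, arg))
        result.extend(matches if matches else [arg])
    return result


files = ["a.txt", "b.py", "notes"]
# myprogram *.* \*.*   (first expanded, second taken literally)
print(expand_args(["*.*", "\\*.*"], files))  # ['a.txt', 'b.py', '*.*']
```

Every application that rolls its own version of this will pick slightly different rules, which is the consistency problem described above.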



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



