python 2.7.12 on Linux behaving differently than on Windows

Steve D'Aprano steve+python at pearwood.info
Wed Dec 7 18:35:33 EST 2016


On Thu, 8 Dec 2016 02:19 am, BartC wrote:

> On 07/12/2016 14:34, Steve D'Aprano wrote:
[...]
>> I don't know why you are so hung up over the number of characters here,
>> or this bogeyman of "one million files" in a directory.
> 
> Because /you/ brought it up as a reason why 'globbing' would help when
> there are command limits?

I did?

I don't remember saying any such thing. Citation required.



[...]
>> But... how does your program distinguish between the file spec *.* which
>> should be expanded, and the literal file *.* which shouldn't be expanded?
> 
> Do you mean a *.* parameter that refers to all the files in the
> directory (whether the intention is to build a list of those files or
> not) and a *.* parameter that means something entirely different?

We're talking Linux here, where "*.*" is a legal (if stupid) file name.
Existing Linux programs (mostly?) don't do their own globbing, so if I want
to list such a file, I just need to protect the metacharacters from the
shell:

[steve@ando ~]$ ls -l "*.*"
-rw-rw-r-- 1 steve steve 7 Dec  8 09:30 *.*

and there it is. Now, you *insist* that the one and only correct way to do
this is to disable shell globbing, and have each and every program do its
own globbing. Okay, I can do the first part, and because ls doesn't in
reality do its own globbing, it works fine:

[steve@ando ~]$ set -o noglob
[steve@ando ~]$ ls -l *.*
-rw-rw-r-- 1 steve steve 7 Dec  8 09:30 *.*
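
The same division of labour can be observed from Python. What follows is
only a minimal sketch, assuming a POSIX system with /bin/sh available; the
child program and the helper names (CHILD, args_seen_by_child, demo) are
made up for illustration. The child just reports the arguments it received,
once when started directly and once when started through the shell.

```python
import json
import os
import shlex
import subprocess
import sys
import tempfile

# Hypothetical child program: report the arguments it actually received.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def args_seen_by_child(pattern, via_shell, cwd):
    """Run the child with `pattern` on its command line; return its argv[1:]."""
    if via_shell:
        # POSIX only: /bin/sh parses this string and expands globs itself.
        cmd = " ".join(
            [shlex.quote(sys.executable), "-c", shlex.quote(CHILD), pattern]
        )
    else:
        # No shell involved: the child receives the pattern unchanged.
        cmd = [sys.executable, "-c", CHILD, pattern]
    out = subprocess.check_output(
        cmd, shell=via_shell, cwd=cwd, universal_newlines=True
    )
    return json.loads(out)


def demo():
    """Create two files in a scratch directory and try both launch modes."""
    scratch = tempfile.mkdtemp()
    for name in ("a.txt", "b.md"):
        open(os.path.join(scratch, name), "w").close()
    direct = args_seen_by_child("*.*", via_shell=False, cwd=scratch)
    expanded = args_seen_by_child("*.*", via_shell=True, cwd=scratch)
    return direct, expanded
```

Started directly, the child sees the single argument *.* untouched. Started
through the shell, it sees the two file names, because the expansion
happened before the child ever ran.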


But now imagine that you are the author of ls, and that following your
strategy of having programs do their own globbing, ls expands that as a
glob. The shell is no longer involved here, but globbing still needs to be
defeated, only this time I'm at the mercy of each individual program, not
just a single shell.

So my question is, in your programs that you write, where you implement your
own globbing, do you support escaping metacharacters? If you were the
author of Linux ls, could I write 

    ls -l \*.\*

or possibly 

    ls -l "*.*"

or even:

    ls -l --noglob *.*

to get a listing of the file named "*.*"?

In the Unix world, I'm at the mercy of *one* program, the shell. In your
ideal world, I'm at the mercy of EVERY program that implements globbing:
each and every one has to independently offer a way to disable their own
internal globbing. How is that an improvement?

Hence my rhetorical question. When you write a program that supports
globbing, do you always remember to provide an escape mechanism?
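
For a sense of what such an escape mechanism involves on the program side,
here is a minimal sketch in Python. It assumes Python 3.4 or later, where
glob.escape() exists (it is not in 2.7), and a POSIX filesystem where "*.*"
is a legal file name. The function list_matches and its `literal` flag are
hypothetical, standing in for something like the --noglob switch above.

```python
import glob
import os
import tempfile


def list_matches(directory, spec, literal=False):
    """Return the names in `directory` that match `spec`.

    With literal=True the metacharacters in `spec` are escaped first,
    the program-side equivalent of quoting them for the shell.
    """
    if literal:
        spec = glob.escape(spec)  # "*.*" becomes "[*].[*]"
    # Escape the directory too, in case its name contains metacharacters.
    pattern = os.path.join(glob.escape(directory), spec)
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def demo():
    """Build a scratch directory holding a file that is literally named *.*"""
    scratch = tempfile.mkdtemp()
    for name in ("*.*", "notes.txt"):
        open(os.path.join(scratch, name), "w").close()
    as_glob = list_matches(scratch, "*.*")
    as_name = list_matches(scratch, "*.*", literal=True)
    return as_glob, as_name
```

Treated as a glob, the spec matches both files; treated as a literal name,
it matches only the file called *.*. Every program that does its own
globbing would need to expose something like that `literal` switch.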


The string quoting, escaping and expansion rules for bash are pretty darn
complicated. You have parameter expansion, aliases, two distinct types of
quoting, backslashes, shell variables, environment variables, and more,
which are applied in a certain order. Once you get past the simple cases,
bash programming is hairy and frankly scarily complex.

But at least the rules are consistent since they're applied by the shell. I
don't have to learn a separate set of rules for each and every program I
use:

"ls supports globbing, but not brace expansion..."

"rm supports globbing and brace expansion, but not parameter expansion..."

"mkdir supports tilde expansion, but incompletely..."

"touch supports only a subset of arithmetic expansion, but with bugs..."


> OK, but you go first: how does a Linux program tell the difference
> between:
> 
>    A B C
> 
> which signifies three files, and:
> 
>    A B C
> 
> which means two files, and an option C? And without using escapes!

First of all, as you well know, the convention is for options to start with
either a single or double hyphen, or sometimes a plus, and usually to come
first. So I would write:

    program -C A B


But let's suppose that this particular program doesn't follow that standard
convention, and instead has the argument signature:

    argument1 argument2 option

Fine. Where's the difficulty? The program would read the command line,
collect the first argument as `argument1`, the second argument as
`argument2`, and the third argument as `option`.

If I was particularly awful at designing user interfaces, I could even have
the signature where the third parameter could be either:

    argument1 argument2 option_or_argument3


# Python syntax
if option_or_argument3 in ("off", "on"):
    option = option_or_argument3
    argument3 = ""
else:
    option = "off"
    argument3 = option_or_argument3


I don't see what point you think you are making. How the program interprets
its command arguments is independent of the shell.
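
Wrapped up as something runnable, the fragment above might look like this.
It is a sketch only: the function name parse and the returned dictionary
are invented for illustration, and a real program would take its input
from sys.argv[1:].

```python
def parse(argv):
    """Interpret [argument1, argument2, option_or_argument3]."""
    if len(argv) != 3:
        raise SystemExit(
            "usage: program argument1 argument2 option_or_argument3"
        )
    argument1, argument2, option_or_argument3 = argv
    if option_or_argument3 in ("off", "on"):
        # The third word is one of the two recognised option values.
        option, argument3 = option_or_argument3, ""
    else:
        # Anything else is treated as a third argument; option defaults.
        option, argument3 = "off", option_or_argument3
    return {
        "argument1": argument1,
        "argument2": argument2,
        "argument3": argument3,
        "option": option,
    }
```

Nothing here depends on how the three words reached the program, whether
typed directly, produced by a shell glob, or passed by another process.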


>> You need an escape mechanism too. You've done that, of course. Right?
> 
> My program wouldn't need anything so crude. The input syntax would be
> more carefully designed so as to not have the ambiguity.

Oh of course it would. How silly of me to even ask.

*rolls eyes*

I'll take that as a "No".


> And actually I generally avoid command line interfaces (ones used by
> shells) because they are such a pain. (Why, for example, aren't the
> contents of the command line - even after preprocessing by the shell -
> just read using exactly the same mechanism as subsequent lines of input?)

Because then you couldn't tell the difference between command line arguments
and standard input.

Because not all programs even need to care about reading standard input.
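
The two channels really are separate, which a short Python sketch can show.
It assumes Python 3.5 or later for subprocess.run(); the child program and
the helper name run_child are hypothetical. The child reports what arrived
on its command line and what arrived on standard input.

```python
import json
import subprocess
import sys

# Hypothetical child: report command-line arguments and stdin separately.
CHILD = (
    "import json, sys; "
    "print(json.dumps({'argv': sys.argv[1:], 'stdin': sys.stdin.read()}))"
)


def run_child(args, stdin_text):
    """Start the child with `args` and feed `stdin_text` to its stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD] + list(args),
        input=stdin_text,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling run_child(["A", "B"], "C\n") reports A and B under argv and the
line C under stdin. A program that ignores standard input never has to
touch the second channel at all.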



>>>    list fooba?.jpg {here,there}/*.jpg  another/place/*.{png,jpg} \
>>>      [a-z]/IMG* > files
>>>
>>>    command @files
>>
>> *Requiring* this would be completely unacceptable. Forcing the user to
>> write two lines (one to create a variable, then to use it) every time
>> they needed a glob expansion would go down among sys admins like a lead
>> balloon.
> 
> Suppose there is a pattern that can't be expressed as a one-liner?


You mean like this?

[steve@ando ~]$ export manyliner="hello
> world
> "
[steve@ando ~]$ echo $manyliner
hello world


But if you mean getting the embedded newlines back out of the variable,
well, bash makes that fiddly: you have to remember to quote the expansion.
It's possible, but a PITA. Look it up: I'm sure it will have been answered
on StackOverflow.
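
What the shell does to such a variable can be checked from Python. This is
a minimal sketch, assuming a POSIX sh on the PATH; the helper name expand
is made up, and the variable name manyliner is borrowed from the transcript
above. An unquoted expansion is split on whitespace, so the newlines are
lost, while a quoted expansion keeps them.

```python
import os
import subprocess


def expand(value, quoted):
    """Echo the variable `manyliner` from sh, with or without quotes."""
    script = 'echo "$manyliner"' if quoted else "echo $manyliner"
    # Pass the value in through the environment, as `export` would.
    env = dict(os.environ, manyliner=value)
    return subprocess.check_output(
        ["sh", "-c", script], env=env, universal_newlines=True
    )
```

With the value "hello\nworld\n", the unquoted form prints the two words on
one line, as in the transcript, and the quoted form prints them on separate
lines.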

But what of it? What's your point? We all acknowledge that bash is not a
great choice for general purpose application programming; it's a scripting
language optimized for certain tasks at the expense of others. In the areas
it has been designed for, your question doesn't come up often enough that
people care about it.


> And the shell is supposed to be a full-fledged programming language,
> people keep saying. What can be more natural than to make use of
> variables? As you go on to demonstrate!

You wouldn't write a low-level device driver in Perl, and you wouldn't write
a file utility in APL. All languages have their strengths and weaknesses.
What's your point?




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



