[issue1521950] shlex.split() does not tokenize like the shell
Éric Araujo
report at bugs.python.org
Sat Nov 26 16:25:01 CET 2011
Éric Araujo <merwok at netwok.org> added the comment:
> I was just looking for a reference where I didn't have to sift through tons of documentation.
Sure :) That’s why I suggest using dash for quick tests and relying on the work of other people who did read the POSIX spec. I’ll have to check it too before committing a patch.
> shlex uses a series of character strings to drive its parsing: whitespace, escape, quotes.
> Add another one: control = '();<>|&'. If it is unset (by default?), then the behavior is as
> before.
So we would need to add a Shlex subclass to the module to provide the new behavior. I think I prefer a new argument, because we can just extend the existing class and functions instead of adding subtly differing duplicates.
> If it is set, then shlex will output any character in control as a separate token.
Unless it is part of a quoted segment, right? (See #7611 for 'foo#bar' vs. 'foo #bar').
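To make sure we are talking about the same behavior, here is a rough sketch of how I read the proposal. It is not a patch against shlex: split_with_control and CONTROL are hypothetical names, and the sketch only handles whitespace and quotes (no backslash escapes, no comments).

```python
import shlex

# Hypothetical default, copied from the proposal quoted above.
CONTROL = '();<>|&'


def split_with_control(line, control=CONTROL):
    """Split on whitespace, emitting each unquoted control character
    as its own token.  Simplified: no escapes, no comments."""
    tokens = []
    current = ''      # text of the token being built
    started = False   # True once a token has begun (so '' counts)
    quote = None      # the open quote character, if any
    for ch in line:
        if quote:
            if ch == quote:
                quote = None          # closing quote
            else:
                current += ch         # control chars are literal here
        elif ch in '\'"':
            quote = ch
            started = True
        elif ch.isspace() or ch in control:
            if started:
                tokens.append(current)
            current, started = '', False
            if ch in control:
                tokens.append(ch)     # one token per control character
        else:
            current += ch
            started = True
    if quote:
        raise ValueError('No closing quotation')
    if started:
        tokens.append(current)
    return tokens


print(shlex.split('echo foo&&bar'))            # ['echo', 'foo&&bar']
print(split_with_control('echo foo&&bar'))     # ['echo', 'foo', '&', '&', 'bar']
print(split_with_control("echo 'a|b' | wc"))   # ['echo', 'a|b', '|', 'wc']
```

The last call shows the quoted case: the | inside 'a|b' stays part of the word, while the unquoted one becomes a separate token.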
> There might be a shell specific script (or maybe it's left to the user)
> that decides that certain tokens can be recombined:
That seems like too much complexity. I would really prefer that we agree on one command parsing behavior (POSIX, i.e. dash) and improve shlex to support that. People wanting zsh rules can write their own subclass.
> '&&', '||', '|&', '>>', etc.
Wouldn’t it be more correct to consider them different tokens? I don’t have formal training in CS or programming, so I’m not sure that my definition is correct at all, but in my mind a token is a unit, and thus & and && are two different things.
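One way to get that, sketched under my own assumptions, is to match the longest known operator first when reading a run of control characters. The OPERATORS list below is only illustrative (it is not checked against the POSIX grammar), and split_operators is a hypothetical helper, not something in shlex.

```python
# Illustrative operator list, an assumption rather than the POSIX set.
OPERATORS = ['&&', '||', '>>', '<<', '|&',
             '&', '|', ';', '<', '>', '(', ')']

# Try longer operators first so '&&' wins over '&'.
_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def split_operators(run):
    """Split a run of unquoted control characters into operators,
    always taking the longest match at each position."""
    ops = []
    i = 0
    while i < len(run):
        for op in _BY_LENGTH:
            if run.startswith(op, i):
                ops.append(op)
                i += len(op)
                break
        else:
            raise ValueError('not a control character: %r' % run[i])
    return ops


print(split_operators('&&'))    # ['&&']
print(split_operators('&&&'))   # ['&&', '&']
print(split_operators('>>|'))   # ['>>', '|']
```

With this, && comes out as a single token distinct from &, and nothing has to be recombined afterwards.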
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1521950>
_______________________________________