[issue1521950] shlex.split() does not tokenize like the shell

Éric Araujo report at bugs.python.org
Sat Nov 26 16:25:01 CET 2011


Éric Araujo <merwok at netwok.org> added the comment:

> I was just looking for a reference where I didn't have to sift through tons of documentation.
Sure :)  That’s why I suggest using dash for quick tests and relying on the work of other people who did read the POSIX spec.  I’ll have to check it myself before committing a patch.

> shlex uses a series of character strings to drive its parsing:  whitespace, escape, quotes.
> Add another one: control = '();<>|&'.  If it is unset (by default?), then the behavior is as
> before.
So we would need to add a shlex subclass to the module to provide the new behavior.  I think I would prefer a new argument, because then we can extend the existing class and functions instead of adding subtly differing duplicates.
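For comparison, the stdlib later went the new-argument way: since Python 3.6, `shlex.shlex` accepts a `punctuation_chars` keyword, and passing `True` selects the characters `();<>|&`.  A quick check, assuming Python 3.6 or later (the command line is just an example):

```python
import shlex

# punctuation_chars=True selects '();<>|&' (Python 3.6+); posix=True strips quotes
lexer = shlex.shlex("ls | grep foo && echo 'a;b' > out.txt",
                    posix=True, punctuation_chars=True)
tokens = list(lexer)
print(tokens)
# ['ls', '|', 'grep', 'foo', '&&', 'echo', 'a;b', '>', 'out.txt']
```

Note that this implementation returns a run of punctuation characters as one token (so `&&` stays together), while the quoted `;` stays inside its word.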

> If it is set, then shlex will output any character in control as a separate token.
Unless it is part of a quoted segment, right?  (See #7611 for 'foo#bar' vs. 'foo #bar').
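To make the proposal concrete, here is a minimal sketch of the suggested `control` behavior as a standalone function, not a patch to shlex.  The name `split_with_control`, its `control` parameter and the simplified quoting rules (single and double quotes only, no backslash escapes, no comments) are assumptions for illustration:

```python
CONTROL = '();<>|&'

def split_with_control(line, control=CONTROL):
    """Split on whitespace; each unquoted control character is its own token.

    Hypothetical sketch: handles '...' and "..." quoting only, with no
    escapes or comments.  Quotes are stripped, as in POSIX mode.
    """
    tokens = []
    current = []        # characters of the word being built
    in_word = False     # True once a word has started (even an empty '' one)
    quote = None        # the active quote character, if any
    for ch in line:
        if quote:
            if ch == quote:
                quote = None          # closing quote; the word continues
            else:
                current.append(ch)    # control chars inside quotes are literal
        elif ch in '\'"':
            quote = ch
            in_word = True
        elif ch.isspace():
            if in_word:
                tokens.append(''.join(current))
                current, in_word = [], False
        elif ch in control:
            if in_word:
                tokens.append(''.join(current))
                current, in_word = [], False
            tokens.append(ch)         # one token per control character
        else:
            current.append(ch)
            in_word = True
    if quote:
        raise ValueError('No closing quotation')
    if in_word:
        tokens.append(''.join(current))
    return tokens

result = split_with_control("cat 'a;b' c;d && echo \"x|y\">out")
print(result)
# ['cat', 'a;b', 'c', ';', 'd', '&', '&', 'echo', 'x|y', '>', 'out']
```

Splitting strictly one character at a time turns `&&` into two `&` tokens, which is exactly the question of which tokens belong together.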

> There might be a shell specific script (or maybe it's left to the user)
> that decides that certain tokens can be recombined:
That seems like too much complexity.  I would really prefer that we agree on one command-parsing behavior (POSIX, i.e. dash) and improve shlex to support it.  People wanting zsh rules can write their own subclass.

> '&&', '||', '|&', '>>', etc.
Wouldn’t it be more correct to consider them different tokens?  I don’t have formal training in CS or programming, so I’m not sure that my definition is correct at all, but in my mind a token is a unit, and thus & and && are two different things.
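One way to get & and && as distinct tokens is the longest-match rule that the POSIX shell uses for operators: keep extending an operator while the longer string is still a known operator.  A minimal sketch, assuming a hypothetical `OPERATORS` table holding the operators from the POSIX shell grammar (so no `|&`), and ignoring quoting for brevity:

```python
# Operators from the POSIX shell grammar (no bash/zsh extensions such as '|&').
OPERATORS = {
    '&', '&&', '(', ')', ';', ';;', '|', '||',
    '<', '>', '<<', '>>', '<&', '>&', '<>', '<<-', '>|',
}
START = {op[0] for op in OPERATORS}   # characters that can begin an operator

def lex_operators(line):
    """Split *line* into words and operators using longest match.

    Hypothetical sketch: quoting and escapes are not handled.
    """
    tokens = []
    word = ''
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in START:
            if word:
                tokens.append(word)
                word = ''
            op = ch
            # Extend while the longer string is still a known operator.
            while i + 1 < len(line) and op + line[i + 1] in OPERATORS:
                i += 1
                op += line[i]
            tokens.append(op)
        elif ch.isspace():
            if word:
                tokens.append(word)
                word = ''
        else:
            word += ch
        i += 1
    if word:
        tokens.append(word)
    return tokens

result = lex_operators('make && echo ok & cat <<-EOF >>log')
print(result)
# ['make', '&&', 'echo', 'ok', '&', 'cat', '<<-', 'EOF', '>>', 'log']
```

Because grouping happens while scanning the raw text, `a & &b` still yields two separate `&` tokens, which a later pass over single-character tokens could not tell apart from `a && b`.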

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1521950>
_______________________________________


More information about the Python-bugs-list mailing list