[issue1521950] shlex.split() does not tokenize like the shell

Fri Nov 25 20:25:35 CET 2011

Dan Christian <robodan at users.sourceforge.net> added the comment:

I've attached a diff to test_shlex.py and a script that I used to
verify what the shells actually do.
Both are relative to Python-3.2.2/Lib/test

I'm completely ignoring the quotes issue for now.  That should
probably be an enhancement.  I don't think it really matters until the
parsing issues are resolved.
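
To make the parsing issue concrete, here is a small sketch (not part of the
attached patch) of what the stock shlex.split() does with the kind of
one-liners the tests use.  It only breaks tokens on whitespace, so an
operator glued to a word stays inside that word.

```python
import shlex

# One-liners that a shell treats as two commands or as a redirect.
samples = [
    'echo hi && echo bye',
    'echo hi&&echo bye',
    'echo hi;echo bye',
    'echo hi>out',
]

# Map each input string to the tokens shlex.split() produces for it.
results = {s: shlex.split(s) for s in samples}

for s in samples:
    print(repr(s), '->', results[s])
# 'echo hi && echo bye' -> ['echo', 'hi', '&&', 'echo', 'bye']
# 'echo hi&&echo bye' -> ['echo', 'hi&&echo', 'bye']
# 'echo hi;echo bye' -> ['echo', 'hi;echo', 'bye']
# 'echo hi>out' -> ['echo', 'hi>out']
```

Only the first form, with spaces around the operator, comes out the way a
shell would see it.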

ref_shlex.py is Python 2 syntax; running it under "python -3" shows that it
should convert cleanly.  Run it as:
./ref_shlex.py
By default it tests /bin/sh and /bin/bash.  To test every /bin/*sh, do:
export SHELLS=AUTO
To pick specific shells, do something like: export SHELLS='/bin/sh,/bin/csh'
It runs as a unittest.  So you will only see dots if all shells do
what it expects.  Some shells are flaky (e.g. zsh, tcsh), so you may
need to run it multiple times.
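
The core of what ref_shlex.py checks for each shell can be sketched in a few
lines.  This is a Python 3 illustration rather than the attached script:
run_in_shell is a made-up helper name, and the example assumes /bin/sh
exists on the machine.

```python
import subprocess


def run_in_shell(shell, command):
    """Run `command` through `shell -c` (hypothetical helper).

    Returns (exit status, stdout lines, stderr lines).
    """
    proc = subprocess.run(
        [shell, '-c', command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    return proc.returncode, proc.stdout.splitlines(), proc.stderr.splitlines()


# With or without spaces around ';', the shell should run two commands.
for cmd in ('echo hi ; echo bye', 'echo hi;echo bye'):
    status, out, err = run_in_shell('/bin/sh', cmd)
    print(repr(cmd), '->', status, out, err)
```

The attached script does the same thing through temp files and wraps each
case in a unittest method, so a mismatch in any shell shows up as a failure.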

Getting this into the mainline will be interesting.  I would think it
would take some community discussion.  I may be able to convince
people that the current behaviour is wrong, but I can't tell you what
will break if it is "fixed".  And should the fix be the default?  As
you mentioned, it depends on what people expect it to do and how it is
currently being used.  I see the first step as presenting a clear case
of how it should work.
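
As a rough sketch of what an opt-in fix could look like from the caller's
side: the helper name split_like_shell and its punctuation flag are made up
for illustration.  The second branch assumes a Python new enough (3.6 or
later) to have the punctuation_chars argument on shlex.shlex, which did not
exist when this was written.

```python
import shlex


def split_like_shell(s, punctuation=False):
    """Hypothetical helper: split `s`, optionally isolating shell operators.

    punctuation=False returns exactly what shlex.split() returns today.
    punctuation=True uses shlex.shlex(punctuation_chars=True), Python 3.6+.
    """
    if not punctuation:
        return shlex.split(s)
    lexer = shlex.shlex(s, posix=True, punctuation_chars=True)
    lexer.commenters = ''  # like shlex.split(), do not treat '#' as a comment
    return list(lexer)


for src in ('echo hi&&echo bye', 'echo "hi"&&echo "bye"', 'echo hi>out'):
    print(repr(src), '->', split_like_shell(src, punctuation=True))
# 'echo hi&&echo bye' -> ['echo', 'hi', '&&', 'echo', 'bye']
# 'echo "hi"&&echo "bye"' -> ['echo', 'hi', '&&', 'echo', 'bye']
# 'echo hi>out' -> ['echo', 'hi', '>', 'out']
```

Because the flag defaults to False, existing callers keep the old tokens.
Note that this sketch leaves whitespace_split off, so characters outside the
lexer's word set (for example $) would also come out as separate tokens.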

-Dan

On Fri, Nov 25, 2011 at 10:01 AM, Éric Araujo <report at bugs.python.org> wrote:
>
> Éric Araujo <merwok at netwok.org> added the comment:
>
>> Of course, that's how it's used.  That's all it can do right now.
> :) What I meant is that it is *meant* to be used in this way.
>
>> I was splitting and combining commands (using ;, &&, and ||) and then running the resulting
>> (mega) one liners over ssh.  It still gets run by a shell, but I was specifying the control flow.
> Thank you for the reply.  It is indeed a valuable use case to pass a command line as one string to ssh, and the split/quote combo should round-trip and be useful for this usage.
>
>> I'll see if I can come up with a reference case and maybe a unittest this weekend
> Great!  A new argument (with a default value which gets us the previous behavior) will probably be needed, to preserve backward compatibility.
>
> ----------
> nosy: +niemeyer
> versions: +Python 3.3 -Python 3.2
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <http://bugs.python.org/issue1521950>
> _______________________________________
>

----------
keywords: +patch
Added file: http://bugs.python.org/file23778/ref_shlex.py
Added file: http://bugs.python.org/file23779/test_shlex.diff

-------------- next part --------------
#!/usr/bin/env python

"""Test how various shells parse syntax.
This is only expected to work on Unix based systems.
We use the unittest infrastructure, but this isn't a normal test.

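Usage:
  ref_shlex.py
Shells come from the SHELLS environment variable (comma separated, or AUTO
for /bin/*sh); temp files go in TEMPDIR (default /tmp).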
"""
# Written by: Dan Christian for issue1521950

import glob
import re
import os, sys
import optparse
import subprocess
import unittest


TempDir = '/tmp'                 # where we will write temp files
Shells = ['/bin/sh', '/bin/bash'] # list of shells to test against

class ShellTest(unittest.TestCase):
    bgRe = re.compile(r'\[\d+\]\s+(\d+|\+ Done)$') # backgrounded command output

    def Run(self,
            shell,           # shell to use
            command,         # command to run
            filepath=None):  # any files that are expected
        """Carefully run a shell command.
        Capture stdout, stderr, and exit status.
        Returns: (ret, out, err, fileout)
           ret is the return status
           out is the list of lines to stdout
           err is the list of lines to stderr
           fileout is the list of lines in filepath (None if not given)
        """
        start_cwd = os.getcwd()
        call = [shell, '-c', command]
        #print "Running: %s -c '%s'" % (shell, command)
        outpath = 'stdout.txt'
        errpath = 'stderr.txt'
        ret = -1
        out = None
        err = None
        fileout = None
        try:
            os.chdir(TempDir)
            outfp = open(outpath, 'w')
            errfp = open(errpath, 'w')
            if filepath and os.path.isfile(filepath):
                os.remove(filepath)
            ret = subprocess.call(call, stdout=outfp, stderr = errfp)
            #print "Returned: %d" % ret
            outfp = open(outpath, 'r')
            out = outfp.readlines()
            os.remove(outpath)
            errfp = open(errpath, 'r')
            err = errfp.readlines()
            os.remove(errpath)
            if filepath:
                ffp = open(filepath)
                fileout = ffp.readlines()
                os.remove(filepath)
        except OSError as msg:
            print "Exception!", msg
            os.chdir(start_cwd)
            # leave files behind for debugging
            self.fail("Hit an exception running: %s" % ' '.join(call))
        return (ret, out, err, fileout)

    def testTrue(self):
        """ Trivial case to test execution. """
        for shell in Shells:
            cmd = '/bin/true'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testEcho(self):
        """ Simple case to test stdout. """
        for shell in Shells:
            cmd = 'echo "hello world"'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                1, len(out),
                "Expected %s -c '%s' to output 1 line of stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testRedirectS(self):
        """ output redirect with space """
        for shell in Shells:
            fpath = "out.txt"
            cmd = 'echo "hi" > %s' % fpath
            (ret, out, err, fout) = self.Run(shell, cmd, fpath)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))
            self.assertEquals(1, len(fout))

    def testRedirectNS(self):
        """ output redirect without space """
        for shell in Shells:
            fpath = "out.txt"
            cmd = 'echo "hi"> %s' % fpath
            (ret, out, err, fout) = self.Run(shell, cmd, fpath)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))
            self.assertEquals(1, len(fout))

    def testTwoEchoS(self):
        """ Two separate output lines (with space) """
        for shell in Shells:
            cmd = 'echo hi ; echo bye'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(['hi\n', 'bye\n'], out)
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testTwoEchoNS(self):
        """ Two separate output lines (without space) """
        for shell in Shells:
            cmd = 'echo hi;echo bye'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(['hi\n', 'bye\n'], out)
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testBgEcho(self):
        """ Two separate output lines but unordered """
        # This is flaky.  The output can vary on zsh and tcsh.  Just re-run.
        for shell in Shells:
            cmd = 'echo hi&echo bye; wait'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            # You may get extra lines on csh (hi, bye, bg notice, done notice)
            self.assertTrue(
                len(out) in (2, 3, 4),
                "Expected %s -c '%s' to output 2-4 lines, not %d\n%s" % (
                    shell, cmd, len(out), out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))


def main(args):
    global TempDir, Shells

    val = os.getenv('TEMPDIR')
    if val:
        TempDir = val
    val = os.getenv('SHELLS')
    if val in ('AUTO', 'auto'):
        Shells = glob.glob('/bin/*sh')
        if not Shells:
            print "No shells found as /bin/*sh"
            sys.exit(2)
    elif val is not None:
        Shells = val.split(',')

    print "Testing shells: %s" % ', '.join(Shells)
    unittest.main()


if __name__ == "__main__":
    main(sys.argv[1:])
-------------- next part --------------

--- test_shlex-orig.py	2011-09-03 10:16:44.000000000 -0600
+++ test_shlex.py	2011-11-25 12:01:07.000000000 -0700
@@ -173,6 +173,41 @@
                              "%s: %s != %s" %
                              (self.data[i][0], l, self.data[i][1:]))
 
+    def testSyntaxSplitAmpersand(self):
+        """Test handling of syntax splitting of &"""
+        # these should all parse to the same output
+        src = ['echo hi && echo bye',
+               'echo hi&&echo bye',
+               'echo "hi"&&echo "bye"']
+        ref = ['echo', 'hi', '&&', 'echo', 'bye']
+        # Maybe this should be: ['echo', 'hi', '&', '&', 'echo', 'bye']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
+    def testSyntaxSplitSemicolon(self):
+        """Test handling of syntax splitting of ;"""
+        # these should all parse to the same output
+        src = ['echo hi ; echo bye',
+               'echo hi; echo bye',
+               'echo hi;echo bye']
+        ref = ['echo', 'hi', ';', 'echo', 'bye']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
+    def testSyntaxSplitRedirect(self):
+        """Test handling of syntax splitting of >"""
+        # of course, the same applies to <, |
+        # these should all parse to the same output
+        src = ['echo hi > out',
+               'echo hi> out',
+               'echo hi>out']
+        ref = ['echo', 'hi', '>', 'out']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
 # Allow this test to be used with old shlex.py
 if not getattr(shlex, "split", None):
     for methname in dir(ShlexTest):