porting shell scripts: system(list), system_pipe(lists)

eichin at metacarta.com
Thu Oct 2 00:20:11 EDT 2003


One of my recent projects has involved taking an accretion of sh and
perl scripts and "doing them right" - making them modular, improving
the error reporting, making it easier to add even more features to
them.  "Of course," I'm redoing them in python - much of the cut&paste
reuse has become common functions, which then get made more robust and
have a common style and are callable from other (python) tools
directly, instead of having to exec scripts to get at them.  The usual
"glorious refactoring."

Most of it has been great - os.listdir+i.endswith() instead of
globbing, exception handling instead of "exit 1", that sort of thing.
I've run into one weakness, though: executing programs.

Python has, of course, os.fork and os.exec* corresponding to the raw
unix functions.  It also has the higher level os.system, popen,
expect, and commands.get* functions.  The former need the same
fork/exec/wait boilerplate written out around every call; the latter
*all* take a single command string, which leads straight to quoting
problems, and those can be serious risks in some applications.
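
A small sketch of the quoting risk with a string-based interface, using
os.popen (one of the calls named above).  It is written for current
Python 3; shell_echo and the hostile value are made up for illustration
and are not from the scripts being ported:

```python
import os

def shell_echo(value):
    """Echo a value by pasting it into a shell command string."""
    # The whole string is handed to /bin/sh, which re-parses it, so any
    # metacharacters inside `value` are interpreted by the shell.
    with os.popen("echo " + value) as pipe:
        return pipe.read()

# The caller meant this as one argument; the shell sees two commands.
hostile = "x; echo INJECTED"
output = shell_echo(hostile)
```

The second command runs because the shell, not the caller, decides where
one argument ends and the next command begins.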

Perl had one very helpful interface for this kind of thing: system and
exec will both take array arguments:
  $ perl -e 'system("echo", "*")'
  *
  $ perl -e 'exec("echo", "*")'
  *
versus
  $ perl -e 'exec("echo *")'
  #.newsrc-dribble# CVS stuff ... 
This has always struck me as "correct" - not the overloading,
necessarily, but the use of a list.  

So, implementing system this way is easy enough:

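import os
import traceback

class ExecFailed(Exception):
    # Assumed minimal definition; the original post does not show it.
    pass

def system(cmd):
    pid = os.fork()
    if pid > 0:
        # Parent: wait for the child.  The second argument is an options
        # bitmask (0 = block); os.P_WAIT is a mode for os.spawn*, not this.
        p, st = os.waitpid(pid, 0)
        if st == 0:
            return
        # st is the raw wait status, not just the exit code.
        raise ExecFailed(str(cmd), st)
    elif pid == 0:
        try:
            os.execvp(cmd[0], cmd)
        except OSError:
            traceback.print_exc()
            os._exit(113)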

[The try/except matters: if cmd[0] isn't found, os.execvp raises
OSError -- but by then we are already in the child.  Uncaught, the
exception would propagate up the stack to any surrounding try/except,
and the child would then carry on, possibly disastrously, with whatever
that code had been doing *in a duplicate process*.  The _exit
explicitly short-circuits this.]
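
To make the failure mode in that note concrete, here is a minimal sketch
written for current Python 3 rather than the 2003 syntax above.
run_argv is a made-up name, and decoding the wait status into an exit
code with os.WEXITSTATUS is an addition, not something the post's
system() does:

```python
import os
import traceback

def run_argv(argv):
    """Fork, exec argv without a shell, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child.  Nothing raised here may escape into the caller's
        # try/except blocks, so every path ends in os._exit.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            traceback.print_exc()
        finally:
            os._exit(113)
    # Parent: options=0 means "block until the child exits".
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal
```

With this, a command that cannot be found shows up in the parent as exit
code 113, and the child never returns into the caller's code.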

So, this makes a big difference when porting simple bits of shell (and
usually, just in passing, fixing quoting bugs - if you had code that
used to do "ci -l $foo" and it is now "system(['ci', '-l', foo])"
you now properly handle spaces and punctuation in the value of foo,
"for free".)  However, the other thing you tend to find in
"advanced"[1] shell scripts is lengthy pipelines.  (Sure, you find
while loops and case statements and such - but python's control
structures handle those fine.)
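
The "for free" quoting can be checked directly.  The sketch below uses
os.spawnvp, a Unix-only standard library call that the post does not
mention but which also takes an argument list; argument_count is a
hypothetical helper, and the code assumes current Python 3:

```python
import os

def argument_count(values):
    """Return how many arguments a child process actually receives."""
    # sh -c SCRIPT NAME ARG...: inside SCRIPT, "$#" is the number of ARGs.
    # The script simply exits with that number as its status.
    argv = ["sh", "-c", 'exit "$#"', "sh"] + list(values)
    # With os.P_WAIT, spawnvp blocks and returns the child's exit code.
    return os.spawnvp(os.P_WAIT, "sh", argv)

# Spaces, semicolons and wildcards stay inside a single argument.
count = argument_count(["my file; rm -rf *"])
```

No shell ever parses the list elements, so a value with spaces or
punctuation reaches the program as exactly one argument.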

Implementing pipelines takes rather a bit more work, and one might
(not unreasonably) throw up one's hands and just use os.system and
some re.sub's to do the quoting.  However, I had enough cases where
the goal really was to run a complex shell pipeline (I also had cases
where the pipeline converted nicely to some inline python code,
especially with the help of the gzip module) that I sat down and
cooked up a pipeline class.

The interface I ended up with is pretty simple:
        g_pipe = pipeline()
        g_pipe.stdin(open("blort.gz", "r"))
        g_pipe.append(["gunzip"])
        g_pipe.append(["sort", "-u"])
        g_pipe.append(["wc", "-l"])
        g_pipe.stdout(open("blort.count", "w"))
        print g_pipe.run()

is equivalent to the sh:
    gunzip < blort.gz | sort -u | wc -l > blort.count

pipeline also has obvious stderr and chdir methods; pipeline.run
actually returns an array with the return status of *each* pipeline
element (which leads to "if filter(None, st): deal_with_error" being a
useful idiom for noticing failures that a shell script would typically
miss.)
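
The class itself is not shown in this post.  What follows is a
hypothetical sketch of how such an interface could be built on os.pipe,
os.fork and os.dup2, not the actual implementation: it is written for
current Python 3, omits the stderr and chdir methods, and returns raw
wait statuses in pipeline order.

```python
import os
import traceback

class Pipeline:
    """Hypothetical sketch of a list-based pipeline; Unix only."""

    def __init__(self):
        self.commands = []
        self.stdin_file = None
        self.stdout_file = None

    def stdin(self, f):
        self.stdin_file = f

    def stdout(self, f):
        self.stdout_file = f

    def append(self, argv):
        self.commands.append(list(argv))

    def run(self):
        pids = []
        # fd the next child reads from; None means "inherit ours".
        prev_read = self.stdin_file.fileno() if self.stdin_file else None
        for i, argv in enumerate(self.commands):
            last = (i == len(self.commands) - 1)
            if last:
                read_end = None
                write_end = (self.stdout_file.fileno()
                             if self.stdout_file else None)
            else:
                read_end, write_end = os.pipe()
            pid = os.fork()
            if pid == 0:
                # Child: wire up fds 0 and 1, then exec.  Python 3 marks
                # pipe fds close-on-exec, so the originals vanish at exec.
                try:
                    if prev_read is not None:
                        os.dup2(prev_read, 0)
                    if write_end is not None:
                        os.dup2(write_end, 1)
                    os.execvp(argv[0], argv)
                except OSError:
                    traceback.print_exc()
                finally:
                    os._exit(113)
            pids.append(pid)
            # Parent: close the pipe ends it created, but never the
            # caller's own stdin/stdout files.
            if i > 0:
                os.close(prev_read)
            if not last:
                os.close(write_end)
            prev_read = read_end
        # One raw wait status per pipeline element, in order.
        return [os.waitpid(pid, 0)[1] for pid in pids]
```

One caution when carrying the failure idiom forward: in Python 3,
filter returns a lazy iterator that is always true, so "if any(st):"
is the equivalent check there.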

This has led me to a few questions:

 1. Am I being dense? Are there already common modules (included or
    otherwise) that do this, or solve the problem some other way?
 2. Is there a more pythonic way of expressing the construction?
    Would exposing the internal array of commands make more sense,
    possibly by "passing through" various array operations on the
    class to the internal array (as the use of "append" hints at)?  Or
    maybe "exec" objects that a "pipe" combiner operates on?
 3. Should an interface like this be in a "battery" somewhere? shutil
    didn't seem to quite match...
 4. Any reason to even try porting this interface to non-unix systems?
    Is there a close enough match to os.pipe/os.fork/os.exec/os.wait,
    or some other construct that works on microsoft platforms?
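
As one possible reading of question 2, the '"exec" objects that a
"pipe" combiner operates on' idea could look roughly like this.  Cmd
and Pipe are hypothetical names, the "|" overloading is just one way to
spell the combiner, and the sketch only builds the structure without
running anything:

```python
class Cmd:
    """One command, held as an argument list."""

    def __init__(self, *argv):
        self.argv = list(argv)

    def __or__(self, other):
        # Cmd | something: start a Pipe and let it do the combining.
        return Pipe([self]) | other

class Pipe:
    """An ordered sequence of Cmd stages."""

    def __init__(self, stages):
        self.stages = list(stages)

    def __or__(self, other):
        # Accept either a single Cmd or another Pipe on the right.
        if isinstance(other, Pipe):
            return Pipe(self.stages + other.stages)
        return Pipe(self.stages + [other])

    def argvs(self):
        return [stage.argv for stage in self.stages]

p = Cmd("gunzip") | Cmd("sort", "-u") | Cmd("wc", "-l")
```

Here p.argvs() gives back the same three argument lists that the
append() calls above would have collected.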

			_Mark_ <eichin at metacarta.com>

[1] in the Invader Zim sense :)



