library to do easy shell scripting in Python

Matt Nordhoff mnordhoff at mattnordhoff.com
Thu Apr 24 02:56:09 EDT 2008


Wow, this message turned out to be *LONG*. And it also took a long time
to write. But I had fun with it, so ok. :-)

Michael Torrie wrote:
> Recently a post that mentioned a recipe that extended subprocess to
> allow killable processes caused me to do some thinking.  Some of my
> larger bash scripts are starting to become a bit unwieldy (hundreds of
> lines of code).  Yet for many things bash just works out so well because
> it is so close to the file system and processes.  As part of another
> project, I now have need of a really good library to make it almost as
> easy to do things in Python as it is in Bash.  With a simple wrapper
> around subprocess, I'm pretty much able to do most things.  Most of my
> complicated bash hackery involves using awk, sed, grep, and cut to
> process text, which python does quite nicely, thank you very much.  But
> there's a few things to add.
> 
> To wit, I'm wanting to write a library that can deal with the following
> things:
> 
>   - spawn a process, feed it std in, get stdout, stderr, and err code.
>     This is largely already accomplished by subprocess

It is accomplished by subprocess.Popen:

The 'communicate' method handles stdin, stdout and stderr, waiting for
the process to terminate.

The 'wait' method just waits for the process to terminate and returns
the return code.

The 'returncode' attribute contains the return code (or None if the
process hasn't terminated yet).

You could write a convenience wrapper function if you want to do this in
a more terse way.
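
As a rough sketch, such a wrapper could look like the following. The name run_capture and its signature are made up for illustration, and the code assumes a current Python 3; the child process is just another Python interpreter, so the example runs anywhere.

```python
import subprocess
import sys


def run_capture(argv, input_data=None):
    """Run argv, feed input_data to its stdin, return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the input, reads both output streams to EOF
    # and waits for the process to terminate.
    out, err = proc.communicate(input_data)
    return proc.returncode, out, err


# Hypothetical usage: a child process that upper-cases whatever it reads.
code, out, err = run_capture(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    b"hello\n",
)
```

Everything is buffered in memory, which is fine for small outputs but not for anything huge.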

>   - spawn off processes as background daemons

Couldn't you do this with subprocess by doing subprocess.Popen([prog])
and, well, nothing else? (You may have/want to set stdin/stdout/stderr
too. I dunno.)
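
A minimal sketch of that, assuming a current Python 3 (subprocess.DEVNULL does not exist in older versions). The helper name spawn_background is made up, and this only starts a child without waiting for it; it does not do the full daemon dance of detaching from the terminal session.

```python
import subprocess
import sys


def spawn_background(argv):
    """Start argv without waiting for it and return the Popen object."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # the child gets no input from us
        stdout=subprocess.DEVNULL,  # and its output is discarded
        stderr=subprocess.DEVNULL,
    )


# Stand-in for a long-running program: a child that sleeps for a second.
proc = spawn_background([sys.executable, "-c", "import time; time.sleep(1)"])
still_running = proc.poll() is None  # poll() returns None while the child lives
proc.wait()  # only here so the example cleans up after itself
```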

>   - spawn multiple processes and pipe output to input.
>     - can do fancier things like bash does, like combine stderr/stdout,
>       switch stderr/stdout, redirects to and from files

That's possible with subprocess.

See this paragraph of <http://docs.python.org/lib/node528.html>:

> stdin, stdout and stderr specify the executed programs' standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE indicates that a new pipe to the child should be created. With None, no redirection will occur; the child's file handles will be inherited from the parent. Additionally, stderr can be STDOUT, which indicates that the stderr data from the applications should be captured into the same file handle as for stdout.

And also <http://docs.python.org/lib/node535.html>. Not the least
verbose, but pretty simple, and I bet it can do anything bash can.
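
For example, something like bash's `prog1 2>&1 | prog2 > out.txt` could be sketched as below. The two "programs" are stand-in Python one-liners and the output path is a temporary file, all chosen only so the example is self-contained; it assumes a current Python 3.

```python
import os
import subprocess
import sys
import tempfile

# Stand-ins for real programs: one writes to both streams, one upper-cases stdin.
producer = [sys.executable, "-c",
            "import sys; print('to stdout'); sys.stdout.flush(); "
            "print('to stderr', file=sys.stderr)"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(out_path, "wb") as out_file:
    # stderr=STDOUT is the "2>&1" part; stdout=PIPE is the "|" part.
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
    # Passing a file object as stdout is the "> out.txt" part.
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=out_file)
    p1.stdout.close()  # the parent no longer needs its copy of the pipe
    p2.wait()
    p1.wait()

with open(out_path) as f:
    result = f.read()
```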

>     - transparently allow a python function or object to be a part of
>       the pipeline at any stage.

Hmmm. I can't think very well at the moment, but you could create
file-like objects that do...I dunno, callbacks or something.

Simple and incomplete mockup:

class Pipe(object):
    def __init__(self, from_fh, to_fh, from_callback=None, to_callback=None):
        self.from_fh = from_fh
        self.to_fh = to_fh
        self.from_callback = from_callback
        self.to_callback = to_callback

    def read(self, *args, **kwargs):
        data = self.from_fh.read(*args, **kwargs)
        if self.from_callback is not None:
            self.from_callback(data)
        return data

    def write(self, data):
        # XXX Call the callback before or after the data is actually written?
        if self.to_callback is not None:
            self.to_callback(data)
        return self.to_fh.write(data)

That just passes input and output through itself, also passing it to
callback functions. You'd have to add all the other methods too, like
readline and __iter__... Maybe inheriting from 'file' would get most of
them. I dunno how it works internally.
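
A much cruder alternative, if streaming doesn't matter: run the first process to completion, hand its output to an ordinary Python function, and feed the result to the next process. The sketch below assumes a current Python 3; python_stage and the two stand-in commands are made up for illustration.

```python
import subprocess
import sys


def python_stage(data):
    """Hypothetical filter: keep only the lines that contain b'keep'."""
    return b"".join(line for line in data.splitlines(True) if b"keep" in line)


# Stand-ins for real programs: one prints three lines, one counts its input lines.
producer = [sys.executable, "-c",
            "print('keep 1'); print('drop 2'); print('keep 3')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(str(len(sys.stdin.readlines())))"]

# Stage 1: run the producer and collect everything it prints.
first_out = subprocess.Popen(producer, stdout=subprocess.PIPE).communicate()[0]

# Stage 2: the Python function sits in the middle of the "pipeline".
filtered = python_stage(first_out)

# Stage 3: feed the filtered data to the consumer.
p2 = subprocess.Popen(consumer, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
final_out = p2.communicate(filtered)[0]
```

This buffers each stage completely, so it is not a real pipe; the file-like wrapper above is what you would need for streaming.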

> Questions include, how would one design the interface for things, like
> assembling pipes?  Several ideas include:
> 
> pipe([prog1,args],[prog2,args],...)
> 
> or
> 
> run([prog1,args]).pipe([prog2,args]).pipe(...)
> 
> The former doesn't deal very well with re-plumbing of the pipes, nor is
> there an easy way to redirect to and from a file. The second syntax is
> more flexible but a bit cumbersome.  Also it doesn't allow redirection
> or flexible plumbing either.
> 
> Any ideas on how I could design this?

Ok, the below is an edited-down, more formal-sounding brain dump.

Idea 1:

>>> run([prog, args], from_fh, to_fh, from_callback, to_callback).run(...)

It would basically just automate the construction of the intermediary
pipe objects suggested above.

It could also be done with tuples, like:

>>> run([prog, args], (from_fh, to_fh), (from_callback, to_callback)).run(...)
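
Leaving the file handles and callbacks out entirely, just the chaining part could be sketched like this. The Stage class, its output method and the stand-in commands are all invented for the example, and it assumes a current Python 3.

```python
import subprocess
import sys


class Stage(object):
    """One process in a chain; stdout is always a pipe in this sketch."""

    def __init__(self, argv, stdin=None, previous=None):
        self.previous = previous
        self.proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)

    def run(self, argv):
        """Start argv reading from this stage's stdout; return the new stage."""
        nxt = Stage(argv, stdin=self.proc.stdout, previous=self)
        self.proc.stdout.close()  # the new child holds its own copy of the pipe
        return nxt

    def output(self):
        """Collect the last stage's stdout, then wait for the earlier stages."""
        out = self.proc.communicate()[0]
        stage = self.previous
        while stage is not None:
            stage.proc.wait()
            stage = stage.previous
        return out


def run(argv):
    """Entry point so that calls read as run(...).run(...)."""
    return Stage(argv)


# Stand-in commands: print a word, then upper-case it.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = run([sys.executable, "-c", "print('hello')"]).run(upper).output()
```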

Idea 2:

This one would parse a list similar to a bash command line.

run('prog', '>>out', '|', 'other_prog', 'arg', '>', 'foo.txt')

Which would be like a bash:

`prog 2>&1 | other_prog arg >foo.txt`

("2>&1" is how you combine stdout and stderr, right?)

"<", ">", ">>" and "|" would be keywords that behave similarly to bash.

A keyword's argument would be the next item in the list, e.g.
['>', 'foo.txt']. Along with string filenames, it would accept file-like
objects and file descriptors like subprocess.Popen does.

">>out" would be equivalent to bash's "2>&1". Similar things like
">err" would work too. They would be entirely separate keywords, not
['>>', 'out'] or something, so you could still use "out" as a filename
if you wanted to.

If the last command in the pipeline didn't have its stdout and stderr
redirected somewhere, their file objects would be returned. If you
wanted them going to your regular stdout and stderr, I guess you would
have to end the pipeline with [">out", ">>err"].

I just realized I made a mistake: In my keywords, I used ">x" for stdout
and ">>x" for stderr. Bash uses ">x" and ">>x" for stdout (overwrite and
append), and "2>x" and "2>>x" for stderr. That could be changed, or we
could just leave it all confusing-like. ;-)
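
To make Idea 2 a bit more concrete, here is a sketch of just the parsing step. It uses bash's own spelling ("2>&1") rather than my ">>out", only handles "<", ">", ">>", "2>&1" and "|", and doesn't open or execute anything. The function name and the dict layout are made up for illustration.

```python
def _new_stage():
    return {"argv": [], "stdin": None, "stdout": None,
            "append": False, "merge_stderr": False}


def parse_pipeline(*tokens):
    """Split a flat token list into one dict per pipeline stage.

    Redirect targets are stored as given (filename, file object or
    descriptor); nothing is opened or run here.
    """
    stages = []
    stage = _new_stage()
    it = iter(tokens)
    for tok in it:
        if tok == "|":
            if not stage["argv"]:
                raise ValueError("empty command before '|'")
            stages.append(stage)
            stage = _new_stage()
        elif tok in ("<", ">", ">>"):
            target = next(it, None)  # the keyword's argument is the next token
            if target is None:
                raise ValueError("missing target after %r" % (tok,))
            if tok == "<":
                stage["stdin"] = target
            else:
                stage["stdout"] = target
                stage["append"] = (tok == ">>")
        elif tok == "2>&1":
            stage["merge_stderr"] = True
        else:
            stage["argv"].append(tok)
    if not stage["argv"]:
        raise ValueError("empty command at end of pipeline")
    stages.append(stage)
    return stages


stages = parse_pipeline("prog", "2>&1", "|", "other_prog", "arg", ">", "foo.txt")
```

Each dict maps fairly directly onto subprocess.Popen arguments: merge_stderr becomes stderr=subprocess.STDOUT, and a stage followed by another gets stdout=subprocess.PIPE.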

Idea 3:

You could be dirty and just use os.system(). ;-)