Linux Processes from Python HOW-TO?

Fri Apr 28 17:46:08 EDT 2000

On Fri, Apr 28, 2000 at 07:39:00PM +0000, chris_barker at my-deja.com wrote:

> Has anyone written a HOW-TO or tutorial that explains how to start and
> manipulate other processes with Python. I imagine it's trivial if you
> are familiar with the Linux/Unix process model, but I get very confused
> about when to use:

> system
> popen and its variants
> exec and its variants
> fork
> etc.

Well, it's detailed in most books about UNIX, and in most books about
operating systems in general. (At least on the programming level.) I don't
know of any HOWTO-level explanation of it, but I'm happy to supply you with
some short descriptions. I'll do it in the reverse order of what you listed,
though, because it's easier that way ;)

First, the basics: under UNIX, and most current operating systems, a running
program consists of one or more processes. A single program, or a single
task, is usually a single process, but it's possible for several processes
to communicate with each other, to share the workload. Each process has its
own memory space, invisible to other processes, and it can only access its
own memory space.

To create a new process, you use the system call fork() (os.fork() in
Python). It duplicates the current process: it copies all of its memory,
instruction pointers, function stack, everything. The main difference
between the two processes is the process id (pid), and the return value of
fork(): in the new process, the 'child', fork() returns 0. In the original
process, the 'parent', fork() returns the pid of the newly created child
process. (If fork() fails, it won't create a child process, of course, and
the return value will signify an error; in Python, os.fork() raises OSError
instead.)
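
A minimal sketch of that, in Python, assuming a UNIX system where os.fork()
is available. The function name fork_and_report and the exit code 7 are
made up for illustration; the point is only that the child sees 0 and the
parent sees the child's pid:

```python
import os


def fork_and_report():
    """Fork once; return (child_pid, child_exit_code) as seen by the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Leave right away with a recognizable
        # exit code (7 is an arbitrary, illustrative value).
        os._exit(7)
    # Parent: fork() returned the pid of the new child. Wait for it.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


child_pid, exit_code = fork_and_report()
print("child", child_pid, "exited with code", exit_code)
```

The child uses os._exit() rather than sys.exit() so that it doesn't run the
parent's cleanup handlers or flush the parent's buffered output a second
time.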

Because fork() creates an exact duplicate of the old process, you can't use
it to start a new program. You use one of the 'exec*()' calls for that. This
system call *replaces* the program running in the calling process with a new
program, as indicated by the first argument to the exec() call. If exec()
returns at all, it's always an error (Python raises OSError), because if it
succeeds, the code that called it will be gone. The result isn't really a
new process, though, because it will have the pid of the old one. It'll just
be running a different program ;)
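
Put together with fork(), that looks roughly like the sketch below. It
assumes a UNIX system; run_replaced is a hypothetical helper name, and the
"new program" is just another Python interpreter (sys.executable) that
exits with code 3. Note that the parent collects that exit code under the
pid it got from fork(), which shows the pid survived the exec:

```python
import os
import sys


def run_replaced(argv):
    """Fork, replace the child with the program in argv, return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # On success this never returns: the child now runs argv[0].
            os.execv(argv[0], argv)
        except OSError:
            pass
        # Only reached if exec failed; 127 is the conventional shell code
        # for "command could not be run".
        os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


code = run_replaced([sys.executable, "-c", "import sys; sys.exit(3)"])
print("new program exited with code", code)
```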

The differences between the various exec*() calls are fairly subtle, and a
bit strange from a Python point of view. They are mostly about how to pass
the arguments: the execl*() calls take a variable number of arguments,
whereas the execv*() calls take a pointer to a list of arguments (in
Python, a list or tuple). The exec*p() calls search the environment
variable PATH for the name of the executable, whereas the other exec*()
functions expect a path to the program, absolute or relative, and don't
search PATH. And lastly, the exec*e() calls also take an argument that
lists the environment variables that should be passed to the new program,
whereas the other calls pass the same environment as the current process
has.
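
To make the naming scheme concrete, here is a small sketch that runs the
same tiny shell command through four of the variants. It assumes a UNIX
system with /bin/sh present and 'sh' findable via PATH; in_child and the
variable CODE are names invented for this example:

```python
import os


def in_child(exec_call):
    """Fork, run exec_call() in the child, return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            exec_call()  # never returns if the exec succeeds
        finally:
            os._exit(127)  # only reached if the exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


results = {
    # l: arguments passed one by one; the path is used as given.
    "execl": in_child(lambda: os.execl("/bin/sh", "sh", "-c", "exit 2")),
    # v: arguments passed as one list; the path is used as given.
    "execv": in_child(lambda: os.execv("/bin/sh", ["sh", "-c", "exit 3"])),
    # p: 'sh' is looked up in the directories listed in PATH.
    "execvp": in_child(lambda: os.execvp("sh", ["sh", "-c", "exit 4"])),
    # e: the new program gets only the environment passed here.
    "execve": in_child(
        lambda: os.execve("/bin/sh", ["sh", "-c", 'exit "$CODE"'], {"CODE": "5"})
    ),
}
print(results)
```

In every call the first element of the argument list is the name the new
program sees as its own argv[0]; it is separate from the path or file name
used to find the executable.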

These two system calls are all that is necessary to create as many and as
varied processes as you wish. Nonetheless, the Standard C Library
provides a few funky and helpful functions that save you a lot of time in
the common cases:

system() is a library call that takes a single string describing the program
it should start, and the arguments it should get. It fork()s and exec()s for
you, then waits for the child to exit (using one of the wait() calls) and
returns its exit status. (In Python on UNIX, os.system() returns that status
in the encoded form wait() uses, not the bare exit code.) You don't have
much control over how the string is executed, other than via the standard
'shell' syntax: system() always starts /bin/sh with your string as the
command to run.
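
A short sketch of system() from Python, assuming a UNIX system. The command
string is an arbitrary example that uses a bit of shell syntax ('&&') to
show that /bin/sh really is interpreting it:

```python
import os

# The whole string is handed to the shell, so shell syntax works.
status = os.system("test -d / && exit 3")

# On UNIX the return value is a wait()-style status, so decode it before
# treating it as an exit code.
exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
print("exit code:", exit_code)
```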

popen() and its friends are similar functions, but instead of running the
string you pass and returning the exit code, they start the new process with
its standard input or standard output (or both, in the case of popen2)
connected to the calling process (using a UNIX 'pipe', hence the first p in
popen()). This way, the calling process can pass data to the program just as
a user would type it at a terminal, or read data that the new program writes
to the terminal.
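
A sketch of both directions, assuming a UNIX system. The one-way case uses
os.popen(). For the two-way case the sketch swaps in subprocess.run() from
the standard library as a stand-in for popen2; that substitution is mine,
not something described above. The child program is a throwaway Python
one-liner that upper-cases its input:

```python
import os
import subprocess
import sys

# One direction: read what the child writes to its standard output.
with os.popen("echo hello from the child") as pipe:
    output = pipe.read()

# Both directions. Assumption: subprocess.run() stands in for popen2 here.
# It feeds 'input' to the child's stdin and captures the child's stdout.
upper_filter = "import sys; sys.stdout.write(sys.stdin.read().upper())"
result = subprocess.run(
    [sys.executable, "-c", upper_filter],
    input="shout\n",
    capture_output=True,
    text=True,
    check=True,
)
upper = result.stdout

print(repr(output))
print(repr(upper))
```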

So, basically, if you simply want to run a command and don't care much about
interacting with it while it runs, you usually use system(). If you need to
pass it some data to work on, for example 'gzip' or some other program that
works on the input stream, you use popen(). And if you want more control
over things, or if you simply want your Python process to do two tasks at
once, or you wish to start a lot of programs in a specific order, or other
complicated things, you use fork() and exec() manually.
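
As a rough illustration of the "two tasks at once" case, the sketch below
forks several children that run side by side and then collects each one's
exit code. It assumes a UNIX system; run_parallel and the job names are
hypothetical, and each child simply exits with a given code where a real
one would do its work or exec() another program:

```python
import os


def run_parallel(jobs):
    """Fork one child per job; return {job_name: exit_code}."""
    pids = {}
    for name, code in jobs.items():
        pid = os.fork()
        if pid == 0:
            # Child: stand-in for real work; exit with the requested code.
            os._exit(code)
        # Parent: remember which pid belongs to which job, keep forking.
        pids[pid] = name

    results = {}
    for pid, name in pids.items():
        # All children are already running; now wait for each in turn.
        _, status = os.waitpid(pid, 0)
        results[name] = os.WEXITSTATUS(status)
    return results


results = run_parallel({"a": 0, "b": 1, "c": 2})
print(results)
```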

> I'm really hoping someone has written this up!

Well, I just did, didn't I ? ;) There are a lot more details I didn't
mention, by the way: threads (roughly, processes that share their memory
space); redirecting stdin, stdout and stderr somewhere specific, or to
another process; creating pipes and sockets yourself; how to avoid deadlocks
in popen2; shared memory; controlling ttys, how to see the difference
between a simple pipe and a real user, and how to fool programs that try to
see that difference; UNIX session ids; setuid() and other ways to change the
current process's permissions; and probably a lot more ;P Nevertheless, I
hope this helps a bit :-)

If you need more info than this, either read the UNIX manual page for one of
those functions, or if you prefer readable material, buy a good book on UNIX
programming. They usually start with a good explanation of processes, the
memory layout, pipes, sockets and networking. Some even cover shared memory.

Drained-ly y'rs,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!