PEP thought experiment: Unix style exec for function/method calls

Sun Jun 25 07:28:18 EDT 2006

Hi,

[ I'm calling this a PEP thought experiment because I'm discussing language
  ideas for python which, if implemented, would probably be quite powerful
  and useful, but the increased risk of obfuscation when the ideas are
  used outside my expected/desired problem domain probably massively
  outweighs the benefits. (If you're wondering why, it's akin to adding
  a structured goto with context.)

  However, I think it's quite useful as a thought experiment, since any
  language feature can be implemented in different ways, and I'm wondering
  if anyone's tried this, or if it's come up before (if it has, I can't
  find it). ]

I'm having difficulty finding any previous discussion on this -- I
keep finding people either having problems calling the os.exec* functions
(os.execl, os.execv, and so on), or using python's exec statement. I mean
neither of those here.

For a moment, let's take the definition of one of the os.exec*
functions:

    execv(...)
        execv(path, args)

        Execute an executable path with arguments, replacing current
        process.
            path: path of executable file
            args: tuple or list of strings

Also: Note that execv inherits the system environment.
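
To make the "replacing current process" part concrete, here is a small
sketch (written for Python 3, and assuming a POSIX system). The child
script and its messages are made up for illustration. The child prints a
line, calls os.execv, and the statement after the execv never runs:

```python
import subprocess
import sys

# Source for a throwaway child process (hypothetical example script).
# It flushes its first line, then replaces itself with a new interpreter.
child_source = (
    "import os, sys\n"
    "print('before exec', flush=True)\n"
    "os.execv(sys.executable, [sys.executable, '-c', 'print(\"after exec\")'])\n"
    "print('never printed')\n"   # unreachable: the process image was replaced
)

result = subprocess.run(
    [sys.executable, "-c", child_source],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
lines = result.stdout.splitlines()
print(lines)
```

The line after os.execv plays the same role as the "We don't reach here"
print in the cexe examples further down.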

Suppose we could do the same for a python function - suppose we could
call the python function but either /without/ creating a new stack
frame or /replacing/ the current stack frame with the new one.

Anyway, I've been thinking recently that the same capability in python
would be useful. However, almost any possible language feature:
   * Has probably already been discussed to death in the past
   * Often has a nice idiom that works around the lack of said feature.

So I'm more on an exploratory forage than asking for a language change
here ;)

Since os.exec* exists and "exec" already exists in python, I need to
differentiate what I mean by a unix style exec from both of them. So for
convenience I'll call it "cexe".

Now, suppose I have:
    ----------
    def set_name():
        name = raw_input("Enter your name! > ")
        cexe greet()

    def greet():
        print "hello", name

    cexe set_name()
    print "We don't reach here"
    ----------

This would execute, ask for the user's name, say hello to them and then
exit - not reaching the final print "We don't reach here" statement.

Let's ignore for the moment that this example sucks (and is a good example
of the danger of this as a language feature). What I want to do here is
use it to explain the meaning of "cexe".

There are two cases to consider:

Case 1:
  cexe some_func_noargs()

    This transfers execution to the function that would normally be called
    if I simply wrote some_func_noargs() without "cexe". However, unlike
    a function call, we're /replacing/ the current thread of execution
    with the thread of execution in some_func_noargs(), rather than
    stacking the current location in order to come back to it later.

    ie, in the above this could also be viewed as "call without creating a
    new return point" or "call without bothering to create a new stack
    frame".

    It's this last point that explains why "name" leaks between the two
    functions in the above example: greet() is entered via cexe, so no
    new stack frame separates it from set_name()'s locals.

Case 2:
  given...
     def some_func_withargs(colour,tone, *listopts, **dictopts)

  consider...
     cexe some_func_withargs(foo,bar, *argv, **argd)

     This would be much the same as the previous case, except that in the
     new execution point the names colour & tone map to the values foo & bar
     had in the original context, whilst listopts and dictopts map to the
     values that argv & argd had in the original context.

One consequence here though is that in actual practice the final print
statement of the code above never actually gets executed. (Much like if
that was inside a function, writing something after "return foo", wouldn't
be executed)

The reason I'm curious here about previous discussion is because
conceptually there are obviously other semantics you can apply - such as
the current stack frame being /replaced/ by the new stack frame. This is
perhaps a more accurate mapping to the Unix exec call.

If that was the case, it would mean that locals would not "leak" between
functions (which is desirable), and our example above could be rewritten
as follows:

    ----------
    def get_and_use_value_from_user(tag, callforward):
        somevalue = raw_input(tag)
        cexe callforward(somevalue)

    def greet(name):
        print "hello", name

    cexe get_and_use_value_from_user("Enter your name! > ", greet)
    print "We don't reach here"
    ----------
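
The "replace the frame" semantics can be roughly approximated in today's
python without new syntax. The sketch below (Python 3) is only an
emulation under my own assumptions: ReplaceFrame, cexe() and run() are
hypothetical helper names, and a fixed string stands in for raw_input so
the example is self-contained. cexe() raises an exception that unwinds
the current frame, and a driver loop then makes the requested call:

```python
class ReplaceFrame(Exception):
    """Request to abandon the current frame and call something else."""
    def __init__(self, func, call_args, call_kwargs):
        super().__init__()
        self.func = func
        self.call_args = call_args
        self.call_kwargs = call_kwargs

def cexe(func, *args, **kwargs):
    # Never returns normally: the caller's frame is unwound by the raise.
    raise ReplaceFrame(func, args, kwargs)

def run(func, *args, **kwargs):
    # Driver loop: each cexe() request replaces the call being run.
    while True:
        try:
            return func(*args, **kwargs)
        except ReplaceFrame as request:
            func = request.func
            args = request.call_args
            kwargs = request.call_kwargs

log = []

def get_and_use_value(somevalue, callforward):
    cexe(callforward, somevalue)
    log.append("we don't reach here")   # skipped, like code after return

def greet(name):
    log.append("hello " + name)

run(get_and_use_value, "Michael", greet)
print(log)
```

Locals do not leak here, since greet() only sees what was passed to it.
Unlike a real exec, though, control does come back to whoever called
run(), so this only models the replacement inside the driver.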

OK, so this probably seems pretty pointless to many people, but I'm
curious about improving the tools to deal with state machines. Often
people use switch statements in other languages to deal with them, and
for certain classes of state machines you can replace them with
generators. But that's not appropriate for everything...
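
As a small illustration of the generator approach (a sketch in Python 3;
the door and its event names are made up), the current state is simply
the point at which the generator is paused:

```python
def door():
    """Two-state machine driven by events passed in with send()."""
    while True:
        # State: closed. Stay here until an "open" event arrives.
        event = yield "closed"
        while event != "open":
            event = yield "closed"
        # State: opened. Stay here until a "close" event arrives.
        event = yield "opened"
        while event != "close":
            event = yield "opened"

machine = door()
states = [
    next(machine),            # start the generator; it reports "closed"
    machine.send("knock"),    # ignored event, still closed
    machine.send("open"),     # transition to opened
    machine.send("close"),    # transition back to closed
]
print(states)
```

This stays readable with two states, but every state has to live inside
the one generator body, which is the limitation I'm poking at here.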

My particular thought that started all this off actually stems from this:

Essentially by doing a cexe we're actually creating a composite function
out of disparate functions (with or without a shared local context).
ie ...
    ----------
    def count():
        print "Counting to 3!"
        cexe one()

    def one():
        print "one!"
        cexe two()

    def two():
        print "two!"
        cexe three()

    def three():
        print "three!"

    count() # Note I'm not doing cexe count() here
    ----------
... essentially dynamically constructs an execution context similar to a
single function, ie the above collapses to something like:

    ----------
    def count():
        print "Counting to 3!"
        print "one!"
        print "two!"
        print "three!"

    count() # Note I'm not doing cexe count() here
    ----------
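
One way to get a similar effect in existing python (a sketch in Python 3,
not the proposed feature itself) is for each function to return the next
function instead of calling it, with a small driver loop doing the calls.
The run() and stack_depth() helpers are my own hypothetical names, and
stack_depth() relies on sys._getframe, which is CPython specific:

```python
import sys

def stack_depth():
    """Count the frames currently on the call stack (CPython specific)."""
    depth = 0
    frame = sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

output = []   # collected instead of printed, so it can be inspected
depths = []   # stack depth seen inside each step

def count():
    output.append("Counting to 3!")
    depths.append(stack_depth())
    return one            # stands in for "cexe one()"

def one():
    output.append("one!")
    depths.append(stack_depth())
    return two            # stands in for "cexe two()"

def two():
    output.append("two!")
    depths.append(stack_depth())
    return three          # stands in for "cexe three()"

def three():
    output.append("three!")
    depths.append(stack_depth())
    return None           # end of the chain

def run(step):
    # Driver loop: each step finishes before the next one starts.
    while step is not None:
        step = step()

run(count)
print(output, depths)
```

Every step records the same stack depth, so the chain never nests any
deeper than a single function call would, much as if it had been
written as the one collapsed function.
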
It's this recognition that made me wonder this:

This works well for state machines, and generators are a nice model for
dealing with resumable things (and a state machine can be viewed as a
resumable "thing").

Now suppose we take all that one stage further and provide said
composite generator, with some additional context in the way we do
with Kamaelia - cf http://kamaelia.sf.net/MiniAxon/ , we could
potentially do this:

(choosing something relatively substantial to show I'm not just
 being whimsical, and to provide something perhaps more "real")

class TCP_StateMachine(Axon.Component.component):
    def CLOSED(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "appl passive open" == event.type: cexe self.LISTEN()
       if "active open" == event.type:
           self.send(SYN(event.payload), "network")
           cexe self.SYN_SENT()

    def LISTEN(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "recv syn" == event.type:
           self.send(   , "network")
           cexe self.SYN_RCVD()
       if "appl send data" == event.type:
           self.send(   , "network")
           cexe self.SYN_SENT()

    def SYN_RCVD(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "recv rst" == event.type:  cexe self.LISTEN()
       if "recv ack" == event.type:  cexe self.ESTABLISHED()
       if "appl close" == event.type:
           self.send(FIN(event.payload), "network")
           cexe self.FIN_WAIT_1()

    def SYN_SENT(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "appl close" == event.type: cexe self.CLOSED()
       if "timeout" == event.type: cexe self.CLOSED()
       if "recv syn-ack" == event.type:
           self.send(ACK(event.payload), "network")
           cexe self.ESTABLISHED()

    def ESTABLISHED(self):
       # more complex than others, so skipped, has its own data transfer
       # state etc, so would make more sense to model as a subcomponent.

    def FIN_WAIT_1(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "recv ack" == event.type: cexe self.FIN_WAIT_2()

       if "recv fin" == event.type:
           self.send(ACK(event.payload), "network")
           cexe self.CLOSING()

       if "recv fin, ack" == event.type:
           self.send(ACK(event.payload), "network")
           cexe self.TIME_WAIT()

    def FIN_WAIT_2(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "recv fin" == event.type:
           self.send(ACK(event.payload), "network")
           cexe self.TIME_WAIT()

    def CLOSING(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "recv ack" == event.type: cexe self.TIME_WAIT()

    def TIME_WAIT(self):
       if not self.anyReady(): yield self.pause()
       event = self.recv("inbox")
       if "timeout 2MSL" == event.type: cexe self.CLOSED()

Now obviously that's not particularly pretty, but the clear definition
of states as methods, and clear transitions between states via the cexe
calls, is relatively easy to follow through. ie it's fairly clear it's
implementing the standard TCP state machine.

(Incidentally if you're wondering what relevance this has outside of
just TCP, this sort of thing could be useful in games for modelling
complex behaviours)

What is less clear about this is that I'm working on the assumption
that, as well as making "cexe" work, the language change also allows
the above set of methods to be treated as if it's one large generator
that's split over multiple function definitions. This is conceptually
very similar to the idea that cexe would effectively "join" functions
together, as alluded to above.

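For comparison, something close to "one generator split over several
methods" can be sketched in Python 3.3 or later, where a generator can
delegate with "yield from" and hand back a value with "return". This is
only an approximation under my own assumptions: TinyTCP, next_event(),
inbox, sent and trace are hypothetical stand-ins for the Axon component
machinery, events are plain strings, and only a few of the transitions
from the listing above are modelled:

```python
from collections import deque

class TinyTCP:
    """Cut-down stand-in for the component above (not the Axon API)."""
    def __init__(self):
        self.inbox = deque()   # events delivered from outside
        self.sent = []         # what would have gone to "network"
        self.trace = []        # states entered, for inspection

    def main(self):
        # Driver: each state returns the next state method.
        state = self.CLOSED
        while state is not None:
            self.trace.append(state.__name__)
            state = yield from state()

    def next_event(self):
        while not self.inbox:
            yield              # pause until something is delivered
        return self.inbox.popleft()

    def CLOSED(self):
        event = yield from self.next_event()
        if event == "active open":
            self.sent.append("SYN")
            return self.SYN_SENT      # stands in for "cexe self.SYN_SENT()"
        return self.CLOSED

    def SYN_SENT(self):
        event = yield from self.next_event()
        if event in ("appl close", "timeout"):
            return self.CLOSED
        if event == "recv syn-ack":
            self.sent.append("ACK")
            return self.ESTABLISHED
        return self.SYN_SENT

    def ESTABLISHED(self):
        yield                  # data transfer is not modelled here
        return None

machine = TinyTCP()
runner = machine.main()
next(runner)                           # enters CLOSED, pauses on empty inbox
machine.inbox.append("active open")
next(runner)                           # sends SYN, moves to SYN_SENT
machine.inbox.append("recv syn-ack")
next(runner)                           # sends ACK, moves to ESTABLISHED
print(machine.trace, machine.sent)
```

Here the joining is done by the small driver in main(), whereas the cexe
version would need the language itself to change.
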
This has a number of downsides for the main part of the language, so
I wouldn't suggest that these changes actually happen - consider it a
thought experiment if you like. (I think the single function/no wrapping
of yield IS actually a good thing)

However, I feel the above example is quite a compelling example of how
a unix style exec for python method calls could be useful, especially
when combined with generators. (note this is a thought experiment ;)

It also struck me that any sufficiently interesting idea is likely to
have already been implemented, though perhaps not looking quite like the
above, so I thought I'd ask the questions:

  * Has anyone tried this sort of thing?

  * Has anyone tried simply not creating a new stack frame when doing
    a function call in python? (or perhaps replacing the current one with
    a new one)

  * Has anyone else tried modelling the unix system exec function in
    python? If so what did you find?

  * Since I can't find anything in the archives, I'm presuming my
    searching abilities are bust today - can anyone suggest any better
    search terms or threads to look at?

  * Am I mad? :)

BTW, I'm aware that this has similarities to call with continuation,
and that you can use statesaver.c & generators to achieve something
vaguely similar to continuations, but I'm more after this specific
approach, rather than that general approach. (After all, even ruby
notes that their most common use for call/cc is to obfuscate code -
often accidentally - and I'm not particularly interested in that :)

Whereas the unix style exec is well understood by many people, and
when it's appropriate can be extremely useful. My suspicion is that
my ideas above actually map to a common idiom, but I'm curious to
find that common idiom.

I'm fairly certain something like this could be implemented using
greenlets, and also fairly certain that Stackless has been down this
route in the past, but I'm not able to find something like this exec
style call there. (Which is after all more constrained than your usual
call with continuation approach)

So, sorry for the length of this, but if anyone has any thoughts, I'd be
very interested. If they don't, I hope it was interesting :)

Regards,

Michael.