[Edu-sig] Reloading code (was Re: OLPC: first thoughts)

Sun Feb 25 05:41:27 CET 2007

On 2/24/07, Paul D. Fernhout <pdfernhout at kurtz-fernhout.com> wrote:
> kirby urner wrote:
> > On 2/24/07, Paul D. Fernhout <pdfernhout at kurtz-fernhout.com> wrote:
> >
> >>There may be one major semantical issue, in terms of the meaning of side
> >>effects when loading a module  (e.g. defining singletons, opening files,
> >>etc.) which is hard to deal with generically with Python. You can deal
> >>with [it] specifically in how you write your own code, but that is not a
> >>general solution.
> >
> >
> > Not sure I follow yet.  A module loads top to bottom, with lower defs
> > premised on
> > those previously mentioned.  Is that what you mean?  Once everything is loaded,
> > it's more like a __dict__, i.e. the namespace of the module is
> > accessible, either
> > via dot notation, or directly if the names are top level.
>
> To step back for a minute, the fundamental problem here is that for
> whatever reason a programmer wants to modify just one method of an already
> loaded Python class (which came from a textual module which was loaded
> already), save the change somewhere so it can be reloaded later
> (overwriting part of the textual module?), and also have the program start
> using the new behavior for existing instances without any other side
> effects arising from recompiling this one change. In practice, this is
> trivial to do in almost any Smalltalk system; it is hard if not impossible
> to do in any widely used Python IDE or program (even when a Python shell
> is embedded).
>
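A minimal sketch of the behavior described above, in plain Python (Python 3 syntax assumed): rebinding a single method on a live class changes what already-existing instances do, with no module reload. The names `Greeter` and `new_greet` are hypothetical, made up for this illustration, and saving the change back to the source file is not addressed here.

```python
class Greeter:
    def greet(self):
        return "hello"

# An instance created before the change.
g = Greeter()
before = g.greet()

def new_greet(self):
    return "hello, world"

# Rebind only this one method on the live class object.
Greeter.greet = new_greet

# The existing instance looks the method up on its class at call time,
# so it picks up the new behavior without being recreated.
after = g.greet()
```

This is the easy half of the problem; the hard half is keeping the textual module in sync with the change.
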
> Unfortunately, the paradigm used by every Python IDE I've tried is to
> reload an entire textual module (or more typically, entire program) at a
> time for even the slightest change to one function. Embedded Python shells
> generally allow you to redefine a function if you have a copy of the code,
> but they offer no way to save the code. Most Smalltalks use a different
> paradigm, where code is presented to the user one function at a time in a
> browser and is compiled one function at a time. Yes, there are cases where
> people "filein" Smalltalk code defining a complex program, but such
> fileins are generally considered an *interchange* format, not the preferred
> program representation for editing, which is what a module usually is in Python.
>
> Consider the meaning of an arbitrary piece of Python code near the bottom
> of a textual module. Essentially, you have no idea what it means if the
> original author has used some Python bells and whistles. For example, he
> or she could have defined a metaclass where every nested "def" under a
> class was converted to, say, an uppercase string and stored under a key
> that was the numerical hash of the function name (with no functions
> actually defined for that class perhaps). The specific metaclass behavior
> may even hinge on the current state of a global which has been modified
> several times during the course of loading the module. So essentially, you
> have no way of knowing for sure what any apparent Python code really means
> by isolated inspection. And because any module can run any arbitrary
> Python code, without actually running the Python program (or doing the
> equivalent analysis), you can never be sure what side effects loading a
> module has. Now, Smalltalk has metaclasses too, but in practice, because
> of the way code is presented to the user and edited and recompiled one
> method/function at a time, the context makes fairly clear what is going to
> happen when that snippet of code you just changed is compiled. The big
> difference is really the effective unit of compilation -- the complex
> module in Python or the simple method/function in Smalltalk.
>
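The metaclass scenario above can be sketched roughly as follows (Python 3 syntax assumed). `ShoutingMeta`, `Widget`, and the `table` attribute are hypothetical names invented for this illustration; the point is only that a `def` inside a class body need not end up as a method on that class.

```python
import types

class ShoutingMeta(type):
    """Hypothetical metaclass that swallows every function in the class body."""

    def __new__(mcls, name, bases, namespace):
        table = {}
        cleaned = {}
        for key, value in namespace.items():
            if isinstance(value, types.FunctionType):
                # Store an uppercase string under the hash of the name
                # instead of defining the function on the class.
                table[hash(key)] = key.upper()
            else:
                cleaned[key] = value
        cleaned["table"] = table
        return super().__new__(mcls, name, bases, cleaned)


class Widget(metaclass=ShoutingMeta):
    def draw(self):  # looks like an ordinary method definition...
        return "drawing"


# ...but no such method exists on the finished class.
has_draw = hasattr(Widget, "draw")
stored = Widget.table[hash("draw")]
```

A tool that reads only the text of `Widget` would conclude that it has a `draw` method, which is exactly the kind of mismatch described above.
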
> Now, this is rarely a problem the *first* time a module is loaded, but it
> generally becomes a problem when a module is *reloaded*. If you only
> treated a module as an *interchange* format, and then modified the live
> classes using a tool which only works on regular classes (like PataPata
> does), there would be no need to reload the module. The problem of an IDE
> tool parsing a module's meaning then remains only potential, and you also
> avoid the possibility that reloading a module might have side effects.
> (In practice, everything still depends on mapping from a function back to
> its source text, and this may go wrong for various reasons... :-)
>
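To make the reload side effects concrete, here is a small sketch (Python 3, using `importlib.reload` in place of the Python 2 built-in `reload()`). The module name `sidefx_demo`, the log file `loads.log`, and the `registry` list are all hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module whose top-level code has side effects: it appends
# to a log file and creates a fresh "singleton" list on every load.
MODULE_SOURCE = '''
import os
with open(os.path.join(os.path.dirname(__file__), "loads.log"), "a") as f:
    f.write("loaded\\n")
registry = []
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "sidefx_demo.py"), "w") as f:
    f.write(MODULE_SOURCE)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
sidefx_demo = importlib.import_module("sidefx_demo")

# State built up while the program runs.
old_registry = sidefx_demo.registry
old_registry.append("something registered at runtime")

# Reloading re-executes all top-level code in the module.
importlib.reload(sidefx_demo)

with open(os.path.join(tmpdir, "loads.log")) as f:
    load_count = len(f.readlines())

# The module now holds a new, empty list; the old one is orphaned.
registry_replaced = sidefx_demo.registry is not old_registry
```

The log file is written twice and the runtime state in `registry` is silently dropped, even though nothing in the module's source changed.
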
> Naturally, this kind of major redefinition is rarely done, and it would
> create lots of confusion, but it is possible, so IDE tools that do not
> support it are incomplete. This is a perennial problem with, say, C, where
> you can make all sorts of macros and so never know just exactly what
> arbitrary C code out of context does (see the obfuscated code
> contests...). And it means that you can't get a simple one-to-one mapping
> of a section of a file that looks like it defines a function and an actual
> function reliably without analyzing the entire program. Yes, 99.99% of the
> time Python code does the obvious thing, but it is not 100% certain. The
> same is true for Forth -- in theory any isolated snippet of Forth can mean
> anything, since it is trivially easy to modify how the compiler interprets
> text -- something that makes Forth very powerful but at the same time
> potentially very confusing for a code maintainer. I don't have the link
> offhand, but a while back I came across a blog post suggesting you tend to
> either have a powerful language or powerful tools -- but not at the same
> time (except perhaps for Smalltalk :-). That is because if the language is
> very flexible, it becomes almost impossible to write IDE tools that can
> keep up with it in all its generality.
>
> Now, since almost all Python code is written in a straightforward manner,
> one can still make such tools and find them useful. But likely there will
> always be gotchas in such systems as long as they tie their operation
> closely to the notion of compiling one module at a time, compared to
> Smalltalk which ties itself to compiling one method/function at a time.
>
> One of the things PataPata tried to do, and succeeded to some extent, was
> breaking the link between reloading a textual module and modifying a
> running Python program, yet it was still able to use a textual Python
> module as both an interchange format and also an image format (something
> that, to my knowledge, no Smalltalk has done, as all Smalltalk images I know
> of are binary, not human-editable text).
>
> One idea I have wanted to try for Python but never got around to it is to
> create a Smalltalk-like browser and build and modify classes on the fly by
> changing their objects and compiling only individual functions as they are
> changed; I could store the textual representation of functions in a
> repository with version control. Then, I could also still use Python
> modules as an interchange format, sort of like PataPata did but without
> prototypes. You would lose some of the generality of coding in Python
> (setting globals in a module and such) but you would essentially have a
> somewhat Smalltalk-like environment to mess with (ignoring restarting from
> exceptions, which is very important in Smalltalk development, where much
> code ends up being written in the debugger as often as not; I'm not sure
> whether that part could be simulated with plain Python or whether it would
> require a VM change).
>
> --Paul Fernhout
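
The "compile one function at a time" idea from the last paragraph might look roughly like this (Python 3 syntax assumed). This is only a sketch: `define_method`, `method_sources`, and `Account` are hypothetical names, and a plain dict of lists stands in for a real version-controlled repository.

```python
# Hypothetical in-memory "repository":
# (class name, method name) -> list of source versions, oldest first.
method_sources = {}

class Account:
    def __init__(self, balance):
        self.balance = balance

def define_method(cls, name, source):
    """Compile the source of a single function and install it on cls."""
    scratch = {}
    filename = "<%s.%s>" % (cls.__name__, name)
    # Only this one function is compiled; no module is reloaded.
    # Note: the new function's globals are the scratch dict, not a module.
    exec(compile(source, filename, "exec"), scratch)
    setattr(cls, name, scratch[name])
    method_sources.setdefault((cls.__name__, name), []).append(source)

acct = Account(100)

define_method(Account, "describe",
              "def describe(self):\n    return 'balance: %d' % self.balance\n")
first = acct.describe()

# Edit and recompile just that method; the existing instance sees the change.
define_method(Account, "describe",
              "def describe(self):\n    return 'You have %d' % self.balance\n")
second = acct.describe()
versions = len(method_sources[("Account", "describe")])
```

Writing the stored sources back out as an ordinary module would then be the interchange step.
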

# xreload.py.

"""Alternative to reload().

This works by executing the module in a scratch namespace, and then
patching classes, methods and functions.  This avoids the need to
patch instances.  New objects are copied into the target namespace.
"""

import imp
import marshal
import sys
import types

def xreload(mod):
    """Reload a module in place, updating classes, methods and functions.

    Args:
      mod: a module object

    Returns:
      The (updated) input object itself.
    """
    # Get the module name, e.g. 'foo.bar.whatever'
    modname = mod.__name__
    # Get the module namespace (dict) early; this is part of the type check
    modns = mod.__dict__
    # Parse it into package name and module name, e.g. 'foo.bar' and 'whatever'
    i = modname.rfind(".")
    if i >= 0:
        pkgname, modname = modname[:i], modname[i+1:]
    else:
        pkgname = None
    # Compute the search path
    if pkgname:
        # We're not reloading the package, only the module in it
        pkg = sys.modules[pkgname]
        path = pkg.__path__  # Search inside the package
    else:
        # Search the top-level module path
        pkg = None
        path = None  # Make find_module() use the default search path
    # Find the module; may raise ImportError
    (stream, filename, (suffix, mode, kind)) = imp.find_module(modname, path)
    # Turn it into a code object
    try:
        # Is it Python source code or byte code read from a file?
        # XXX Could handle frozen modules, zip-import modules
        if kind not in (imp.PY_COMPILED, imp.PY_SOURCE):
            # Fall back to built-in reload()
            return reload(mod)
        if kind == imp.PY_SOURCE:
            source = stream.read()
            code = compile(source, filename, "exec")
        else:
            # Skip the 8-byte .pyc header (magic number and mtime)
            stream.read(8)
            code = marshal.load(stream)
    finally:
        if stream:
            stream.close()
    # Execute the code in a temporary namespace; if this fails, no changes
    tmpns = {}
    exec(code, tmpns)
    # Now we get to the hard part
    oldnames = set(modns)
    newnames = set(tmpns)
    # Add newly introduced names
    for name in newnames - oldnames:
        modns[name] = tmpns[name]
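    # Delete names that are no longer current, but keep module metadata
    # (__name__, __file__) that exec() into a scratch dict never sets
    for name in oldnames - newnames - set(["__name__", "__file__"]):
        del modns[name]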
    # Now update the rest in place
    for name in oldnames & newnames:
        modns[name] = _update(modns[name], tmpns[name])
    # Done!
    return mod

def _update(oldobj, newobj):
    """Update oldobj, if possible in place, with newobj.

    If oldobj is immutable, this simply returns newobj.

    Args:
      oldobj: the object to be updated
      newobj: the object used as the source for the update

    Returns:
      either oldobj, updated in place, or newobj.
    """
    if type(oldobj) is not type(newobj):
        # Cop-out: if the type changed, give up
        return newobj
    if hasattr(newobj, "__reload_update__"):
        # Provide a hook for updating
        return newobj.__reload_update__(oldobj)
    if isinstance(newobj, types.ClassType):
        return _update_class(oldobj, newobj)
    if isinstance(newobj, types.FunctionType):
        return _update_function(oldobj, newobj)
    if isinstance(newobj, types.MethodType):
        return _update_method(oldobj, newobj)
    # XXX Support class methods, static methods, other decorators
    # Not something we recognize, just give up
    return newobj

def _update_function(oldfunc, newfunc):
    """Update a function object."""
    oldfunc.__doc__ = newfunc.__doc__
    oldfunc.__dict__.update(newfunc.__dict__)
    oldfunc.func_code = newfunc.func_code
    oldfunc.func_defaults = newfunc.func_defaults
    # XXX What else?
    return oldfunc

def _update_method(oldmeth, newmeth):
    """Update a method object."""
    # XXX What if im_func is not a function?
    _update_function(oldmeth.im_func, newmeth.im_func)
    return oldmeth

def _update_class(oldclass, newclass):
    """Update a class object."""
    # XXX What about __slots__?
    olddict = oldclass.__dict__
    newdict = newclass.__dict__
    oldnames = set(olddict)
    newnames = set(newdict)
    for name in newnames - oldnames:
        setattr(oldclass, name, newdict[name])
    for name in oldnames - newnames:
        delattr(oldclass, name)
    for name in (oldnames & newnames) - set(["__dict__", "__doc__"]):
        setattr(oldclass, name, newdict[name])
    return oldclass
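
For readers on Python 3, the core trick in `_update_function()` can be tried out with the sketch below. It is a simplified illustration, not a port of the module above: `patch_namespace` is a hypothetical helper that works on a plain dict and a source string, and it uses the Python 3 attribute names `__code__` and `__defaults__`, which correspond to `func_code` and `func_defaults` in the listing.

```python
import types

def patch_namespace(modns, source):
    """Run source in a scratch namespace, then patch modns in place."""
    tmpns = {}
    # If this raises, modns has not been touched yet.
    exec(compile(source, "<patch>", "exec"), tmpns)
    for name, newobj in tmpns.items():
        if name == "__builtins__":
            continue
        oldobj = modns.get(name)
        if (isinstance(oldobj, types.FunctionType)
                and isinstance(newobj, types.FunctionType)):
            # Keep the old function object; swap in the new code and defaults
            # so that every existing reference sees the new behavior.
            oldobj.__code__ = newobj.__code__
            oldobj.__defaults__ = newobj.__defaults__
            oldobj.__doc__ = newobj.__doc__
        else:
            modns[name] = newobj

# A stand-in for a module namespace.
ns = {}
exec("def area(w, h=1):\n    return w * h\n", ns)

saved_ref = ns["area"]  # e.g. a callback someone stored earlier
before = saved_ref(3)

patch_namespace(ns, "def area(w, h=2):\n    return w * h * 10\n")

after = saved_ref(3)
same_object = ns["area"] is saved_ref
```

Because the function object itself is preserved, references held elsewhere (callbacks, bound methods, `from mod import f` names) do not go stale, which is what a plain reload cannot offer.
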

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)