[Tutor] telling dir(_builtins_) from dir(__builtins__) [spellchecking Python?]

dyoo@decrem.com dyoo@decrem.com
Sun, 17 Mar 2002 21:21:39 -0800 (PST)


> As I repeatedly remark in my book - computers are stupid.  You have to
> be specific and precise. These are the irritations of every programming
> language, sorry, but that's just how it is.

It's actually something of a tradeoff between making the interpreter
efficient and making it more "nice" for people.  We could imagine a small
spelling-checker built into the Python interpreter that might check up on
a NameError and say something like:

"""NameError: global name 'foo' is not defined.  Perhaps you mean 'Foo'?"""

IBM's 'jikes' Java compiler, for example, actually does this kind of
checking during compilation.  It might be good to expect this sort of
helpfulness from the interactive interpreter too, especially because time
isn't so much an issue here.



> Eventually somebody might figure out how to make smarter interpreters,
> until then we're stuck with it.

That sounds like a challenge.  *grin*

For fun, let's see how much work it might take to add such a spell
checking feature to Python's interactive interpreter.  How would we start?



Python has a module called 'difflib' which tells us how "closely"
strings match up to each other --- it measures the relative difference
between strings.

    http://www.python.org/doc/lib/module-difflib.html

We can cook up a very quick spellchecker by using 'difflib':

###
import difflib

def spellcheck(misspelled, candidates):
    """Given a misspelled word, returns the best choices among a list
    of candidates.

    At the moment, this is just a call to difflib's
    get_close_matches() function, but we might want to use a more
    powerful spell checker when we have the chance."""
    return difflib.get_close_matches(misspelled, candidates)
###

Let's take a look:

###
>>> spellcheck('grob', ['glob', 'croak', 'foo'])
['glob']
>>> spellcheck('_builtins_', dir())
['__builtins__']
###

Ok, that step wasn't so bad.  *grin*
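
To see why 'glob' is the one that comes back, it can help to look at the
similarity scores themselves.  Here's a rough sketch, written for a
current Python 3 (that's an assumption on my part; the session above is
from an older interpreter).  The helper name score() is made up for
illustration; SequenceMatcher and the n and cutoff arguments of
get_close_matches() come from difflib itself.

```python
import difflib

def score(a, b):
    """Similarity ratio between 0.0 and 1.0, as computed by difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()

candidates = ['glob', 'croak', 'foo']
for word in candidates:
    # Print each candidate next to its similarity to the misspelling.
    print(word, round(score('grob', word), 2))

# n limits how many matches come back; cutoff is the minimum score a
# candidate needs (get_close_matches defaults to n=3, cutoff=0.6).
strict = difflib.get_close_matches('grob', candidates, n=1, cutoff=0.7)
loose = difflib.get_close_matches('grob', candidates, n=3, cutoff=0.4)
print(strict)
print(loose)
```

'glob' scores 0.75 against 'grob', comfortably above the default cutoff
of 0.6, which is why it survives while the others get filtered out.
Raising the cutoff makes the checker pickier; lowering it lets more
distant names through.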



Every exception carries with it a traceback that points to a 'stack
frame' --- it's this "frame" that holds the local and global names that a
program uses to look up variable values.

    http://www.python.org/doc/current/ref/execframes.html


These frames are accessible if we use the sys.exc_info() function.  For
example:

###
>>> import sys
>>> def oops():
...     print _builtins_
... 
>>> try:
...     oops()
... except:
...     frame = sys.exc_info()[2].tb_frame
... 
>>> frame
<frame object at 0x819777c>
>>> dir(frame)
['__class__', '__delattr__', '__getattribute__', '__hash__', '__init__', 
'__new__', '__reduce__', '__repr__', '__setattr__', '__str__', 'f_back', 
'f_builtins', 'f_code', 'f_exc_traceback', 'f_exc_type', 'f_exc_value', 
'f_globals', 'f_lasti', 'f_lineno', 'f_locals', 'f_restricted', 'f_trace'] 
###

So once we see a NameError exception in action, we can pull out a frame,
and stare at all the possible names we can use... and that's where we can
pull out a candidate list of names to spell check against!
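
One wrinkle: the traceback's tb_frame is the frame where the exception
was caught, and the frame where it was actually raised sits at the end
of the tb_next chain.  Here's a hedged sketch, in current Python 3
syntax, of collecting names from that innermost frame.  The function
name candidate_names() and the variable local_value are invented for
this example; tb_next, f_locals, f_globals and f_builtins are real
attributes.

```python
import sys

def candidate_names(tb):
    """Collect the names visible in the frame where the error was raised."""
    # Follow the chain down to the innermost traceback entry.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    # Locals, globals and builtins are all fair game for a suggestion.
    names = set(frame.f_locals) | set(frame.f_globals) | set(frame.f_builtins)
    return sorted(names)

def oops():
    local_value = 1          # a local that should show up as a candidate
    return _builtins_        # deliberate misspelling, raises NameError

try:
    oops()
except NameError:
    names = candidate_names(sys.exc_info()[2])

print('local_value' in names)
```

Looking at the innermost frame means that local variables of the
function that blew up become candidates too, not just the globals.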



But how can we tie this into our programs, so that a NameError will
provoke our spellchecking system into action?  One possible solution is to
put our code within an exception-handling fence.  Here's a function that
will do just that:


###
import sys

def protect(function):
    """Wraps a function so that a NameError raised inside it gets a
    spelling suggestion tacked onto its message."""
    def spell_check_wrapper(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except NameError:
            exc_type, name_error, tb = sys.exc_info()
            misspelled = getNameFromNameError(name_error)
            ## Candidate names: whatever is visible in the frame
            ## where we caught the error.
            choices = (tb.tb_frame.f_locals.keys()
                       + tb.tb_frame.f_globals.keys())
            corrections = map(repr, spellcheck(misspelled, choices))
            if corrections:
                msg = (str(name_error)
                       + ".  Perhaps you meant one of the following: "
                       + ','.join(corrections))
            else:
                msg = str(name_error)
            raise NameError, msg
    return spell_check_wrapper


def getNameFromNameError(name_error):
    """Extracts the 'name' from a NameError.  Dunno if this is a
    method of a NameError instance."""
    return str(name_error).split("'")[1]
###



The code is a bit ugly, but what it does is fairly simple: whenever a
NameError occurs, it intercepts the error and checks whether any names
that are already defined are spelled similarly.  If it finds any, it
re-raises the NameError with those suggestions added to the message.
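
For anyone following along on a newer interpreter, here's roughly the
same idea as a sketch in Python 3 syntax.  It isn't a drop-in
replacement for the code above: it reads the traceback from the
exception's __traceback__ attribute, looks at the innermost frame, and
uses functools.wraps, all of which are my own choices for this example.

```python
import difflib
import functools

def protect(function):
    """Wrap a function so a NameError inside it gets spelling hints."""
    @functools.wraps(function)
    def spell_check_wrapper(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except NameError as name_error:
            parts = str(name_error).split("'")
            if len(parts) < 2:
                raise                      # no quoted name to work with
            misspelled = parts[1]
            # Walk to the frame where the error was actually raised.
            tb = name_error.__traceback__
            while tb.tb_next is not None:
                tb = tb.tb_next
            frame = tb.tb_frame
            choices = list(frame.f_locals) + list(frame.f_globals)
            corrections = difflib.get_close_matches(misspelled, choices)
            if not corrections:
                raise                      # nothing close enough, leave as-is
            hints = ", ".join(map(repr, corrections))
            msg = f"{name_error}.  Perhaps you meant one of the following: {hints}"
            raise NameError(msg) from name_error
    return spell_check_wrapper

@protect
def oops():
    return _builtins_        # deliberate misspelling

try:
    oops()
except NameError as err:
    print(err)
```

Using "raise ... from" keeps the original error attached to the new
one, so the full traceback is still there for anyone who wants it.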


Does this work?

###
>>> oops_protected = protect(oops)
>>> oops_protected()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/tmp/python-1037YCW", line 51, in spell_check_wrapper
NameError: global name '_builtins_' is not defined.  Perhaps you meant one 
of the following: '__builtins__'
###

*grin* Cool!


If someone is sufficiently interested, perhaps we can tie this into the
IDLE environment, since this seems like something that would be useful to
folks.
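
I haven't looked at how IDLE reports errors, so this isn't an IDLE
patch.  For the plain interpreter, though, one possible route is
sys.excepthook, which Python calls for any uncaught exception.  Here's
a sketch in Python 3 syntax; the name suggesting_excepthook() is made
up, and whether a given environment honors sys.excepthook is an
assumption you'd need to check.

```python
import difflib
import sys
import traceback

def suggesting_excepthook(exc_type, exc_value, tb):
    """Print the usual traceback, then a spelling hint for NameErrors."""
    traceback.print_exception(exc_type, exc_value, tb)
    if not isinstance(exc_value, NameError) or tb is None:
        return
    parts = str(exc_value).split("'")
    if len(parts) < 2:
        return                             # no quoted name in the message
    # Look for candidates in the frame where the error was raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    choices = list(frame.f_locals) + list(frame.f_globals)
    matches = difflib.get_close_matches(parts[1], choices)
    if matches:
        print("Perhaps you meant one of the following: "
              + ", ".join(map(repr, matches)), file=sys.stderr)

# To try it out, install the hook by hand:
#     sys.excepthook = suggesting_excepthook
```

The nice thing about a hook is that no function needs to be wrapped
ahead of time; every uncaught NameError goes through it.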


Hope this helps!