What I learned from the Cookbook (was...)

Mon Oct 14 09:43:35 EDT 2002

In my first post on this thread I said I wouldn’t embarrass myself by
telling all I learned in the first chapter.  In this post I’ll tell instead
what I discovered for myself after being inspired by the cookbook.

Recipe 1.6, dispatching using a dictionary, discusses using a dictionary in
lieu of, say, a C++ switch statement.  Inspired by the Cookbook, I recast
the main loop of my app’s colorizer as follows:

  while i < sLen:
    func = state_dict[state]
    i,state = func(s,i)

This is a whole lot clearer, and probably faster than the old code. However,
it probably wouldn’t have been interesting enough to do if I hadn’t been
thinking of something more.
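
A minimal, self-contained sketch of that dispatch loop follows. The two
handlers and their state names are hypothetical stand-ins, not the real
colorizer's handlers; only the loop itself mirrors the code above.

```python
# Hypothetical two-state scanner; the state names are illustrative only.
def scan_plain(s, i):
    """Consume one character of ordinary text; a quote opens a string."""
    if s[i] == '"':
        return i + 1, "string"
    return i + 1, "plain"

def scan_string(s, i):
    """Consume one character inside a string; a quote closes it."""
    if s[i] == '"':
        return i + 1, "plain"
    return i + 1, "string"

# The dictionary plays the role of a switch statement on the state.
state_dict = {"plain": scan_plain, "string": scan_string}

def run(s, state="plain"):
    """Drive the handlers with the dispatch loop; return the states visited."""
    i, visited = 0, []
    while i < len(s):
        visited.append(state)
        func = state_dict[state]
        i, state = func(s, i)
    return visited
```

Each handler returns the new index and the next state, so adding a state
means adding one function and one dictionary entry; the loop never changes.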

I was thinking of the loop above in a vague way, without having written it,
and I was musing about state_dict as a (dynamic) data structure.  I suddenly
thought that Python allows us to _compute_ switch statements.  In fact,
this isn’t really correct: without heroic efforts we can’t really compute
the contents of state_dict.  That is, it wouldn’t be worth the effort to
generate the state handlers themselves on the fly.

But the intuition behind this first (slightly erroneous) thought is correct.
We can compute _other_ data structures (dictionaries or lists) used by the
state handlers.  The effect is to compute what the state handlers do.
Moreover, all the special case code that used to be in my app’s state
handlers can be moved into the computation of the dicts and lists used by
the state handlers.
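
As a rough illustration of the idea (not the actual colorizer code), the
sketch below computes a small table per language and lets one generic handler
consult it. The names make_tables and classify_word, and the tiny keyword
lists, are assumptions made up for this example.

```python
def make_tables(keywords, comment_start):
    """Compute the lookup tables that the generic handlers consult."""
    return {"keywords": frozenset(keywords), "comment_start": comment_start}

def classify_word(word, tables):
    """Generic handler: no per-language branches, only table lookups."""
    if word in tables["keywords"]:
        return "keyword"
    return "identifier"

# The special cases live in the computed data, not in the handler.
python_tables = make_tables(["def", "while", "if"], "#")
c_tables = make_tables(["int", "while", "if"], "//")
```

The same handler gives different answers for different languages:
classify_word("def", python_tables) yields "keyword", while
classify_word("def", c_tables) yields "identifier".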

Of course, this isn’t really anything new: data-driven code should be a
staple of any programmer’s repertoire. But Python makes it so much more
natural and easy to do.  For example, the jEdit editor contains XML files
describing how to syntax color almost 70 different languages.  I’d like my
app to be able to use those descriptions. Here is a very brief description
of what I plan to do.

The known_languages dict will contain all the data structures needed to
handle a particular language.  This dict is created dynamically.  Its keys
are the language names; its values are tuples of other data
structures.  If known_languages were created statically it would look
something like:

 known_languages = {
  "c"  : (d1,...,dn),
  "python" : (d1,...,dn),
  ... }

The members of each tuple, i.e., d1,...,dn are created by the initialization
routine, something like this:

  # Get the data structures for language, the language to be colored.
  data = known_languages.get(language)
  if not data:
    # Parse the description file, creating the data structures.
    data = parse_xml_file(language)
    if data: known_languages[language] = data
  if data:
    # Unpack the data for the convenience of the state handlers.
    d1,...,dn = data

Actually, most of the methods and data would be members of the colorizing
class, and I’ve omitted all the "self" prefixes for clarity.
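
The initialization routine above can be sketched in runnable form as follows.
This is only a sketch under stated assumptions: parse_xml_file here is a
stand-in that returns canned tuples instead of reading a real description
file, and get_language_data and parse_calls are hypothetical names added for
the example.

```python
known_languages = {}   # language name -> tuple of data structures
parse_calls = []       # records parser calls so the caching is observable

def parse_xml_file(language):
    """Stand-in parser (assumption): returns a tuple of tables, or None."""
    parse_calls.append(language)
    canned = {
        "python": (frozenset(["def", "while"]), "#"),
        "c": (frozenset(["int", "while"]), "//"),
    }
    return canned.get(language)

def get_language_data(language):
    """Return the cached data for language, parsing it on first use."""
    data = known_languages.get(language)
    if data is None:
        # Parse the description, creating the data structures.
        data = parse_xml_file(language)
        if data is not None:
            known_languages[language] = data
    return data
```

The sketch tests against None rather than truthiness, so a valid but empty
tuple would still be cached; whether that matters depends on what the real
parser returns.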

This is about as elegant a description of the problem as can be imagined.  It
rivals mathematical notation in its conciseness, power and generality, while
being real Python code, except for the ellipses.  It doesn’t get much better
than this.

Edward

P.S.  In theory, this scheme could be transliterated into C or C++.  In
practice, few C++ programmers would discover it: the implementation details
would obscure everything, especially the thoughts that led to the scheme in
the first place.
EKR

--------------------------------------------------------------------
Edward K. Ream   email:  edream at tds.net
Leo: Literate Editor with Outlines
Leo: http://personalpages.tds.net/~edream/front.html
--------------------------------------------------------------------