[Tutor] commenting code well

Bruce Sass bsass@freenet.edmonton.ab.ca
Sun, 1 Jul 2001 16:01:59 -0600 (MDT)


On Sun, 1 Jul 2001, Kojo Idrissa wrote:

> I just realized I forgot to send this to the list...it only went to Rob...

Thanks for sending it to the list.

I gave Rob my knee-jerk answer, but wanted to go on 'cause there are
always bits that need commenting on which are related to the program
but kinda OT in the code itself...

<...>
> I guess I like Knuth's idea of Literate Programming.  It appeals to the
> writer in me, but I haven't gotten around to using any actual tools for
> it.  I'm just sort of making up my own version.

...and I think literate programming techniques are the ideal thing for
this task.  Code is code and docs are docs, LP lets you keep them
together and interconvert as is appropriate.

I'd suggest LyX + noweb as a nice, easy to wrap your head around, way
to get a look at LP (unix and win, for sure); but the idea is simple
enough to code up that _anyone_ can play with the source -> prgfiles
conversion (and has, now you know why the message is long :).


> Now, I have a question for the group, related to my verbose commenting
> style.  At what point do lines of commentary begin to impact program
> performance, if at all?
<...>

Hmmm, #'s and TQS's (triple-quoted strings: """, ''') are treated
differently, maybe even depending on where they sit in the code.
#-comments are dropped when the .py is compiled, so they never reach
the .pyc or .pyo and cost nothing at run time.  A TQS in docstring
position stays in the bytecode until you run with -OO.  I'm not
worried about it much because, usually, all of my verbose commenting
goes into "doc chunks", which get filtered out when the .py is created.
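
A quick way to check the comment/docstring difference for yourself.
This is a minimal sketch written for a current Python 3 interpreter
(not the Python of this message); the names "load", "plain",
"commented" and "documented" are made up for the illustration, and the
"optimize" argument of compile() stands in for running with -OO.

```python
plain = "def f(x):\n    return x + 1\n"
commented = (
    "# A long explanation of what f is for.\n"
    "def f(x):\n"
    "    # Add one to the argument.\n"
    "    return x + 1  # trailing remark\n"
)


def load(source, optimize=0):
    """Compile `source` and return the namespace it defines."""
    namespace = {}
    exec(compile(source, "<demo>", "exec", optimize=optimize), namespace)
    return namespace


# Comments are gone before bytecode exists, so both versions of f
# carry the same instructions.
same_bytecode = (
    load(plain)["f"].__code__.co_code == load(commented)["f"].__code__.co_code
)

# A docstring survives normal compilation and is dropped at level 2.
documented = 'def g():\n    """Explain g."""\n    return 0\n'
doc_kept = load(documented)["g"].__doc__
doc_stripped = load(documented, optimize=2)["g"].__doc__

print(same_bytecode, repr(doc_kept), repr(doc_stripped))
```

Only the instruction bytes are compared; line-number tables do differ
between the two sources, but they are not executed.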


WARNING: LP style code markup ahead...

...but I have refrained from messing with the order of things,
   so it should be easy to cut'n'paste or edit, if you feel so
   inclined.  Do "notangle /path/to/saved.mes > lpmail.py" if you
   have noweb installed.

History:
I wrote this at the same time Rob was working on his plans for
Useless, triggered by the same discussion (Rob, you may remember an
email from me at that time, this is the first part of what I was
babbling about ;).  I almost turned it into an LP and sent it to the
list (when it started to grow the getopt stuff and other bits you
expect a released program to have), but never carried through so it
is unfinished in that respect (at least ;).

Note:
I wrote the core in short order (in a linear, go-with-the-flow kinda
style), added a few bells and whistles, but got side-tracked checking
out the os.system and os.exec* stuff (and then a whole bunch of other
things) so it didn't get a cleanup.  The code is tested (I read the
contents of /usr/bin/lpmail into the message and added the markup),
and I have it hooked up as a display filter for Pine (keying on
"lpmail: " in column 1 of a line).

In fact, I need this line to trigger the thing...
lpmail: lpmail.py

Known Bugs:
A single "@" in column 1 will always start a doc chunk,
there is no "@@" work-around like with noweb.

I hope the code speaks for itself.

<<*>>=
#!/usr/bin/env python

"""Tangle an lpmail message or noweb style LP file.

LPMail will extract a specific code chunk from a noweb formatted literate
program, or from an LP file containing an "lpmail:" header which lists which
chunks to extract, a filename for the default (named "*") chunk, or both.

See "lpmail -h" for use information.

"""

import sys, os, getopt, re, string
BaseName = os.path.basename


# some file specific information
PRGNAME = BaseName(sys.argv[0])
VERSION = "0.0.0alpha0"
LASTUPDATE = "Feb 14, 2001"
AUTHOR = [("Bruce Sass",
               "bsass@freenet.edmonton.ab.ca",
               "Primary author")]
CONTRIBUTOR = [("Python Tutor Mailing List",
                    "tutor@python.org",
                    "many bits and pieces")]


# -- system stuff
# ---------------
# Default buffer size for I/O
DEFBUFSIZE = -1  # -1 = use system defaults

# permitted process creation methods
EXEC_types = ["execv", "system"]
# default method used to fire up the IDE,
EXEC_method = "system"

# available development environments
#  - maybe autogenerate, eh.
IDE_types = ["emacs", "gnuclient", "idle", "jed", "xemacs", "xjed"]
# where the ide_command executable is located,  :-/
IDE_EXEC_PREFIX = "/usr/bin/"
# default IDE command string, "false" for no command
IDE_command = ""


# -- program-system interactions
# ------------------------------
# flags
no_fsWriting = 0        # "true" redirects all filesystem writes to stdout
in_Prg_Chunk = 0        # "true" if we are in a program chunk
do_Default_Out = 1      # "true" if default chunk should be sent to stdout
be_Quiet = 1            # "true" suppresses all messages except -h and -v.
IDE_in_background = 0   # "true" runs the IDE in the background,
                        #   probably what you want for a GUI system

# just in case it is necessary to limit the size of these...
r_BufSize = DEFBUFSIZE    # read buffer
w_BufSize = DEFBUFSIZE    # write buffer


# -- program stuff
# ----------------
# regular expressions
pseudo_header = re.compile("^lpmail:(.*)")
prg_chunk_header = re.compile("^<<.*>>=$")
doc_chunk_header = re.compile("^@$")
embedded_chunk_header = re.compile('^(\s*)(<<)(.*)(>>)$')
first_word_template = re.compile('^(\S+)( *)(.*)')

# variables
chunks = {}         # {name:[<start,end,lines[],size()>,...],...}
linecount = 0       # where we are in the source file
header_args = []    # list of pseudo-header arguments
rootname = ''       # chunk name passed with the -R (--Root) option

# some definitions
def usage():
    print """
    Synopsis:
	lpmail [options] sourcefilename

    Options:
	-h, --help
            display this message
        --default-out=<0|1>
            flag if default chunk should be output;
	    1 is yes, 0 is no (default 1)
        -I, --IDE=<IDE command>
            specify IDE command (no default)
	--background-IDE=<0|1>
        0 means the IDE will run in the foreground,
	    1 for in the background (default 0)
	-R <chunkname>, --Root=<chunkname>
            extract "chunkname" to stdout
	-s, --stdout
            only output to stdout
	-v, --version
            display program information
    """
    print "    Recognized IDEs:", string.join(IDE_types)
    print """
    How it works:
	- lpmail extracts all the code chunks
	- if the -R option has not been used and a default chunk exists
	  (named "*"), it is sent to stdout
	- if the  -R option has been used,
	    - its argument is the root chunk's name
	  otherwise
	    - all chunks named in the lpmail header are root chunks
	      (names without a chunk get the default chunk, if it exists,
	      otherwise a message is sent to stderr)
        - root chunks are sent to a like named file or stdout
        - only if an IDE is requested, writing to the file system is allowed,
          and there are names in the lpmail header, is the IDE fired up.

    """

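def printcontacts(contactlist, header=""):
    """Print a header followed by "name <email> - description" entries."""
    # each contact is a (name, email, description) tuple, so the first
    # one has to be unpacked into the format string along with the header
    print "%s: %s <%s> - %s" % ((header,) + contactlist[0]),
    for name, emailaddr, description in contactlist[1:]:
        print ", %s <%s> - %s" % (name, emailaddr, description),
    print "."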

def version():
    """Display program information."""
    print PRGNAME + "-" + str(VERSION)
    print "Updated:", LASTUPDATE
    printcontacts(AUTHOR, "Author(s)")
    printcontacts(CONTRIBUTOR, "Contributor(s)")

def fatalerror(detail=None):
    """Look at an exception, then exit."""
    if detail and not be_Quiet:
    	sys.stderr.write(PRGNAME + ": " + str(detail) + "\n")
    sys.exit(1)

class PChunk:
    """A program chunk...

    ...has the following attributes and methods:
	start      - source file line number the chunk header is on
	end        - line number of chunk terminating "@" symbol
	lines[]    - one string for each line between "<<...>>=" and "@"
	size()     - returns end - start - 1
        complete() - returns 1 if the chunk is complete, otherwise 0.

    Note: A chunk is incomplete if either the "start" or "end" are None,
          or the self.size() does not agree with len(self.lines).

    """

    def __init__(self, chunkstart=None, chunklang=""):
	"""Define "start", "end", number of "lines" and "language"."""
	self.start = chunkstart
	self.end = None
	self.lines = []
        self.language = chunklang

    def __len__(self):
        """Number of lines in the chunk."""
        return len(self.lines)

    def __str__(self):
        """Concatenate and return the lines in the chunk."""
        # join with '' because each line already ends in a newline
        return string.join(self.lines, '')

    __repr__ = __str__

    def size(self):
	"""Convenience function for (x.end - x.start - 1)."""
	return self.end - self.start - 1

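    def complete(self):
        """Return 1 if the chunk is complete, otherwise 0."""
        if self.start is None or self.end is None:
            return 0
        if self.size() == len(self):
            return 1
        return 0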

@

Just to keep it on the topic of commenting in programs...

I would say the following comment is OK, it distinguishes between
major sections in the source:

# -- start doing something...
# ---------------------------

This next guy is redundant, and should be replaced with a big long
philosophical discussion of why the command line for this program is
structured the way it is... of course it should never appear in the
actual program file, imo.

# ...the command line

You all are getting the hang of the markup, eh?

<<*>>=
try:
    opts, args = getopt.getopt(sys.argv[1:], "hI:R:sv",
	    ["help", "IDE=", "Root=", "stdout", "version",
             "exec-method=", "default-out=", "background-IDE="])
except getopt.GetoptError, detail:
    if not be_Quiet:
        sys.stderr.write(PRGNAME + ": " + str(detail) + "\n")
        usage()
    sys.exit(2)

for o, a in opts:
    if o in ("-h", "--help"):
        usage()
        sys.exit()

    elif o in ("-I", "--IDE"):
	try:
	    if BaseName(first_word_template.match(a).group(1)) in IDE_types:
	    	IDE_command = a
	except:
	    pass

    elif o in ("-R", "--Root"):
	rootname = a
	do_Default_Out = 0

    elif o in ("-s", "--stdout"):
        no_fsWriting = 1

    elif o in ("--default-out",):
        # the argument arrives as a string, and "0" is a true value
        if a == "0":
            do_Default_Out = 0
        else:
            do_Default_Out = 1

    elif o in ("-v", "--version"):
        version()
        sys.exit()

    elif o in ("--exec-method",):
	if a in EXEC_types:
	    EXEC_method = a

    elif o in ("--background-IDE",):
        # the argument arrives as a string, and "0" is a true value
        if a == "0":
            IDE_in_background = 0
        else:
            IDE_in_background = 1


# ...handle some show stoppers
if args:
    SRCFILE = args[0]
else:
    if not be_Quiet:
        sys.stderr.write(PRGNAME + ": not enough arguments!!\n")
        usage()
    sys.exit(2)

try:
    lpfile = open(SRCFILE, 'r', r_BufSize)
except IOError, detail:
    fatalerror(detail)

@

# ...figure out what we have in the way of chunks
# The stuff in the try-except else clause could be part of the
# __init__ of class LPFile; a collection of LPFile instances
# would make up an LPDoc [probably where I'm headed next!].
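
A minimal sketch of that chunk-collecting step pulled out into a
function, roughly what an LPFile.__init__ might do.  It is written for
a current Python 3 interpreter, not the Python of this message, and
"collect_chunks", "CODE_HEADER", "DOC_HEADER" and the sample text are
hypothetical names for the illustration.  It keeps only the lines of
each piece and drops the start/end bookkeeping that PChunk does.

```python
import re

CODE_HEADER = re.compile(r"^<<(.*)>>=$")   # same shape as prg_chunk_header
DOC_HEADER = re.compile(r"^@$")            # same shape as doc_chunk_header


def collect_chunks(lines):
    """Return {chunk name: [lines of 1st piece, lines of 2nd piece, ...]}."""
    chunks = {}
    current = None      # list receiving lines, or None while in a doc chunk
    for line in lines:
        text = line.rstrip("\n")
        header = CODE_HEADER.match(text)
        if header:
            # a repeated name adds another piece to the same chunk
            current = []
            chunks.setdefault(header.group(1), []).append(current)
        elif DOC_HEADER.match(text):
            current = None
        elif current is not None:
            current.append(line)
    return chunks


sample = [
    "Prose before any code.\n",
    "<<*>>=\n",
    "print('first piece')\n",
    "@\n",
    "More prose.\n",
    "<<*>>=\n",
    "print('second piece')\n",
    "@\n",
]
chunks = collect_chunks(sample)
```

A chunk still open at end of input is simply kept as-is here; the real
loop reports that case on stderr.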

<<*>>=
while 1:
    try:
    	line = lpfile.readline()
    except IOError, detail:
        if not be_Quiet:
            sys.stderr.write(PRGNAME + ": failed reading a line!!\n")
    	lpfile.close()
    	fatalerror(detail)
    else:
	# hmmm, extracting a bunch of independent groups...
	# ...is that a candidate for "threaded" code?
	linecount = linecount + 1

    	if prg_chunk_header.match(line):
	    in_Prg_Chunk = 1
	    name = line[2:-4]
	    if chunks.has_key(name):
		chunks[name].append(PChunk(linecount))
	    else:
	    	chunks[name] = [PChunk(linecount)]

	elif doc_chunk_header.match(line):
	    if in_Prg_Chunk:
		chunks[name][len(chunks[name])-1].end = linecount
		in_Prg_Chunk = 0

	elif pseudo_header.match(line) and not header_args:
	    header_args = string.split(pseudo_header.match(line).group(1))

	elif line == '':
	    break

	elif in_Prg_Chunk:
	    chunks[name][len(chunks[name])-1].lines.append(line)


# ...sanity check time
if in_Prg_Chunk:
    if not be_Quiet:
        sys.stderr.write(PRGNAME + ": EOF before program chunk finished!\n")
    chunks[name][len(chunks[name])-1].end = linecount
    in_Prg_Chunk = 0

if not chunks:
    fatalerror("Didn't find any chunks of code!!")

lpfile.close()

@

# we have a bunch of chunks stashed away by name in a dictionary,
# and we may or may not have a list of file names in "header_args"; some of
# the stuff that can be done (in the way of fixes, extensions, projects, ...):
#  - fix up PChunk, so you can: str(), "+", etc.
#  - extend to keep track of doc chunks (DChunk)
#    - keep track of doc chunk text types (plain, SGML, LaTeX, TROFF, etc.)
#  - rewrite in a less linear manner
#  - implement a stream mode(s)
#  - use a proper class based error handling subsystem
#  - figure out which chunks to include in the text and which to externalize
#    when pretty printing or archiving
#  - develop pretty printing routines
#  - figure out how to handle incomplete chunks automatically,
#    so you can email pieces of a program around
#    -- a prog that works with LPDoc-s
#  - develop an LPDoc <--> RCS/CVS/etc. interface
#  - develop an LPDoc editor
#  - add LPMail/LPDoc support to IDLE
#  - develop a complete set of Literate Programming tools
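
Before any of that, the basic move on the chunk dictionary is to expand
embedded chunk references recursively while carrying the indentation
along.  A minimal sketch for a current Python 3 interpreter (not the
Python of this message): "expand" and "EMBEDDED" are hypothetical
names, each chunk is simplified to a flat list of lines rather than a
list of PChunk pieces, and there is no protection against a chunk that
refers to itself.

```python
import re

EMBEDDED = re.compile(r"^(\s*)<<(.*)>>$")   # same shape as embedded_chunk_header


def expand(name, chunks, indent=""):
    """Return the lines of chunk `name` with embedded references expanded."""
    out = []
    for line in chunks[name]:
        ref = EMBEDDED.match(line.rstrip("\n"))
        if ref:
            # the reference's own indentation is added to what we carry
            out.extend(expand(ref.group(2), chunks, indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


chunks = {
    "*": ["def main():\n", "    <<body>>\n", "main()\n"],
    "body": ["print('hello')\n", "print('world')\n"],
}
program = "".join(expand("*", chunks))
```

A reference to a name that is not in the dictionary raises KeyError,
which the caller has to deal with.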

<<*>>=
#  - generate all files represented in "chunks"
def writechunklines(chunklist, file, istring):
    """Write the lines of a chunk to a file"""
    for chunk in chunklist:
	for line in chunk.lines:
	    if embedded_chunk_header.match(line):
	    	indent, chunkname = embedded_chunk_header.match(line).group(1, 3)
		writechunklines(chunks[chunkname], file, indent + istring)
	    else:
		file.write(istring + line)

def chunkout(chunklist, filename):
    """Output a chunk sequence to a file or stdout."""
    try:
	if filename == '*stdout*' or no_fsWriting:
	    file = sys.stdout
	else:
	    file = open(filename, 'w', w_BufSize)
    except IOError, detail:
    	fatalerror(detail)
    else:
	writechunklines(chunklist, file, '')
	# this feels like a hack, is there a better way?
	if not (filename == '*stdout*' or no_fsWriting):
	    file.close()


if chunks.has_key('*') and do_Default_Out:
    chunkout(chunks['*'], '*stdout*')

@

# If we have a "rootname", then the "-R" option was given and there is no
# need to look at the other chunks; otherwise the chunks listed in header_args
# are output.  This 'process everything, then see what should be done'
# scheme is ok for small LPs, but too inefficient if a small chunk is being
# extracted from a larger work.

<<*>>=
if rootname:
    try:
    	chunkout(chunks[rootname], '*stdout*')
    except KeyError, detail:
        # a missing chunk name shows up as a KeyError from the dictionary
        if not be_Quiet:
            sys.stderr.write(PRGNAME + ": Can't process the root chunk!: " + rootname + "\n")
            sys.stderr.write(PRGNAME + ": error: " + str(detail) + "\n")
else:
    for name in header_args:
    	if chunks.has_key(name) and name != "*":
    	    chunkout(chunks[name], name)
    	elif chunks.has_key('*'):
    	    chunkout(chunks['*'], name)
	else:
            if not be_Quiet:
                sys.stderr.write(PRGNAME + ": Couldn't find a chunk called: " + name +"!\n")

@

# now that files are available, some possibilities are:
#    - score incoming progs based on their performance under rexec,
#      in a chroot jail :)
#      - append original file with script "scores" and resend, archive, etc.

<<*>>=
#    - fire up IDLE, EMACS, etc.
if IDE_command and EXEC_method and header_args and not no_fsWriting:
    # this is used as the last option for some exec methods,
    #   - a unix hack?
    IDE_bkgnding_string = ">/dev/null 2>&1 &"
    # list.remove() returns None, so build the list without '*' directly
    IDE_args = [IDE_command] + [n for n in header_args if n != '*']

    #print "IDE_{EXEC_PREFIX, args, bkgnding_string}:",
    #print IDE_EXEC_PREFIX, IDE_args, IDE_bkgnding_string

    rcode = None
    if EXEC_method == "system":
	if IDE_in_background:
	    IDE_args.append(IDE_bkgnding_string)
    	#print "--|" + IDE_EXEC_PREFIX + string.join(IDE_args) + "|--"
    	rcode = os.system(IDE_EXEC_PREFIX + string.join(IDE_args))
    elif EXEC_method == "execv":
    	IDE_args[0], opts = first_word_template.match(IDE_args[0]).group(1,3)
    	IDE_args.insert(1, opts)
    	print "--|" + IDE_EXEC_PREFIX + IDE_args[0] + "|-- :", IDE_args
    	#rcode = os.execv(IDE_EXEC_PREFIX + IDE_args[0], IDE_args)

    #print "IDE finished with return code:", rcode
@
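
The launch step can also be done without gluing a shell string
together.  This is a minimal sketch for a current Python 3 interpreter
that swaps in the subprocess and shlex modules for the os.system and
os.execv calls used above; it is not what lpmail does.
"build_ide_argv" and "launch_ide" are hypothetical names, and the IDE
command is assumed to be non-empty.

```python
import shlex
import subprocess


def build_ide_argv(ide_command, header_args, prefix="/usr/bin/"):
    """Return the argument vector for the IDE, skipping the "*" placeholder."""
    words = shlex.split(ide_command)          # "emacs -nw" -> ["emacs", "-nw"]
    files = [name for name in header_args if name != "*"]
    return [prefix + words[0]] + words[1:] + files


def launch_ide(argv, background=False):
    """Start the IDE; wait for its return code unless `background` is true."""
    process = subprocess.Popen(argv)
    if background:
        return None
    return process.wait()


argv = build_ide_argv("emacs -nw", ["lpmail.py", "*"])
```

Calling launch_ide(argv) would then start the editor on the extracted
files; with background=True it returns straight away.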

Have fun.


- Bruce