[Tutor] comment stripper script

Mon Jan 26 00:28:11 EST 2004

On Sun, 25 Jan 2004, kevin parks wrote:

> Hi all. I am hacking together something to strip commented lines from a
> file. I have some programs that generate these files, but the
> application that takes these on input sometimes chokes on these files
> now. So, as a work around, i am filtering these input files with the
> following script, which i wrote, and, well, doesn't work. It is supposed
> to make a new file with just the 'legit' lines in it, but this seems to
> give me the original file with no changes. hmmm...

Hi Kevin,

You can try making the program a little easier to test by breaking out the
comment-detector as a separate function.  At the moment, the program has
an inner loop that does a few things:

###
for line in infile.xreadlines():
    line.strip()
    if not (line.startswith(';') or line.startswith('c') or
            line.startswith('#')):
        f.write(line)
###

If we have a function that can detect commented lines, like
is_commented_line(line),

###
def is_commented_line(line):
    line.strip()
    if not (line.startswith(';') or line.startswith('c') or
            line.startswith('#')):
        return False
    else:
        return True
###

then we can rewrite the loop as:

###
for line in infile.xreadlines():
    if not is_commented_line(line):
        f.write(line)
###

Note that no bugs are fixed yet: we're just shifting code around.  (And
is_commented_line() is buggy: look at Terry's reply for a hint on how to
fix it.  *grin*)

Moving code around like this isn't aimless:  the reorganization helps
because is_commented_line() only needs to deal with a single line; it
doesn't have to care about files.  And as a side benefit, the code ends up
being easier to test from the interactive interpreter, since you can then
do things like:

###
>>> is_commented_line(' c test')
False
>>> is_commented_line('c test')
True
>>> is_commented_line(' c')
False
###

and see that there's something wacky happening with the comment-line
detection.  It's not completely broken: it works when the comment
character is the very first character, but it fails when there's leading
whitespace on the line.
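
For what it's worth, here is one possible repair, offered as a sketch and not as the definitive fix (Terry's reply has the actual hint).  It assumes the trouble is that Python strings are immutable: line.strip() returns a new string and leaves 'line' itself untouched, so the result has to be kept somewhere.  The name 'stripped' is just my choice for illustration.

```python
def is_commented_line(line):
    # str.strip() returns a new string; it does not change 'line' in place,
    # so we hold on to the result and test that instead.
    stripped = line.strip()
    return (stripped.startswith(';') or stripped.startswith('c') or
            stripped.startswith('#'))


print(is_commented_line(' c test'))          # True
print(is_commented_line('c test'))           # True
print(is_commented_line('no comment here'))  # False
```

With that change the leading-whitespace cases from the interpreter session come out as True.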

The big idea is to break things down into functions.  By doing so, when
bugs are discovered, tracing and fixing them becomes easier because of the
relative isolation that functions give us.
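
To push the same idea one step further, the loop itself can become a function too.  This is only a sketch under a few assumptions: the detector has been repaired to ignore leading whitespace, strip_comments() is a hypothetical helper name, and the sample lines are made up.  Because the function accepts any iterable of strings, it can be tried out on an in-memory io.StringIO object before pointing it at a real file.

```python
import io


def is_commented_line(line):
    # Assumed fix: test the line with its leading whitespace removed.
    # startswith() accepts a tuple of prefixes to try.
    return line.lstrip().startswith((';', 'c', '#'))


def strip_comments(lines):
    # Hypothetical helper: yield only the lines that are not comments.
    # 'lines' can be a list of strings, an open file, or a StringIO.
    for line in lines:
        if not is_commented_line(line):
            yield line


# Made-up sample input standing in for a real file.
sample = io.StringIO("; header\ni1 0 1\n  c indented comment\ni2 1 1\n")
result = list(strip_comments(sample))
print(result)   # ['i1 0 1\n', 'i2 1 1\n']
```

Writing the filtered file is then a matter of passing an open input file to strip_comments() and writing out each line it yields.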

Hope this helps!