Request for feedback on my first Python program

Fri May 30 18:01:03 EDT 2003

On Thu, 29 May 2003 23:14:45 -0700, Scott Meyers <Usenet at aristeia.com> wrote:
*the* Scott Meyers? ;-)
You should take the following with a grain of salt. There are a number of
Python book authors who post here, Alex Martelli being one of the most
prolific and authoritative (as you may have noticed if you've been lurking a while ;-)
And then there are the developers and overlapping sets of the afore.

>I'm a C++ programmer who's writing his very first Python program.  This
>means the program is going to be gross, and I apologize for that in
>advance.  I don't really have anybody I can show it to for feedback, so I'm
>hoping I can get some comments here.  If there is a better place for me to
>seek guidance, please let me know.
>
>The program is supposed to read a file containing directory and file names,
>one per line.  (The file can also contain comment and blank lines, which
>should be ignored.)  For each file or directory name, the program should
>print out whether it's a directory, a file, or neither.  That's it.
>
>Here's the code; you may want to hold your nose:
>
>  import sys
>  import os
if you want to access a few names unqualified, you can use an alternative import format, e.g.,

   from os.path import isdir, isfile, basename

this will let you write isdir(xxx) instead of os.path.isdir(xxx), and so forth.

>  import string
Unless you are trying to be 'way backwards compatible, it will be unusual
to need to import the string module. The builtin str class is the basis of
most strings, and its methods are accessible via the bound method notation
of <string expression>.method. E.g., 'abc'.upper() or sometimes with an argument.

Interactively, help(str) will show you. Help(xxx) or help('xxx') will often
get you what you need to know about xxx, especially if the code author has
been good about doc strings (of which more mention below, but see
http://www.python.org/peps/pep-0257.html for extensive info).

   import sys, os
works too, though your separate-line style above is preferred style (see PEP 8
at http://www.python.org/peps/pep-0008.html).
>  
>  # Error status codes
>  BadInvocation = 1
There are a few conventions re spelling. First-capital and joined-capitalized is
often used for a class name. Constant "declarations" are often all caps, sometimes
with underscore word ligation, e.g., BAD_INVOCATION = 1

>  
>  def usage(programName):
The first line position inside a function def has special magic. If you
put a string literal there (and it may continue to several lines), it
will become the documentation string for the function, and will show up
in interactive help and also be accessible programatically for whatever purpose, e.g.,
     """Prints usage info and exits."""

Triple quotes allow quoting unescaped single or double quotes (except at end where
there would be ambiguity) and also newlines, allowing multiple line doc strings, but
they're the convention even for single line doc strings.

>    print "Usage: " + os.path.basename(programName) + " [file]"
Concatenating strings with '+' is fine for small expressions, but it is an inefficient
way to accumulate a long string. See FAQs. String formatting very close to C-style printf
formatting is avaiable with the '%' operator. Thus an alternative way to print the above is
     print "Usage: %s [file]" % os.path.basename(programName)

BTW, the right hand argument to % above is short-hand for a length 1 tuple, so the safe
way to write it is
     print "Usage: %s [file]" % (os.path.basename(programName),)

BTW2, without the trailing comma after a single element, it would just be a parenthesized expression,
not a tuple.

The way you wrote it is nice and clear, though, so it wins.

>    sys.exit(BadInvocation)
this can also be spelled
     raise SystemExit, BadInvocation
or
     raise SystemExit(BAD_INVOCATION)

without having to import sys
>  
>  # Take a list of strings, return an equivalent list where the following
>  # strings have been removed:
>  # - those with no non-whitespace characters
>  # - those whose first non-whitespace character is a "#"
Since you're writing this, you might as well make a docstring out of it that will be
seen by the help feature and pydoc.

>  def getMeaningfulLines(lines):
       """
       Take a list of strings, return an equivalent list where the following
       strings have been removed:
       those with no non-whitespace characters
       those whose first non-whitespace character is a "#"
       """
       return [line for line in lines if line.strip() and not line.lstrip().startswith('#')]

The above single line can replace the following 9, explained below
>    nonCommentLines = []
>    for i in range(0,len(lines)):
>      try:
>        firstToken = string.split(lines[i])[0]
>        if firstToken[0] != "#": 
>          nonCommentLines.append(lines[i])
>      except IndexError:
>        continue
>    return nonCommentLines
The oneliner, using a list comprehension, again is:
     return [line for line in lines if line.strip() and not line.lstrip().startswith('#')]

in the above,
(1)   [line for line in lines <condition>] builds a new list from the lines list,
      but including only those that satisfy the condition, which is a short-circuited
      logical expression
(2a)  if line.lstrip() tests that the line is not a zero length string '' after removing white space
      from the beginning and end. '' and a number of other "empty" things, e.g., (), [], {}, have
      False value when tested with if, which makes some expressions concise.
(2b)  and not line.lstrip().beginswith('#') tests that the first character of the line after whitespace
      has been stripped from the left does not begin with '#'

Putting too much into one line can of course get Perlish, but once you are familiar with list
comprehensions, the above is pretty straightforward. For multi-megabyte strings, you may want
to write something that works more like a chain of pipes, so as to avoid huge buffering.
>  
>  
>  # if this language had a main(), it'd be here...
No reason you can't have it. It's a common idiom.
def main():
>  if len(sys.argv) != 2: usage(sys.argv[0])
>  
>  lines = getMeaningfulLines(open(sys.argv[1]).readlines())
>  for i in lines:
>    print i + " is a ",
>    if os.path.isdir(i): print "directory"
>    elif os.path.isfile(i): print "file"
>    else: print "non-directory and non-file"

However, with the above def main(), executing the file as a script will
only execute the _definition_ of the main() function, without calling it.
Sort of like compiling the contructor for an executable object and executing
the constructor, but not calling the resulting callable object.
(Note the important fact that a definition is compiled to produce an executable
_defintion_, which is dynamically executed according to its context like any
other statement. Here it would get executed if we imported this program or if
we ran it with the interpreter, and at the end, the executing of the _definition_
def main(): ... would leave main locally bound to the actual function, ready to
be called, but not called. Thus you could write

    if foo:
        def bar(): print "bar executed"
    else:
        def baz(): print 'baz executed' # (single quotes are usual, unless same is being quoted).

and they would not both be bound to their respective names until this had been executed at least
twice, with different logical values of foo.

Ok, back to getting main's function to be called, as opposed just to having the name bound to
the ready-to-call function.

We could write
main()
at the same level of indentation as def main():...
This would execute the code right after executing the def. Useful for postponing execution
until all the def's of potentially interdependent functions have been executed, so that
they're all defined before trying to run them.

But if you want your file to serve as an importable module as well as a runnable script, the
typical idiom is to make the execution of main (or test, or whatever) dependent on the environment
in which it is being executed. When run from the command line, the global name __name__ will have
a value of '__main__' (only coincidental convention relates it to our main() function's name, BTW),
whereas if you import, the importing process will cause __name__ to have the name being imported.
Thus we can write

if __name__ == '__main__':
    main()

And since this only gets executed when run, typically interactively, it is common to put usage printing
below this if also. That way it's left out of the imported module's namespace when imported.

>
>In addition to the numerious stylistic gaffes I'm sure I've made, the
>program also doesn't work correctly.  Given this input file,
>
>  d:\
>  d:\temp\foo.py
>
>I get this output:
>
>  d:\
>   is a  non-directory and non-file
>  d:\temp\foo.py
>   is a  non-directory and non-file
>
>Aside from being ugly (how do I get rid of the newline that follows each
>directory or file name?), the problem is that the first entry IS a
>directory and the second one IS a file.  So clearly I'm doing something
>wrong.  Any idea what it is?
The newline is being passed to isdir and isfile, and neither see such strings
as valid file paths, so getting rid of the white space on both sides of the
path string should fix it. That is what str.strip is for.
 >>> str.strip
 <method 'strip' of 'str' objects>

You can invoke it like
 >>> str.strip('  foo bar    ')
 'foo bar'

or you can invole it as a bound method implicitly operating on the string instance it's bound to
 >>> '  foo bar    '.strip()
 'foo bar'

So we'll use that

>  lines = getMeaningfulLines(open(sys.argv[1]).readlines())
>  for i in lines:
     i = i.strip()
>    print i + " is a ",
>    if os.path.isdir(i): print "directory"
>    elif os.path.isfile(i): print "file"
>    else: print "non-directory and non-file"
Note that os.path. can be left off in the above if the from ... import was used.

>
>Thanks very much in advance.
You're welcome. Hope I haven't misled. Someone will jump in to clarify if so, I'm sure.
Perhaps I better check on what's been posted since last night when I started this ;-)

To summarize, I made the mods, and a few more, including allowing "-" as file name for
stdin. If you run python interactively and import dfreport, and then type help(dfreport)
you can see the effect of doc strings. I like to catch all the exceptions, to end neatly.
As an afterthought, I added an optional trailing '-rrx' option for re-raise-exception, so
a traceback will be printed.

====< dfreport.py >=========================================
"""
This module file may be run as a script or be imported.
Run as a script it takes a file name argument, and prints a report
reflecting dir, file, and other references of the file's non-blank,
non-comment lines. Run without arguments for this plus usage.

Importing makes two functions available (getMeaningfulLines,
and getDirFileReportLines), which can be used programmatically.
See below in interactive help output for this module.
"""
import sys
from os.path import isdir, isfile, basename

def getMeaningfulLines(lines):
    """
    Take a list of strings, return an equivalent list where the following
    strings have been removed:
    - those with no non-whitespace characters
    - those whose first non-whitespace character is a "#"
    """
    return [line for line in lines
                if line.strip() and not line.lstrip().startswith('#')]

def getDirFileReportLines(filepath):
    r"""
    For each non-whitespace, non-#-headed line in file filepath
    (or stdin if filepath is a single minus character)
    return in a list a corresponding report line with
    " is a <kind>\n" appended to the original line stripped
    of enclosing whitespace, where <kind> may be:
      - directory 
      - file 
      - non-directory and non-file
    reflecting what is found in the os's file system.
    """
    reportLines = []
    if filepath=='-': fp = sys.stdin
    else: fp = file(filepath)
    for line in getMeaningfulLines(fp.readlines()):
        line = line.strip()
        if isdir(line): kind = "directory"
        elif isfile(line): kind = "file"
        else: kind = "non-directory and non-file"
        reportLines.append('%s is a %s\n' % (line, kind))
    return reportLines

def main(argv):
    """Invoked only when module is run as script"""

    # Error status codes
    BAD_INVOCATION = 1

    def usage(programName, exitcode=BAD_INVOCATION):
        """Prints module doc string plus usage info and exits."""
        print "Usage: [python] " + basename(programName) + " [-h | file | -]"
        print "    '-h' for further info, '-' to use stdin as input file"
        print "    NB: Some windows versions require [python] explicitly for IO redirection"
        raise SystemExit, exitcode

    try:
        argc = len(argv)
        if argc==1 or argc==2 and argv[1]=='-h':
            print __doc__; usage(argv[0])
        elif argc==2:
            print ''.join(getDirFileReportLines(argv[1]))
        else:
            raise UserWarning, 'Bad Usage: "%s"'%' '.join(argv)
    except SystemExit: raise    # pass exits through
    except Exception, e:
        # print the name and message of any standard exception before usage
        print '%s: %s' % (e.__class__.__name__, e)
        if isinstance(e, IOError): usage(argv[0], e.errno) # pass IO errno's
        else: usage(argv[0])
    except:
        print 'Nonstandard Exception %r: %r' % sys.exc_info()[:2]
        usage(argv[0])

if __name__ == '__main__':
    main(sys.argv)  
============================================================

Piping test lines from the console via cat:

[14:50] C:\pywk\clp>cat |python dfreport.py -
# comment
    # ws-prefixed comment, and next line is some spaces

   #last had spaces
zdir
dfreport.py
fnord
^Z
zdir is a directory
dfreport.py is a file
fnord is a non-directory and non-file

It should also run as originally intended.

[14:52] C:\pywk\clp>dfreport.py blah
IOError: [Errno 2] No such file or directory: 'blah'
Usage: [python] dfreport.py [-h | file | -]
    '-h' for further info, '-' to use stdin as input file
    NB: Some windows versions require [python] explicitly for IO redirection

if blah is a file with useful content ;-)

And importing as a module, you can access defined functions, e.g.,
(note help access to doc strings, try help(dfreport) for whole module docs):

 >>> import dfreport
 >>> help(dfreport.getMeaningfulLines)
 Help on function getMeaningfulLines in module dfreport:

 getMeaningfulLines(lines)
     Take a list of strings, return an equivalent list where the following
     strings have been removed:
     - those with no non-whitespace characters
     - those whose first non-whitespace character is a "#"

 >>> dfreport.getMeaningfulLines([
 ... '# not this one',
 ... '   #nor this one',
 ... '    ',
 ... 'this one should be first',
 ... '  white space is preserved  '])
 ['this one should be first', '  white space is preserved  ']

HTH

Regards,
Bengt Richter