seeking to improve Python skills

Jean-Paul Calderone exarkun at divmod.com
Fri Jan 23 08:34:18 EST 2009


On Fri, 23 Jan 2009 12:17:41 +0100, mk <mrkafk at gmail.com> wrote:
>Hello everyone,
>
>I wrote the following program mainly for educational purpose of improving my 
>Python programming skills -- would Pythonistas here please look at it and 
>point out areas that could use improvement?
>
>This thing analyzes the stored load average record file, calculates simple 
>moving average and produces Gnuplot program to plot the thing.
>
>Constructive criticism is welcome.
>
>#!/usr/bin/python
>
>import re
>import sys
>import tempfile
>import subprocess
>import os
>
>class MovingAvg(object):

You should have a docstring here.  It should describe the overall purpose
of the MovingAvg class and it should probably document the meaning of each
attribute instances of the class have.

>         def __init__(self, fname):
>                 try:
>                         self.fo = open(fname)
>                 except IOError, e:
>                         print "Problem with opening file:", e
>                         sys.exit(4)

This isn't very good.  Your MovingAvg class is otherwise relatively general-
purpose and re-usable, but this makes it unusable instead.  You shouldn't
use print to report errors in general code and you shouldn't turn specific
exceptions into SystemExit.  This kind of whole-program error handling
belongs in a different layer if your code.  Your moving average code should
just focus on computing a moving average.

Also, filenames are rather clunky.  Another improvement here would be to
pass in a file object which the calling code has already opened.  Or, better
still, just pass in an iterator of data.  Do the file handling and the
parsing in a different layer.  Then you'll be able to re-use your moving
average class with data files in a different format - all you'll have to do
is write another parser for the new format.

>                 self.reslist = []
>                 self.smalist = []
>                 self.maxval = 0
>
>         def extrfromfile(self):

There should be a docstring here describing what this method does.

>                 vre = re.compile("(\d+-\d+-\d+) (\d\d:\d\d) (\d+\.\d+)")
>                 for line in self.fo:
>                         res = vre.search(line)
>                         if res:
>                                 self.reslist.append({'day':res.group(1),\
>                                 'time':res.group(2),\
>                                 'val':float(res.group(3))})

The trailing backslashes on the previous lines are superfluous.  The code
is just as valid without them.

>         def calc_sma(self, smalen=4):

Another missing docstring.  Make sure you also document the meaning of the
parameter the method accepts.

>                 if smalen == 0:
>                         raise AssertionError, "Moving Average sample length 
>cannot be 0"
>                 if not isinstance(smalen, int):
>                         raise AssertionError, "Moving Average sample length 
>has to be integer"

The conventional way to write the previous lines would be:

    assert smallen != 0, "Moving Average sample length cannot be 0"
    assert isinstance(smallen, int), "Moving Average Sample length has to be integer"

However, I would just leave them out.  If you document the meaning of the
smalen parameter, then callers will know it can't be 0 and must be an int.

>                 total = 0

Superfluous line above.

>                 total = sum( [ x['val'] for x in self.reslist[0:smalen] ] )
>                 sma = total / smalen
>                 smaidx = int(smalen/2)
>                 self.smalist.append((self.reslist[0]['day'],\
>                         self.reslist[0]['time'],\
>                         self.reslist[0]['val'],\
>                         self.reslist[0]['val']))

Superfluous backslashes again.

>                 for i in range(smalen, len(self.reslist)):
>                         curval = self.reslist[i]['val']
>                         self.maxval = max(self.maxval, curval)
>                         total += curval
>                         total -= self.reslist[i - smalen]['val']
>                         sma = total / smalen
>                         smaidx += 1
>                         self.smalist.append((self.reslist[smaidx]['day'],\
>                                 self.reslist[smaidx]['time'],\
>                                 self.reslist[smaidx]['val'],\
>                                 sma))

And again.

>
>         def return_results(self):

Missing docstring.  

>                 return (self.reslist, self.smalist, self.maxval)

Generally, I wonder whether MovingAvg should just be a function rather
than a class.  The only place it maintains state it does so confusingly.
I would just make the input data a parameter to calc_sma and have it return
the results it computes.  Drop the rest of the class and make calc_sma a free
function.

>class GnuplotWrapper(object):

Docstring.

>         def __init__(self, smalist, maxval, outfname = "calc.png", 
>graphlen=640, graphheight=480, gnuplot = '/usr/bin/gnuplot'):

Again, the comment about filenames.  You could just make outfname a file-like
object and skip the naming.

>                 self.outfname = outfname
>                 self.smalist = smalist
>                 self.gnuplot = gnuplot
>                 self.gprog = None
>                 self.gdata = None
>                 self.graphlen = graphlen
>                 self.graphheight = graphheight
>
>         def _writefiles(self):

Docstring.

>                 self.gprog = tempfile.mkstemp()
>                 self.gdata = tempfile.mkstemp()
>                 self.gprogfile = open(self.gprog[1], 'wb')
>                 self.gdatafile = open(self.gdata[1], 'wb')
>                 labelnum = int(self.graphlen / 110)
>                 labelstep = int(len(self.smalist) / labelnum)
>                 labels = []
>                 for i in range(0, len(self.smalist), labelstep):
>                         labels.append("\"%s %s\" %d" % (self.smalist[i][0], 
>self.smalist[i][1], i))
>                 labelstr = ", ".join(labels)
>
>                 self.gprogfile.write("""set terminal png size %d, %d
>set style line 1 lt 1 lw 2
>set style line 2 lt 2 lw 2
>set output "%s"
>set xtics(%s)
>set yrange [0:%f]
>set y2range [0:%f]
>plot "%s" using 1 with lines ls 1 title "orig" axes x1y1, "%s" using 2 with 
>lines ls 2 title "Moving Average" axes x1y2
>""" % (self.graphlen, self.graphheight, self.outfname, labelstr, 
>float(maxval), float(maxval),\

Superfluous backslash.

>                 self.gdata[1], self.gdata[1]) )
>                 self.gprogfile.close()
>                 for elem in self.smalist:
>                         self.gdatafile.write("%f, %f\n" % (elem[2], 
>elem[3]))
>                 self.gdatafile.close()
>
>         def plot(self):

Docstring.

>                 self._writefiles()
>                 gplot = subprocess.Popen(self.gnuplot + " " + 
>self.gprog[1],\
>                         shell=True, stdout=subprocess.PIPE, 
>stderr=subprocess.PIPE)

You should *avoid* using shell=True with subprocess.Popen.  There's no
reason to use it here, as far as I can tell.

>                 print "Plotting data (%s)..." % self.outfname
>                 so, se = gplot.communicate()
>                 if se:
>                         print "Gnuplot problem output:", se
>                 os.remove(self.gprog[1])
>                 os.remove(self.gdata[1])

Reporting the gnuplot failure with print is not the best thing.  Reporting
errors to users should be done at a separate layer.  This is your gnuplot
wrapper - its job is to wrap gnuplot, not to talk to the user.  If there's
a problem, make it available via some documented API (for example, the plot
method might raise an exception if there is a gnuplot problem).  Handle the
exception at a higher level in your application where you know it's correct
to print things for the user to read.

>
>
>if __name__ == "__main__":
>         try:
>                 fname = sys.argv[1]
>         except IndexError:
>                 print "Specify filename with data as first argument."
>                 sys.exit(1)
>         except Exception, e:
>                 print "Error:", e
>                 sys.exit(2)

What is the second exception handler doing?  There is no expected other
exception from the code being protected, so you should just get rid of
this handler.  If, through some crazy accident, another exception gets
raised, Python will take care of reporting it to the user, and it will
do so with far more information - information that will make your job of
debugging the problem vastly easier.

>         values = MovingAvg(fname)
>         values.extrfromfile()
>         values.calc_sma(smalen = 10)
>         (reslist, smalist, maxval) = values.return_results()
>         gp = GnuplotWrapper(smalist, maxval)
>         gp.plot()
>

The most significant thing missing from this code is unit tests.  Developing
automated tests for your code will help you learn a lot.

Jean-Paul



More information about the Python-list mailing list