[Tutor] Evaluate my script?

Tue Oct 24 09:54:42 CEST 2006

Adam Gomaa <python-tutor <at> adamgomaa.com> writes:
> Hello, -tutor. I wrote this script in chemistry class, which gets the
> user's input for various parts of the PV=nRT ideal gas law, then
> calculates the missing variable, which you indicate by putting in an 'x'.
>
> I've tried to follow 'best practices' per se, and I'm looking for
> feedback on if I've done anything that's a Python or programming no-no.

Hi Adam,

The script looks ok. You make good use of functions, have documented your
intents and use decent variable names. Of course there's always room for
improvement :).

The string module defines the constants digits and lowercase, so you don't have
to define them yourself. I would also choose to initialize the other constant,
R, simply at the top of the module, instead of inside an init function. This is
important because if I'd want to use your module, I'd expect not to have to call
init() to make sure it exists. In your current implementation init is called
automatically, but that's a violation of Python idiom that I'll address below.

I would recommend using "from __future__ import division" at the top of your
script, so division behaves properly (3/2 == 1.5 instead of 1). This way you
don't have to make sure you write 3.0/2 everywhere you need to divide something.
There's something to be said for your approach as well though, since someone
reading your code would know what to expect when seeing 3.0/2, without caring
whether "from __future__ import division" is used. But I consider the integer
division in Python improper behavior that is fixed by this statement.

A quick glance at the parsingLogic function shows it's much larger and more
complex than its siblings. This is a sign that it's a good candidate for some
improvement work. 

First, the name: functions should describe an action and therefore start with a
verb - parseInput would be better. This new name also raises certain
expectations: I put in the user input (strings) and I get out some numbers. At
this point, the function does not do this, which leads to the ugly float() calls
all over the place.

If you look inside the function, two things are striking: repeated behavior
patterns and the presence of 4 indentation levels. Let's start with the repeats:
the splitting of values into a number and an unit. Have a look at what happens
for p, v and t: all of them have the same logic. 
Can you refactor that (in other words, extract the common behavior into a
separate function) and thereby simplify the code? This logic should A) split the
input into a numerical value and an unit (may be unspecified), B) return the
value as a number and the unit as a string.

Then the excessive indentation levels: the temp_list is the cause of this. Lists
are great if you want to perform the same processing on all of their elements.
In this case, the list just takes a bunch of unrelated things together and then
you treat the individual things in different ways. It's like going to the market
to buy an apple and doing this by taking a bag, filling it with one piece of
every fruit type on offer, then taking every fruit out of the bag, examining
whether it is an apple and buy the one where this test returns True. 
The sanity check at the top could easily be written as follows (slightly
shorter, one variable less and just as clear):

    if 'x'==p:
        unknown_variable='p'
    elif 'x'==v:
        unknown_variable='v'
    elif 'x'==t:
        unknown_variable='t'
    elif 'x' ==m:
        unknown_variable='m'
    else: # user didn't specify any unknowns
        print "##########   No variable found!   ############"
        return False # Causes an error, hence the noticable string

The "for each in temp_list" loop will disappear completely and be replaced by
something like (with splitInput being the refactored function I mentioned 
above):

    pvalue, punit = splitInput(p)
    if punit in ['a', 'atm']: # easier than "if punit=='a' or punit=='atm'
        pvalue = pvalue * 101.325
    elif punit in ['torr', 'tor', 'mmhg']:
        pvalue = pvalue * 101.325 / 760

    # similar code for v and t

If splitInput is implemented correctly, you don't even have to check whether
p=='x': splitInput just returns pvalue==0 and punit=='x' and the code would work
just fine! This has removed two indentation levels. Of course you'd have to
return pvalue instead of p at the end of parseInput.

The calculations function could use a better name that indicates an action, like
calculateUnknown. It's a pity that you've defined calcPress and its siblings
inside calculateUnknown instead of outside: they're functions that would be
reusable in other chemistry/physics-related applications if they were contained
as normal functions inside the module. The float() stuff will disappear if you
modify parseInput as described above.

The output() function seems a bit pointless in the current implementation - I'd
just use "print" directly in main.

Now onto the Python idiom I mentioned at the beginning: for reuse it's better to
not always run the main() function. If I import your module because I want to
reuse calcVol, I'd end up running the complete code. In order to avoid this,
write at the bottom of the module:

    if __name__=='__main__':
        main()

This way main is only called if the script is running on its own, not by being
imported in some other application.

Some generic remarks:
1. I can think of an potentially useful use case for specifying all 4 input
values with no unknowns: check whether the combination is valid while doing
homework, without getting the correct answer. This is a bit tricky because you
can't assume pV is *exactly* equal to nRT when floats are used, so you'd have to
build in a bit of tolerance.

2. Functions can be assigned as variables, like this:

  >>> def myfunc(a, b):
  ...     print a + b
  ...
  >>> f = myfunc # note the absence of parentheses - myfunc is not called!
  >>> f(3, 4) # calls myfunc
  7

To make things even better, you can convert lists to function arguments like 
this:

  >>> args = [3,4]
  >>> f(*args) # the * 'unpacks' the list
  7

Using these two techniques makes it possible to get rid of "unknown_variable":
parseInput could return a function and a list of arguments to be supplied to it.
There is an argument to be made both in favor and against this approach, but
it's interesting anyway.

3. Mirror user input: the highly tolerant unit parsing is potentially dangerous.
Say I accidentally specify p=2atn. The program will calculate with 2 kPa, but I
wouldn't know that. If instead it displayed its interpretation of my input by
writing "p = 2 kPa" to the screen, I'd have a better chance of noticing.
Alternatively, you could provide a warning like "unknown pressure unit: atn.
Assuming kPa.".

That's a lot of comments for such a small program, so keep in mind you don't
have to do everything I (or others :)) write. Some of the advice is generic in
nature: e.g. you might not care about reuse in this particular case, but it's
still good to know what you should pay attention to in the cases where you DO 
care.

Yours,

Andrei