[Edu-sig] Teaching Python to Technologists

kirby urner kirby.urner at gmail.com
Fri Jul 7 01:55:00 CEST 2006


I've pretty much finished up my small contract with Bernie's
geokem.com, which involved two scorpions in a jar:  a Python scorpion
versus a Java scorpion (I was the Python scorpion).  Bernie had us
fight it out, in a kind of double blind experiment, with himself as
referee (only he could see who was winning).

I dwelt on the dictionary (as a built-in data structure) and the csv
reader (native to the Standard Library csv module) as especially
relevant to his work.  Numeric indexing of large tables introduces
extraneous X,Y coordinates where the original spreadsheet had more
mnemonically meaningful axes:  sample IDs (rows) vs chemical names
(columns).

A dictionary of dictionaries will take you to any cell on the
spreadsheet -- e.g. samples['HAWAII0626']['FeO2'] -- and this is
*especially* useful when the community has no agreed-upon order for
the columns (the rows are by definition unsorted as well).
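
A minimal sketch of that lookup. The sample IDs and values below are
invented for illustration; they are not geokem data, and this is not
the delivered source.

```python
# Hypothetical data: sample IDs and values are invented for illustration.
samples = {
    "HAWAII0626": {"SiO2": 49.2, "FeO2": 11.3, "MgO": 7.1},
    "HAWAII0627": {"SiO2": 50.1, "FeO2": 10.8, "MgO": 6.9},
}

# Row key first, then column key: no numeric coordinates involved.
value = samples["HAWAII0626"]["FeO2"]
print(value)
```

The outer dictionary is keyed by sample ID (the row) and each inner
dictionary by chemical name (the column), so neither rows nor columns
need any particular order.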

If you're going to analyze something as quicksandy as a csv file with
ever-shifting column headers, better to use names, not positionality,
to grab values.  A numeric index approach is too risky -- you might
actually get a working GIGO program, and not know the chemicals you
wanted were now ordered differently.  My solution guards against that
sorry outcome.
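
To make the risk concrete, here is a small sketch comparing the two
approaches on two made-up exports of the same sample that differ only
in column order. The helper names by_position and by_name are
hypothetical, chosen for this illustration.

```python
import csv
import io

# Two hypothetical exports of the same sample; only the column order differs.
export_a = "sample,SiO2,MgO\nHAWAII0626,49.2,7.1\n"
export_b = "sample,MgO,SiO2\nHAWAII0626,7.1,49.2\n"

def by_position(text, col):
    """Positional lookup: take column number col from the first data row."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1][col]

def by_name(text, name):
    """Named lookup: pair each header with its value, then ask by name."""
    rows = list(csv.reader(io.StringIO(text)))
    headers, first = rows[0], rows[1]
    return dict(zip(headers, first))[name]

# Positional lookup runs without error on both files but returns
# the value of a different chemical in each.
print(by_position(export_a, 1), by_position(export_b, 1))

# Named lookup returns the SiO2 value from both files.
print(by_name(export_a, "SiO2"), by_name(export_b, "SiO2"))
```

The positional version never raises an exception, which is exactly the
problem: the program keeps working while quietly reading the wrong
column.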

I wrote it all up in a 10 page PDF, plus provided working source code,
a lot of it built around Python geokem had already internalized
(Bernie used to write everything in Pascal).

What I've found interesting about teaching Python to technology
professionals is how much depends on imparting our somewhat unfamiliar
jargon.  For example, the first row of the csv file is different from
all the others, in that it contains headers (chemical names).  Then
these particular files have footers as well, separated from the data
block by blank lines.

So rather than using a for loop, I used a while True, with a .next()
method.  It's that .next() that's confusing.  What does it mean?
Well, csv.reader returns an iterator.  So I use .next() the
first time to parse headers, then loop inside a while loop until a
blank line is encountered, building a dictionary as I go.  Even
regular open file objects have a next method (not to be used in
combination with readline).  I ended up explaining this by means of
StringIO (which simulates file objects using strings).
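
The loop described above might be sketched as follows. This is an
illustration, not the delivered code: the file contents and the
function name read_samples are invented, and it is written for current
Python, where the built-in next(reader) replaces the reader.next()
spelling and StringIO lives in the io module.

```python
import csv
import io

# Hypothetical file contents: header row, data block, blank line, footer.
text = (
    "sample,SiO2,MgO\n"
    "HAWAII0626,49.2,7.1\n"
    "HAWAII0627,50.1,6.9\n"
    "\n"
    "Analyst: someone\n"
)

def read_samples(fileobj):
    """Build a dictionary of dictionaries from the data block only."""
    reader = csv.reader(fileobj)
    headers = next(reader)          # first call consumes the header row
    samples = {}
    while True:
        try:
            row = next(reader)      # each later call yields one more row
        except StopIteration:       # file ended without a footer
            break
        if not row:                 # csv.reader yields [] for a blank line
            break
        sample_id = row[0]
        # Pair chemical names with this row's values (kept as strings).
        samples[sample_id] = dict(zip(headers[1:], row[1:]))
    return samples

# StringIO stands in for an open file, so no file on disk is needed.
samples = read_samples(io.StringIO(text))
print(samples["HAWAII0627"]["MgO"])
```

Each call to next() hands back exactly one row and advances the
reader, which is why the header can be peeled off first and the footer
is never read once the blank line ends the loop.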

As to whether Python or Java won this particular bout, I think New
Zealand is a little behind the times (still teaching C++ as a first
language in CS).  However, my hash table approach, getting away from
integer indexing, may at least inform the Java-based solution, as it'd
be no trouble to use the same approach in that language.

Anyway, I think Bernie is sold on the value of Python.  It's more just
rumors (about Python being "undocumented" for example) which slow its
acceptance in a knowledge domain (geochemistry) that could really use
some computing savvy.

Per Bernie, most geochemists use their PCs for email and word
processing and that's about it.  The ability to program is a lost art
across the board, in many sciences as well as the humanities.

Kirby

