Python Data Analysis Recommendations

Rob Gaddi rgaddi at highlandtechnology.invalid
Thu Dec 31 12:15:41 EST 2015


I'm looking for some advice on handling data collection/analysis in
Python.  I do a lot of big, time-consuming experiments: I run a long
data collection (a day or a weekend) that sweeps a bunch of variables,
then come back offline and try to cut the data into something that
makes sense.

For example, my last data collection looked (neglecting all the actual
equipment control code in each loop) like:

for t in temperatures:
  for r in voltage_ranges:
    for v in test_voltages[r]:
      for c in channels:
        for n in range(100):
          record_data()
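
A slightly fuller sketch of that loop, in case it helps show the shape of the data.  The sweep values, the field names, and the reduced repeat count are all made up for illustration; the point is only that every reading ends up tagged with its full set of sweep coordinates.

```python
# Hypothetical sweep values standing in for the real lists.
temperatures = [0, 25]
voltage_ranges = ["low", "high"]
test_voltages = {"low": [0.1, 0.5], "high": [1.0, 5.0, 10.0]}
channels = [0, 1]
REPEATS = 3  # 100 in the real run; kept small here


def sweep_points():
    """Yield one dict per reading, carrying all of its sweep coordinates."""
    for t in temperatures:
        for r in voltage_ranges:
            for v in test_voltages[r]:
                for c in channels:
                    for n in range(REPEATS):
                        # The real code would talk to the equipment here and
                        # add the measured value to the record.
                        yield {
                            "temperature": t,
                            "range": r,
                            "applied": v,
                            "channel": c,
                            "repeat": n,
                        }


records = list(sweep_points())
```

Each record is one flat row, which is roughly what ends up at the bottom of the table hierarchy.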

I've been using SQLite (through peewee) as the data backend, setting up
a couple of tables with a basically hierarchical relationship, and then
handling analysis with a rough cut of SQL queries against the
original data, NumPy/SciPy for further refinement, and Matplotlib
to actually do the visualization.  For example, one graph was "How does
the slope of a straight-line fit between measured and applied voltage
vary as a function of temperature on each channel?"

The whole process feels a bit grindy, like I keep having to do a lot of
ad-hoc stitching of things together.  And I keep hearing about pandas,
PyTables, and HDF5.  Would they make my life notably easier?  If
so, does anyone have any references on them that they've found
particularly useful?  The tutorials I've seen so far don't give
much detail on the point of what they're doing; it's all "how
you write the code" rather than "why you write the code".  Paying money
for books is acceptable; this is all on the company's time/dime.

Thanks,
Rob

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


