[SciPy-dev] An interesting exercise - reproduce R analysis using python.

Alex Holcombe alexh at psych.usyd.edu.au
Mon Jan 26 03:55:52 EST 2009


Alan Jackson <alan <at> ajackson.org> writes:

> 
> A quite good online book, "Using R for Data Analysis and Graphics" by
> John Maindonald has been updated and has all the data and exercises
> posted on the net. Relevance to this group? I think it might be quite
> fun and instructive to work through the 'R' exercises with
> numpy/matplotlib/scipy instead. I'm considering doing it, partly just
> to broaden my numpy/matplotlib skillset, but I thought I would post
> this here in case anyone else might find it an interesting challenge.
> 
> Exercises and datasets can be found from links here :
>
> http://blog.revolution-computing.com/2009/01/using-r-for-data-analysis-and-graphics.html
> 

A first step for me in having the R functionality I need within python is to 
take the data from an experiment and plot the dependent variable as a function
of a subset of the independent variables in the experiment. An example would be
plotting vehicle fuel efficiency as a function of mpg and vehicle weight. NumPy's
loadtxt function easily imports my data files into a structure called a recarray,
similar to a data.frame in R: a lot like a flat spreadsheet with a name for each
column.
I would like to collapse the other variables to determine the mean fuel
efficiency for every combination of mpg and vehicle weight. If done in Excel, I
think this involves a "PivotTable". For Python, I wrote a function to which I pass
a recarray, the dependent variable name, and the names of the variables
(data file columns) that I want to collapse by. It passes back
multi-dimensional arrays providing the mean, standard deviation, and number of
data points for every combination of the variables:
collapseBy(data,DV,*factors)
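
The idea can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the code posted at the link below: the function name
collapse_by, the column names (efficiency, cylinders, weight_class) and the
sample values are all made up for the example, the standard deviation is the
population one (ddof=0), and returning the factor levels as a fourth value is
an addition so that each array axis can be labelled.

```python
import numpy as np

def collapse_by(data, dv, *factors):
    """Hypothetical sketch: summarise data[dv] for every combination of factors."""
    # Sorted unique levels of each factor, and each row's index into them.
    levels, codes = [], []
    for name in factors:
        lv, inv = np.unique(data[name], return_inverse=True)
        levels.append(lv)
        codes.append(inv)
    shape = tuple(len(lv) for lv in levels)
    # One flat cell number per row, identifying its combination of levels.
    flat = np.ravel_multi_index(tuple(codes), shape)
    values = data[dv].astype(float)

    mean = np.full(shape, np.nan)   # NaN marks combinations with no data
    std = np.full(shape, np.nan)
    count = np.zeros(shape, dtype=int)
    for cell in np.unique(flat):
        vals = values[flat == cell]
        idx = np.unravel_index(cell, shape)
        mean[idx] = vals.mean()
        std[idx] = vals.std()       # population standard deviation (ddof=0)
        count[idx] = vals.size
    return mean, std, count, levels

# Made-up data standing in for a file loaded into a structured array.
data = np.array(
    [(21.0, 4, 1), (23.0, 4, 1), (30.0, 4, 2),
     (18.0, 6, 1), (20.0, 6, 2), (22.0, 6, 2), (24.0, 6, 2)],
    dtype=[("efficiency", float), ("cylinders", int), ("weight_class", int)],
)

mean, std, count, levels = collapse_by(data, "efficiency",
                                       "cylinders", "weight_class")
print(mean)    # axis 0 follows levels[0], axis 1 follows levels[1]
print(count)
```

Each returned array has one axis per factor, so mean[i, j] is the mean of the
dependent variable for the i-th level of the first factor and the j-th level of
the second.
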
I am new to SciPy and don't know how to make my code follow its style, but I've
posted the code here: http://openwetware.org/images/7/7b/CollapseBy.txt
and blogged on the subject here:
http://alexholcombe.wordpress.com/2009/01/26/summarizing-data-by-combinations-of-variables-with-python/
I hope that something like this could be put into SciPy.

Alex Holcombe, http://www.psych.usyd.edu.au/staff/alexh/



