Questions about mathematical and statistical functionality in Python

Tim Churches tchur at optushome.com.au
Thu Jun 14 17:54:09 EDT 2007


Michael Hoffman wrote:
> Talbot Katz wrote:
> 
>> I hope you'll indulge an ignorant outsider.  I work at a financial 
>> software firm, and the tool I currently use for my research is R, a 
>> software environment for statistical computing and graphics.  R is 
>> designed with matrix manipulation in mind, and it's very easy to do 
>> regression and time series modeling, and to plot the results and test 
>> hypotheses.  The kinds of functionality we rely on the most are standard 
>> and robust versions of regression and principal component / factor 
>> analysis, bayesian methods such as Gibbs sampling and shrinkage, and 
>> optimization by linear, quadratic, newtonian / nonlinear, and genetic 
>> programming; frequently used graphics include QQ plots and histograms.  
>> In R, these procedures are all available as functions (some of them are 
>> in auxiliary libraries that don't come with the standard distribution, 
>> but are easily downloaded from a central repository).
> 
> I use both R and Python for my work. I think R is probably better for 
> most of the stuff you are mentioning. I do any sort of heavy 
> lifting--database queries/tabulation/aggregation in Python and load the 
> resulting data frames into R for analysis and graphics.

I would second that. It is not either/or. Use Python, including Numpy
and matplotlib and packages from SciPy, for some things, and R for
others. And you can even embed R in Python using RPy - see
http://rpy.sourceforge.net/

We use the combination of Python, Numpy (actually, the older Numeric
Python package, but soon to be converted to Numpy), RPy and R in our
NetEpi Analysis project - exploratory epidemiological analysis of large
data sets - see http://sourceforge.net/projects/netepi - and it is a
good combination - Python for the Web interface, data manipulation and
data heavy-lifting, and for some of the more elementary statistics, and
R for more involved statistical analysis and graphics (with the option
of using matplotlib or other Python-based graphics packages for some
tasks if we wish). The main thing to remember, though, is that indexing
is zero-based in Python and 1-based in R...
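
To make the indexing point concrete, here is a minimal sketch of the
kind of conversion helpers one might keep around when passing positions
between the two languages. The function names are hypothetical (they
are not part of RPy, Numpy or NetEpi Analysis), and the example uses
plain Python lists only.

```python
def r_to_py_index(i):
    """Convert a 1-based R position to a 0-based Python index."""
    if i < 1:
        raise ValueError("R positions start at 1")
    return i - 1


def py_to_r_index(i):
    """Convert a 0-based Python index to a 1-based R position."""
    if i < 0:
        raise ValueError("negative Python indices have no direct R equivalent")
    return i + 1


def r_range_to_py_slice(start, stop):
    """Convert R's start:stop (inclusive at both ends) to a Python slice.

    Python slices exclude the stop position, so only the start shifts.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    return slice(start - 1, stop)


values = [10, 20, 30, 40, 50]

first = values[r_to_py_index(1)]            # what R calls values[1]
middle = values[r_range_to_py_slice(2, 4)]  # what R calls values[2:4]
```

Note also that a negative index means different things on each side:
in Python it counts from the end of the sequence, while in R it drops
the element at that position, which is why the sketch refuses to
translate it.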

Tim C


