Questions about mathematical and statistical functionality in Python

Josh Gilbert jgilbert.r at gmail.com
Thu Jun 14 21:08:08 EDT 2007


On Thursday 14 June 2007 5:54 pm, Tim Churches wrote:
> Michael Hoffman wrote:
> > Talbot Katz wrote:
> >> I hope you'll indulge an ignorant outsider.  I work at a financial
> >> software firm, and the tool I currently use for my research is R, a
> >> software environment for statistical computing and graphics.  R is
> >> designed with matrix manipulation in mind, and it's very easy to do
> >> regression and time series modeling, and to plot the results and test
> >> hypotheses.  The kinds of functionality we rely on the most are standard
> >> and robust versions of regression and principal component / factor
> >> analysis, bayesian methods such as Gibbs sampling and shrinkage, and
> >> optimization by linear, quadratic, newtonian / nonlinear, and genetic
> >> programming; frequently used graphics include QQ plots and histograms.
> >> In R, these procedures are all available as functions (some of them are
> >> in auxiliary libraries that don't come with the standard distribution,
> >> but are easily downloaded from a central repository).
> >
> > I use both R and Python for my work. I think R is probably better for
> > most of the stuff you are mentioning. I do any sort of heavy
> > lifting--database queries/tabulation/aggregation in Python and load the
> > resulting data frames into R for analysis and graphics.
>
> I would second that. It is not either/or. Use Python, including Numpy
> and matplotlib and packages from SciPy, for some things, and R for
> others. And you can even embed R in Python using RPy - see
> http://rpy.sourceforge.net/
>
> We use the combination of Python, Numpy (actually, the older Numeric
> Python package, but soon to be converted to Numpy), RPy and R in our
> NetEpi Analysis project - exploratory epidemiological analysis of large
> data sets - see http://sourceforge.net/projects/netepi - and it is a
> good combination - Python for the Web interface, data manipulation and
> data heavy-lifting, and for some of the more elementary statistics, and
> R for more involved statistical analysis and graphics (with the option
> of using matplotlib or other Python-based graphics packages for some
> tasks if we wish). The main thing to remember, though, is that indexing
> is zero-based in Python and 1-based in R...
>
> Tim C
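
As a small illustration of the indexing caveat quoted above, here is a
minimal sketch in plain Python. The helper names (to_r_index, to_r_range)
are hypothetical and not part of RPy or any other library; the sketch only
assumes that Python slices are 0-based and half-open while R ranges are
1-based and inclusive.

```python
def to_r_index(py_index):
    """Convert a 0-based Python index to the matching 1-based R index."""
    if py_index < 0:
        # Python's -1 means "last element"; R's -1 means "drop the first".
        raise ValueError("negative indices mean different things in Python and R")
    return py_index + 1


def to_r_range(start, stop):
    """Convert a Python half-open slice [start, stop) to an inclusive R range.

    Python's x[start:stop] selects the same elements as R's x[(start + 1):stop].
    """
    if not 0 <= start < stop:
        raise ValueError("expected 0 <= start < stop")
    return start + 1, stop


values = [10, 20, 30, 40, 50]
first, last = to_r_range(1, 4)             # Python values[1:4] is [20, 30, 40]
r_expression = "x[%d:%d]" % (first, last)  # the equivalent R subscript
```

Only the lower bound shifts: the Python stop index is exclusive and the R
end index is inclusive, so the two off-by-one differences cancel at the
upper end.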

Thirded. I use R, Python and Matlab, along with other languages (I hate
Pipeline Pilot), in my work, and from what I've seen nothing compares with R
when it comes to stats. I love R, from its brilliant CRAN system (PyPI needs
serious work to be considered in the same class as CPAN et al.) to its
delicious Emacs integration.

I just wish there was a way to distribute R packages without requiring the 
user to separately install R. 

In a similar vein, I wish there was a reasonable Free Software equivalent to 
Spotfire. The closest I've found (and they're nowhere near as good) are 
Orange (http://www.ailab.si/orange) and WEKA 
(http://www.cs.waikato.ac.nz/ml/weka/). Orange is written in Python, but it's
tied to Qt 2.x, as the 3.x series was not available on Windows under the GPL.


Josh Gilbert


