[SciPy-User] [ANN] scikit.statsmodels 0.2.0 release

Gael Varoquaux gael.varoquaux at normalesup.org
Fri Feb 19 12:55:40 EST 2010


On Fri, Feb 19, 2010 at 12:45:04PM -0500, Skipper Seabold wrote:
> > Also, there will be differences in APIs, as far as I understand the
> > statsmodels API. For instance, I believe that constructors of models
> > should work without being passed the data (the data could be optional). The
> > reason is that on-line estimators shouldn't be passed data at
> > initialisation. As a consequence, maybe the 'fit' method should
> > take the data... All this is quite open to me, and I don't want to draw
> > any premature conclusion.


> Just a quick comment (disclaimer: all my own thoughts and
> misunderstandings...feel free to correct me).  Historically, the
> statsmodels package accepted a design matrix at model instantiation,
> and you then passed your dependent variable to the fit method.  To my
> mind, though, this didn't seem to make much sense for how I think of a
> model (probably somewhat discipline specific?).  For the estimators
> that we have, we are usually fitting a parametric model in order to
> test a given theory about the data generating process.  The model
> doesn't make much sense to me without the data (my data is not
> real-time and I am not data mining). 
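To make the two styles concrete, a minimal sketch contrasting a model constructed together with its data (the data-at-construction style Skipper argues for; both design and response go to the constructor here for brevity) with an estimator whose constructor takes only options and whose fit() takes the data. The class names DataBoundOLS and DataFreeOLS are hypothetical and are not the statsmodels API; NumPy least squares is used purely for illustration.

```python
import numpy as np


class DataBoundOLS:
    """Hypothetical model constructed with its data, so the model and
    the data it describes travel together."""

    def __init__(self, endog, exog):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)

    def fit(self):
        # Least-squares solution of exog @ params ~= endog
        params, *_ = np.linalg.lstsq(self.exog, self.endog, rcond=None)
        return params


class DataFreeOLS:
    """Hypothetical estimator configured without data; data only enters
    through fit(), so the same object can be refit on other data."""

    def __init__(self, rcond=None):
        self.rcond = rcond
        self.params_ = None

    def fit(self, exog, endog):
        exog = np.asarray(exog, dtype=float)
        endog = np.asarray(endog, dtype=float)
        self.params_, *_ = np.linalg.lstsq(exog, endog, rcond=self.rcond)
        return self


# Simulated regression data: intercept 1.0, slope 2.0
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=50)

p1 = DataBoundOLS(y, X).fit()          # data given at construction
p2 = DataFreeOLS().fit(X, y).params_   # data given at fit time
```

Both give the same coefficients; the difference is only where the data enters, which starts to matter once data arrives incrementally.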

What if you implement recursive estimation, say Kalman filtering? There
are use cases for that, and we want to address them.
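As an illustration of why on-line estimators sit awkwardly with data-at-construction, a minimal sketch of recursive least squares, which can be read as a Kalman filter whose state (the regression coefficients) is constant and whose observation noise has unit variance. The class, its update() method and the prior_var parameter are hypothetical, not an existing statsmodels interface.

```python
import numpy as np


class RecursiveLeastSquares:
    """Hypothetical on-line estimator: no data at construction, one
    observation at a time through update()."""

    def __init__(self, n_features, prior_var=1e6):
        self.params = np.zeros(n_features)
        # Large prior covariance = vague prior on the coefficients
        self.P = prior_var * np.eye(n_features)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        # Kalman gain for a scalar observation y = x @ params + noise
        gain = Px / (1.0 + x @ Px)
        # Correct the estimate by the gain times the prediction error
        self.params = self.params + gain * (y - x @ self.params)
        # Covariance update: P <- P - gain (P x)^T  (P is symmetric)
        self.P = self.P - np.outer(gain, Px)
        return self


rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([0.5, -1.5]) + 0.05 * rng.normal(size=200)

# Feed the observations one by one, as a stream would deliver them
rls = RecursiveLeastSquares(n_features=2)
for x_t, y_t in zip(X, y):
    rls.update(x_t, y_t)

# Batch solution on the full data, for comparison
batch, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With a vague prior the streamed estimate ends up essentially equal to the batch one, yet the estimator never needed the data when it was created.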

Sometimes your data doesn't fit in memory. If you have a forward
selection regression model on huge data, say genomics data that is never
remotely going to fit in your memory and that you are fishing out of a
database, the API is also going to break down.
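One way around the memory problem, sketched under the assumption that there are many rows but few enough columns for a p-by-p matrix to fit in memory: accumulate the sufficient statistics X'X, X'y and y'y chunk by chunk, then run greedy forward selection on those alone. iter_chunks is a hypothetical stand-in for a database cursor; SufficientStats, partial_fit and forward_selection are likewise illustrative names.

```python
import numpy as np


def iter_chunks(n_chunks, chunk_size, coef, seed=0):
    """Hypothetical stand-in for rows fetched from a database one chunk
    at a time; only one chunk is held in memory at once."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, coef.size))
        y = X @ coef + 0.1 * rng.normal(size=chunk_size)
        yield X, y


class SufficientStats:
    """Accumulates X'X, X'y and y'y over chunks (out-of-core)."""

    def __init__(self, n_features):
        self.XtX = np.zeros((n_features, n_features))
        self.Xty = np.zeros(n_features)
        self.yty = 0.0

    def partial_fit(self, X, y):
        self.XtX += X.T @ X
        self.Xty += X.T @ y
        self.yty += y @ y
        return self


def forward_selection(stats, n_select):
    """Greedy forward selection from the accumulated statistics only:
    at each step add the feature that most reduces the residual sum of
    squares, RSS(S) = y'y - X_S'y . solve(X_S'X_S, X_S'y)."""
    selected = []
    remaining = list(range(stats.Xty.size))
    for _ in range(n_select):
        best, best_rss = None, np.inf
        for j in remaining:
            S = selected + [j]
            b = np.linalg.solve(stats.XtX[np.ix_(S, S)], stats.Xty[S])
            rss = stats.yty - stats.Xty[S] @ b
            if rss < best_rss:
                best, best_rss = j, rss
        selected.append(best)
        remaining.remove(best)
    return selected


# Only features 1 and 3 carry signal
coef = np.array([0.0, 3.0, 0.0, -2.0, 0.0])
stats = SufficientStats(n_features=coef.size)
for X_chunk, y_chunk in iter_chunks(n_chunks=20, chunk_size=500, coef=coef):
    stats.partial_fit(X_chunk, y_chunk)

chosen = forward_selection(stats, n_select=2)
```

The design choice here is that fitting and selection operate on a small summary rather than on the raw rows, so the estimator has to accept data after construction.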

Also, being able to give initial guesses to the estimator, for instance
to warm-restart a convex optimisation, can significantly change the
computational cost of, e.g., a cross-validation.
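A minimal sketch of the warm-restart effect, assuming a ridge regression solved by plain gradient descent along a path of penalty values (the kind of path a cross-validation would walk). Ridge and gradient descent are stand-ins chosen for brevity; ridge_gd and its init parameter are hypothetical. The warm run starts each fit from the previous solution instead of from zero.

```python
import numpy as np


def ridge_gd(X, y, lam, init=None, tol=1e-8, max_iter=10_000):
    """Minimise 0.5*||y - X b||^2 + 0.5*lam*||b||^2 by gradient descent,
    starting from `init` (zeros if None). Returns (b, n_iterations)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    c = X.T @ y
    step = 1.0 / np.linalg.eigvalsh(A)[-1]  # 1/L, L = largest eigenvalue
    b = np.zeros(X.shape[1]) if init is None else init.copy()
    for k in range(max_iter):
        grad = A @ b - c
        if np.linalg.norm(grad) < tol:
            return b, k
        b = b - step * grad
    return b, max_iter


rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)
lambdas = np.linspace(10.0, 1.0, 10)  # a regularisation path

cold_iters, warm_iters = 0, 0
b_warm = None
solutions = []
for lam in lambdas:
    _, n_cold = ridge_gd(X, y, lam)                    # restart from zero
    b_warm, n_warm = ridge_gd(X, y, lam, init=b_warm)  # reuse previous fit
    cold_iters += n_cold
    warm_iters += n_warm
    solutions.append((lam, b_warm))
```

Because neighbouring penalties have nearby solutions, the warm-started fits begin close to the optimum and need fewer iterations, while the final estimates are the same.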

On the other hand, my experience is that trying to solve all the possible
use cases beforehand, without working code and examples, just leads to
developers staring at a whiteboard. So I'd rather move forward, and
think about the API based on examples.

I just wanted to warn you that we are probably not going to follow the
existing APIs exactly, and that there are reasons for that. I am not trying
to bash existing APIs; that is a pointless activity, IMHO.

Gaël
