[SciPy-User] [ANN] scikit.statsmodels 0.2.0 release

Skipper Seabold jsseabold at gmail.com
Fri Feb 19 13:04:50 EST 2010


On Fri, Feb 19, 2010 at 12:55 PM, Gael Varoquaux
<gael.varoquaux at normalesup.org> wrote:
> On Fri, Feb 19, 2010 at 12:45:04PM -0500, Skipper Seabold wrote:
>> > Also, there will be API differences, as far as I understand the
>> > statsmodels API. For instance, I believe that model constructors
>> > should work without being passed the data (the data could be optional). The
>> > reason is that on-line estimators shouldn't be passed data at
>> > initialisation. As a consequence, maybe the 'fit' method should
>> > take the data... All this is quite open to me, and I don't want to draw
>> > any premature conclusions.
>
>
>> Just a quick comment (disclaimer: all my own thoughts and
>> misunderstandings...feel free to correct me).  Historically, the
>> statsmodels package accepted a design at model instantiation, and
>> then you passed your dependent variable to the fit method.  To my
>> mind, though, this didn't seem to make much sense for how I think of a
>> model (probably somewhat discipline specific?).  For the estimators
>> that we have we are usually fitting a parametric model in order to
>> test a given theory about the data generating process.  The model
>> doesn't make much sense to me without the data (my data is not
>> real-time and I am not data mining).
>
> Suppose you implement recursive estimation, say Kalman filtering? There
> are use cases for that, and we want to solve them.
>
> Sometimes your data doesn't fit in memory. If you have a forward
> selection regression model on huge data, say genomics data that is never
> remotely going to fit in your memory and that you are fishing out of a
> database, the API is also going to break down.
>
> Also, being able to give initial guesses to the estimator, to do a
> warm restart of a convex optimisation for instance, might significantly
> change the computational cost of, e.g., a cross validation.
>
> On the other hand, my experience is that trying to solve all the possible
> use cases beforehand without working code and examples just leads to
> developers staring at a whiteboard. So I'd rather move forward, and
> think about the API based on examples.
>

All valid points, and I agree.  In the past, I've found it very hard
to code for use cases that I am not aware of ;)
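
As a concrete example of the on-line / out-of-memory case raised in the
quoted text, here is a minimal sketch of an estimator whose constructor
takes no data and which is fed chunk by chunk. The class name, the
`partial_fit` method name, and the `chunks` helper are all hypothetical
and chosen for illustration only; this is not the statsmodels API or any
other library's API.

```python
class OnlineSimpleRegression:
    """Hypothetical estimator for y = intercept + slope * x.

    The constructor takes no data; observations arrive in chunks and only
    running sums are kept, so the full data set never sits in memory.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def partial_fit(self, xs, ys):
        # Update the sufficient statistics with one chunk of observations.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    @property
    def slope(self):
        # Closed-form OLS slope from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        return (self.n * self.sxy - self.sx * self.sy) / denom

    @property
    def intercept(self):
        return (self.sy - self.slope * self.sx) / self.n


def chunks(size=3):
    # Stand-in for rows fetched from a database (assumed, noise-free data).
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    for i in range(0, len(data), size):
        part = data[i:i + size]
        yield [p[0] for p in part], [p[1] for p in part]


model = OnlineSimpleRegression()
for xs, ys in chunks():
    model.partial_fit(xs, ys)

print(model.slope, model.intercept)
```

An API that requires the data at construction could still cover this
case by accepting the first chunk there, but the estimator would then
need a second entry point for later chunks anyway.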

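The warm-restart point from the quoted text can be sketched in the same
spirit. The toy below fits a one-parameter ridge regression by gradient
descent for a sequence of penalty values, as one might when scanning
penalties during cross validation, and compares starting each fit from
zero against starting from the previous solution. The function
`fit_ridge_1d`, its `w_init` parameter, and the data are assumptions made
up for illustration.

```python
def fit_ridge_1d(xs, ys, alpha, w_init=0.0, lr=0.01, tol=1e-8, max_iter=10000):
    """Minimise mean((y - w*x)**2) + alpha * w**2 by gradient descent.

    Returns the fitted weight and the number of gradient steps taken.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    w = w_init
    for n_iter in range(max_iter):
        grad = 2.0 * ((sxx + alpha) * w - sxy)
        if abs(grad) < tol:
            return w, n_iter
        w -= lr * grad
    return w, max_iter


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]
alphas = [1.0, 0.5, 0.1]

# Cold start: every fit begins from w = 0.
cold_iters = sum(fit_ridge_1d(xs, ys, a)[1] for a in alphas)

# Warm start: every fit begins from the previous solution.
warm_iters, w = 0, 0.0
for a in alphas:
    w, n_iter = fit_ridge_1d(xs, ys, a, w_init=w)
    warm_iters += n_iter

print(cold_iters, warm_iters)
```

Because neighbouring penalty values give nearby solutions, the
warm-started fits need fewer gradient steps in total. The design
question is where the initial guess is passed: here it goes to the
fitting function, not to a constructor.
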
> I just wanted to warn that we are probably not going to follow the
> existing APIs exactly, and that there are reasons for that. I am not trying
> to bash existing APIs; that is a pointless activity, IMHO.
>
> Gaël
