[SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models

Skipper Seabold jsseabold at gmail.com
Fri Mar 27 13:43:54 EDT 2009


Hello all,

I am a first-year PhD student in Economics at American University, and
I would very much like to participate in the GSoC with the NumPy/SciPy
community.  I am looking for some feedback and discussion before I
submit a proposal.

Judging by the ideas page and the discussion in this thread (
http://mail.scipy.org/pipermail/scipy-dev/2009-February/011373.html ),
I think the following project proposal would be useful to the
community.

My proposal has two parts.  The first is to improve datasource and
integrate it into the numpy/scipy io.  I see this as a way to get my
feet wet working on a project, and I do not imagine that it would
take more than 2-3 weeks of work on my end.

The second part would be to get Jonathan Taylor's statistical models
from the NiPy project into scipy.stats.  I think that I would be a
good candidate for this work: I am currently studying statistics and
learning the ins and outs of NumPy/SciPy, so I don't mind doing some
of the less appealing work, which is also a great learning
opportunity.  I also see this as a great way to get involved in the
SciPy community in an area that currently needs some attention.  As a
student, I would be able to help maintain the code, fix bugs, and
address other areas of the statistical capabilities that need
attention.

Below is a general outline of my proposal with some areas that I have
identified as needing work.  I am eager to discuss aspects of the
project with those who are interested and to work out appropriate
milestones.

1) Improve datasource and integrate it into all the numpy/scipy io

Bug Fixes
    Catch and handle malformed URLs
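
A minimal sketch of what such a check could look like, using only the
standard library.  The function name is_wellformed_url and the set of
accepted schemes are assumptions made for illustration; neither is
part of the existing datasource code.

```python
from urllib.parse import urlparse

# Hypothetical set of schemes to accept; an assumption for this sketch.
SUPPORTED_SCHEMES = ("http", "https", "ftp")


def is_wellformed_url(url):
    """Return True if url has a supported scheme and a network location."""
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unbalanced bracket in the host part.
        return False
    return parts.scheme in SUPPORTED_SCHEMES and bool(parts.netloc)
```

A caller could run this check before attempting a download and raise
a clear error early, instead of failing later inside the network code.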

Refactoring

Enhancements
    Improve findfile method
    Improve cache method
    Add zip archive, tar file handling capabilities
    Improve networking interface to handle timeouts and proxies if
    there is sufficient interest
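
For the zip and tar item, a rough sketch of how archive detection
might work with the standard zipfile and tarfile modules.  The helper
name list_archive_members is hypothetical, and how this would hook
into datasource is left open.

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return member names of a zip or tar archive at path.

    Returns None if the file is neither a zip nor a tar archive.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    if tarfile.is_tarfile(path):
        # tarfile.open detects gzip/bz2/xz compression automatically.
        with tarfile.open(path) as tf:
            return tf.getnames()
    return None
```

Checking the file contents rather than the file extension means a
misnamed archive is still recognized.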

Documentation
    Document changes

Tests
    Implement test coverage for new changes

Copy/Move to scipy.io

2) Integrate Jonathan Taylor's statistical models into scipy.stats

These models are currently in the NiPy project
Merge relevant branches (branch trunk-josef models has the most recent
changes, I believe)

I will focus mostly on bringing over the linear models, which I
believe would include at least:
bspline.py, contrast.py, gam.py, glm.py, model.py, regression.py, utils.py

Bug Fixes
    Bug hunting
    Improve existing test coverage

Refactoring
    Eliminate existing and created duplicate functionality
    Make sure parameters are consistent, etc.

Enhancements

Documentation
    Document changes
    Make any necessary changes to stats/info.py

Testing
    Make sure test coverage is adequate


