[SciPy-User] Some interactive Python tutorials on basic stats, possibly useful for teaching

josef.pktd at gmail.com
Mon Feb 14 12:45:13 EST 2011


On Mon, Feb 14, 2011 at 11:03 AM,  <josef.pktd at gmail.com> wrote:
> On Mon, Feb 14, 2011 at 10:37 AM, Fernando Perez <fperez.net at gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 6:54 AM, Rajeev Raizada
>> <rajeev.raizada at dartmouth.edu> wrote:
>>>
>>> Those of you whose duties include teaching basic stats
>>> might be interested in these interactive tutorial files,
>>> designed to illustrate basic concepts.
>>
>> Wonderful code!  Many thanks for sharing these scripts; they are a
>> fantastic illustration...
>>
>> <shameless plug>
>> For those using the ipython qt console from git's master, you can load
>> Rajeev's scripts straight in via:
>>
>> %loadpy http://www.dartmouth.edu/~raj/Python/interactive_mean_std_normal_distribution.py
>>
>> %loadpy http://www.dartmouth.edu/~raj/Python/interactive_two_sample_t_test.py
>>
>> %loadpy http://www.dartmouth.edu/~raj/Python/interactive_correlation_plot.py
>>
>> </plug>
>
> And additionally thanks for the link to
> http://en.wikipedia.org/wiki/Anscombe%27s_quartet
>
> I didn't know about those.
>
> (and, to follow in Fernando's footsteps)
>
> <plug without links>
>
> use higher order moments
> use general measures of dependence
> use non-linearity tests
> use a robust estimator
> and use partial correlation in the multivariate case
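
(A quick numerical check of the Anscombe point above: the four datasets
really do share near-identical summary statistics. A minimal NumPy sketch,
with the quartet's values as tabulated on the Wikipedia page linked above:)

```python
import numpy as np

# Anscombe's quartet: datasets 1-3 share the same x values
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    # every dataset: mean(y) ~ 7.50, correlation ~ 0.816
    print(round(float(y.mean()), 2),
          round(float(y.var(ddof=1)), 2),
          round(float(np.corrcoef(x, y)[0, 1]), 3))
```

Each pass prints essentially the same mean, variance, and correlation,
even though scatterplots of the four datasets look completely different.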

<plug continued>

since it works fine on my computer, and it's fun

I added and changed the following lines in plot_the_correlation() in
interactive_correlation_plot.py

    import numpy as np
    from scikits.statsmodels.api import RLM, add_constant
    resrlmbeta = RLM(y_coords, add_constant(x_coords)).fit().params
    fitted_robust = np.dot(add_constant(axis_range*scipy.array([-1, 1])),
                           resrlmbeta).ravel()

    #### Plot the best-fit line in red
    handle_of_regression_line_plot = pylab.plot(
        axis_range*scipy.array([-1, 1]),
        np.column_stack((y_intercept + slope*axis_range*scipy.array([-1, 1]),
                         fitted_robust)), 'r-')

I now have two lines, one OLS and one robust. Placing some points roughly
in a straight line and then adding a few outliers shows the difference
between the standard OLS fit and a robust estimator.
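
For anyone who wants to reproduce the effect without the interactive plot,
here is a minimal self-contained sketch. It uses a hand-rolled Huber IRLS
loop in plain NumPy instead of statsmodels' RLM, so the function names here
are illustrative, not the RLM API:

```python
import numpy as np

def ols_fit(X, y):
    # ordinary least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

def huber_irls(X, y, c=1.345, n_iter=50):
    # iteratively reweighted least squares with Huber weights:
    # residuals beyond c robust-scale units get weight c/|r| instead of 1
    beta = ols_fit(X, y)
    for _ in range(n_iter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745 + 1e-12  # MAD scale
        r = np.maximum(np.abs(resid / scale), 1e-12)
        w = np.where(r <= c, 1.0, c / r)
        # weighted least squares step
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# points roughly on y = 1 + 2x, then a few gross outliers
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, x.size)
y[:3] += 25.0
X = np.column_stack((np.ones_like(x), x))

b_ols = ols_fit(X, y)
b_rob = huber_irls(X, y)
print("OLS    intercept, slope:", b_ols)
print("robust intercept, slope:", b_rob)
```

With the three outliers included, the OLS slope is dragged well away from 2,
while the reweighted fit stays close to the true line.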

It's a very quick hack:
RLM doesn't have a predict method (a bug I just found), although
fittedvalues exists.
I don't know how to add a second plot() call with the fitted values to the
same handle without reaching for the matplotlib manual, nor how to give
the lines separate colors when they are drawn in the same plot() call.

This could be a nice way to illustrate the different weighting options
in RLM; the default doesn't look very robust when there are many
outliers.
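
To make the weighting-option point concrete, here is a sketch of two
standard weight functions in plain NumPy: Huber and Tukey's bisquare. The
tuning constants are the usual textbook defaults (1.345 and 4.685); whether
they match what a particular scikits.statsmodels version uses is an
assumption:

```python
import numpy as np

def huber_weight(r, c=1.345):
    # weight 1 inside [-c, c], then decays like c/|r|: outliers are
    # downweighted but never fully discarded
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))

def tukey_biweight(r, c=4.685):
    # smooth redescending weight: (1 - (r/c)^2)^2 inside [-c, c],
    # exactly zero beyond, so gross outliers are ignored entirely
    r = np.asarray(r, dtype=float)
    u = r / c
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)

r = np.array([0.0, 1.0, 3.0, 10.0])
print(huber_weight(r))    # the point at r=10 still gets weight ~0.13
print(tukey_biweight(r))  # the point at r=10 gets weight exactly 0
```

Huber never gives an observation zero weight, which is one reason the
default can look less than fully robust with many gross outliers; a
redescending norm like the bisquare discards them entirely.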

> </plug>

Thanks,

Josef
>
> Josef
>
>>
>> Cheers,
>>
>> f
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
