Python package statistics

Yaşar Arabacı yasar11732 at gmail.com
Fri Oct 18 17:07:35 EDT 2013


Hi Terry,

Thanks for pointing it out. matplotlib's hist function wasn't broken
after all :) I have published non-parametric statistics here:
http://ysar.net/python/python-package-statistics-additions.html
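Below is a minimal sketch of the non-parametric summary Terry recommends in the quoted message (median and inter-quartile range). It uses only Python's standard `statistics` module; `statistics.quantiles` needs Python 3.8+. The sizes are hypothetical illustration data, not the PyPI numbers, and this is not necessarily how the linked page computed its figures. It shows how one huge package drags the mean and standard deviation far from the median.

```python
import statistics


def robust_summary(sizes):
    """Return the median, quartiles and inter-quartile range of sizes.

    statistics.quantiles defaults to its 'exclusive' method; other tools
    may use a different quartile convention and give slightly different
    Q1/Q3 values.
    """
    q1, _, q3 = statistics.quantiles(sizes, n=4)  # three quartile cut points
    return {
        "median": statistics.median(sizes),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical package sizes in KB: many small packages, one huge outlier.
sizes_kb = [12, 15, 20, 22, 30, 35, 40, 55, 80, 36000]

summary = robust_summary(sizes_kb)
print("mean:  ", statistics.mean(sizes_kb))   # dominated by the outlier
print("stdev: ", statistics.stdev(sizes_kb))  # even more so
print("median:", summary["median"])
print("IQR:   ", summary["iqr"])
```

On this toy data the mean lands above every package except the outlier, while the median and IQR still describe the typical package.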

2013/10/18 Terry Reedy <tjreedy at udel.edu>:
> On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
>>
>> Hi people,
>>
>> I collected some data on PyPI and published some statistics about
>> packages on PyPI. I think you might find it an interesting read:
>>
>> http://ysar.net/python/python-package-statistics.html
>
>
> "b2gpopulate (36MB)
> ...
> Total size of packages on PyPI amounted to 4.2 GB. Average package size is
> 161 KB and standard deviation is 1 MB."
>
> For such highly skewed data, the mean and especially the standard deviation
> and confidence intervals are meaningless. They are 'parametric' statistics,
> which is to say, they were designed for bell-shaped distributions. (I will
> not say 'normal' == Gaussian distributions because they are *not* normal
> for much raw data.)
>
> A better summary is obtained from either 'non-parametric' statistics
> (median, inter-quartile range) or from 'normalizing' the data (if possible).
> For the latter, try taking the square root or log of the sizes and plot the
> distribution. If either works, take the mean and sd of the transformed
> values. Then report those, together with the back-transformed mean and
> mean +/- sd.
>
> --
> Terry Jan Reedy
>
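Terry's normalizing recipe (transform, take the mean and sd, then transform back) can be sketched as below. The function name `transformed_summary` and the sample sizes are hypothetical, chosen for illustration. The log transform assumes every size is greater than zero. With log/exp the back-transformed mean is the geometric mean, and the back-transformed mean +/- sd range is asymmetric (it multiplies and divides by exp(sd)). The plotting step Terry mentions is left out; it would still be needed to judge whether the transform actually made the data roughly bell-shaped.

```python
import math
import statistics


def transformed_summary(sizes, transform=math.log, inverse=math.exp):
    """Mean and sd on a transformed scale, plus their back-transformed values.

    Defaults to the natural log, which assumes every size is > 0.
    To try the square-root transform instead, pass
    transform=math.sqrt, inverse=lambda y: y * y
    (squaring is only a valid inverse for values >= 0, so check that
    mean - sd stays non-negative before trusting back_low).
    """
    ys = [transform(s) for s in sizes]
    m = statistics.mean(ys)
    sd = statistics.stdev(ys)  # sample standard deviation; needs >= 2 values
    return {
        "mean": m,
        "sd": sd,
        "back_mean": inverse(m),       # geometric mean when using log/exp
        "back_low": inverse(m - sd),   # back-transformed mean - sd
        "back_high": inverse(m + sd),  # back-transformed mean + sd
    }


# Hypothetical package sizes in KB, for illustration only.
sizes_kb = [12, 15, 20, 22, 30, 35, 40, 55, 80, 36000]
s = transformed_summary(sizes_kb)
print(f"log mean = {s['mean']:.3f}, log sd = {s['sd']:.3f}")
print(f"geometric mean: {s['back_mean']:.1f} KB")
print(f"back-transformed mean +/- sd: {s['back_low']:.1f} to {s['back_high']:.1f} KB")
```

Reporting both the transformed-scale values and the back-transformed ones, as suggested above, lets readers see the spread in the original units without pretending it is symmetric.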



-- 
http://ysar.net/


