basic statistics in python

Matt Austin threehounds at worldnet.att.net
Sun Mar 17 15:14:46 EST 2002


Quartiles are neither superior nor inferior to quantiles; "quartile" is 
just the name for particular quantiles.  When people refer to quartiles 
they usually mean the 0.25 and 0.75 quantiles (the 25th and 75th 
percentiles).  Another term you will see is quintiles, which refers to 
the 0.20, 0.40, 0.60, and 0.80 quantiles.  This can be generalized to 
deciles, etc.
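
As a rough illustration of that generalization, here is a minimal Python 
sketch using statistics.quantiles() from the standard library (Python 3.8 
or later).  The sample data is made up for the example and is not the 
data discussed in this thread; method="inclusive" is just one of the two 
interpolation conventions that function offers.

```python
from statistics import median, quantiles

# Hypothetical sample, chosen only for illustration.
data = list(range(1, 101))  # 1, 2, ..., 100

# n is the number of equal-probability groups, so each call
# returns n - 1 cut points.
quartile_cuts = quantiles(data, n=4, method="inclusive")   # 0.25, 0.50, 0.75
quintile_cuts = quantiles(data, n=5, method="inclusive")   # 0.20, ..., 0.80
decile_cuts = quantiles(data, n=10, method="inclusive")    # 0.10, ..., 0.90

print("quartiles:", quartile_cuts)
print("quintiles:", quintile_cuts)
print("deciles:  ", decile_cuts)

# The middle quartile cut point is the median.
print(quartile_cuts[1] == median(data))
```

Only the number of groups changes between the three calls, which is the 
sense in which these are all the same idea under different names.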

The nice thing about R is that most of its functions are written in R 
itself, so you can see how a result is calculated by simply typing the 
name of the function at the prompt, which prints its definition via the 
default show method for functions.  If a function is internal, you can 
still read the R source code to verify the calculations.
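
For readers without R at hand, the two calculations contrasted in the 
quoted message below can be sketched in Python.  This is only a sketch 
under assumptions: the function names and the ten-value sample are 
invented for illustration, and the interpolation rule is the one current 
R documentation describes as the default ("type 7") for quantile(), 
which matches the zero-based description quoted below.

```python
import math


def quantile_linear(sorted_values, p):
    """Sample quantile by linear interpolation between order statistics.

    Hypothetical helper: uses the zero-based position (n - 1) * p.
    """
    h = (len(sorted_values) - 1) * p          # fractional zero-based index
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)  # stay in range when p == 1
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def midpoint(sorted_values, i, j):
    """Arithmetic mean of the values at zero-based indices i and j."""
    return (sorted_values[i] + sorted_values[j]) / 2


# Hypothetical ten-value sample, not the data from this thread.
data = sorted([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# Linear interpolation at p = 0.25 and p = 0.75.
q1_interp = quantile_linear(data, 0.25)
q3_interp = quantile_linear(data, 0.75)

# Midpoint of the 2nd/3rd and 6th/7th values (zero-based indices).
q1_mid = midpoint(data, 2, 3)
q3_mid = midpoint(data, 6, 7)

print("interpolated:", q1_interp, q3_interp)
print("midpoints:   ", q1_mid, q3_mid)
```

Both approaches pick the same pair of neighbouring values; they differ 
only in how far between the two values the quartile is placed, which is 
why the results are close but not identical.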

--Matt



Siegfried Gonzi wrote:

> Tim Churches wrote:
> 
> 
>>> delivers:
>>> 
>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>   0.230   1.226   7.300  18.960  31.680  78.900
>>> 
>>> Everything is correct, except the 1st quantile and 3rd quantile.
>> 
>> You mean 1st quartile and 3rd quartile, not quantile. And the values
>> calculated by R are not wrong, just different (see below).
> 
> 
> The book "Statistical Methods in the Atmospheric Sciences", by D.S.
> Wilks, does not really distinguish between "quantiles" and
> "quartiles". From the book I got the impression that quartiles
> are inferior to quantiles (e.g. page 24: "Example 3.1. Computation of
> Common Quantiles").
> 
> But you are right that I should be more precise in order to avoid
> confusion.
> 
> 
> 
>> There are a number of methods for calculating quantiles. In R, the
>> summary() function calls the quantile() function to calculate the 1st
>> and 3rd quartiles and the median. The quantile() function uses linear
>> interpolation to calculate the sample quantile for the probabilities of
>> 0.25 and 0.75, whereas XLispStat is just taking the arithmetic mean of
>> the 2nd and 3rd, and 6th and 7th values respectively (using zero-based
>> indexing/counting, since this is the Python list).
> 
> 
> My first guess was also that R just calculates the quantiles in a
> different fashion; but I could not find any hints in the documentation.
> According to the aforementioned book (page 23):
> 
> "Almost as commonly used as the median are the quartiles, q0.25 and
> q0.75. Usually these are called the lower and upper quartiles,
> respectively. They are located halfway between the median, q0.5, and the
> extremes, x(1) and x(n). In typically colorful terminology, Tukey (1977)
> calls q0.25 and q0.75 the 'hinges', imagining that the data set has been
> folded first at the median, and the quartiles."
> 
> I simply thought (and note the word "halfway" in the citation) that
> XLispStat is/was correct.
> 
> 
>> The methods used by R are fully described in the R manual (see
>> help(quantile)), but a commonsense explanation of the R approach is as
>> follows (again using zero-based indexing/counting). 
> 
> 
> Maybe I looked too superficially for the method of calculation.
> 
> 
> Regards and especially thank you for your insight,
> S. Gonzi


 



