Request for feedback on API design

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Dec 10 02:51:54 EST 2010


On Thu, 09 Dec 2010 18:48:10 -0600, Tim Chase wrote:

> On 12/09/2010 05:44 PM, Steven D'Aprano wrote:
>> (1) Multivariate statistics such as covariance have two obvious APIs:
>>
>>      A pass the X and Y values as two separate iterable arguments,
>>      e.g.:
>>        cov([1, 2, 3], [4, 5, 6])
>>
>>      B pass the X and Y values as a single iterable of tuples, e.g.:
>>        cov([(1, 4), (2, 5), (3, 6)])
>>
>> I currently support both APIs. Do people prefer one, or the other, or
>> both? If there is a clear preference for one over the other, I may drop
>> support for the other.
> 
> I'm partial to the "B" form (iterable of 2-tuples) -- it indicates that
> the two data-sets (x_n and y_n) should be of the same length and paired.
> The "A" form makes it less obvious that len(param1) should equal
> len(param2).
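
For readers following the thread, here is a minimal sketch of how one
function could accept both forms. The signature, the name of the second
parameter and the error messages are assumptions made for illustration,
not the module's actual code. Note that form A needs an explicit length
check, which is the point Tim raises above.

```python
def cov(xdata, ydata=None):
    """Sample covariance (hypothetical signature, for illustration only)."""
    if ydata is None:
        # Form B: a single iterable of (x, y) pairs.
        pairs = list(xdata)
    else:
        # Form A: two separate iterables, which must be the same length.
        xs, ys = list(xdata), list(ydata)
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same length")
        pairs = list(zip(xs, ys))
    n = len(pairs)
    if n < 2:
        raise ValueError("covariance requires at least two data points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Divide by n - 1 for the sample (rather than population) covariance.
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
```

With this sketch, cov([1, 2, 3], [4, 5, 6]) and
cov([(1, 4), (2, 5), (3, 6)]) give the same result.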


Thanks for the comments, Tim. To answer your questions:


> I haven't poked at your code sufficiently to determine whether all the
> functions within can handle streamed data, or whether they keep the
> entire dataset internally,

Where possible, the functions don't keep the entire dataset internally. 
Some functions have to (e.g. order statistics need to see the entire data 
sequence at once), but the rest are capable of dealing with streamed data.

Also, there are a few functions, such as standard deviation, that have 
both a single-pass algorithm and a more accurate multiple-pass algorithm.
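
To illustrate the trade-off, here is a rough sketch of the two styles.
The single-pass version uses Welford's well-known update, and the
two-pass version computes the mean first and then sums the squared
deviations. Both function names are made up for this example, and I am
not claiming these are the exact algorithms used in the module.

```python
import math

def stdev_one_pass(data):
    """Sample standard deviation via Welford's streaming update.

    Accepts any iterable (including a generator) and stores no data.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two data points")
    return math.sqrt(m2 / (n - 1))

def stdev_two_pass(data):
    """Sample standard deviation using two passes over stored data."""
    data = list(data)  # the whole dataset must be kept in memory
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = math.fsum(data) / n                       # first pass
    ss = math.fsum((x - mean) ** 2 for x in data)    # second pass
    return math.sqrt(ss / (n - 1))
```

The single-pass function works on an iterator that can only be consumed
once, while the two-pass function has to materialise the data as a list
before it can start.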


>> (2) Statistics text books often give formulae in terms of sums and
>> differences such as
>>
>> Sxx = n*Σ(x**2) - (Σx)**2
>>
>> There are quite a few of these: I count at least six common ones,
> 
> When you take this count, is it across multiple text-books, or are they
> common in just a small sampling of texts?  (I confess it's been a decade
> and a half since I last suffered a stats class)

I admit that I haven't done an exhaustive search of the literature, but 
it does seem quite common to extract common expressions from various 
stats formulae and give them names. The only use-case I can imagine for 
them is checking hand-calculations or doing schoolwork.
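
As a concrete example, the Sxx expression quoted above can be computed
in a single pass over the data. This is only a sketch, and the function
name sxx is chosen here for illustration.

```python
def sxx(data):
    """Return n*sum(x**2) - sum(x)**2, computed in one pass."""
    n = 0
    total = 0      # running sum of x
    total_sq = 0   # running sum of x**2
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return n * total_sq - total ** 2
```

For the data 1, 2, 3 this gives 3*14 - 6**2 = 6. Dividing Sxx by
n*(n-1) gives the sample variance, here 6/(3*2) = 1, which is the kind
of hand-calculation check I have in mind.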


-- 
Steven


