Request for feedback on API design

Tim Chase python.list at tim.thechases.com
Thu Dec 9 19:48:10 EST 2010


On 12/09/2010 05:44 PM, Steven D'Aprano wrote:
> (1) Multivariate statistics such as covariance have two obvious APIs:
>
>      A pass the X and Y values as two separate iterable arguments, e.g.:
>        cov([1, 2, 3], [4, 5, 6])
>
>      B pass the X and Y values as a single iterable of tuples, e.g.:
>        cov([(1, 4), (2, 5), (3, 6)])
>
> I currently support both APIs. Do people prefer one, or the other, or
> both? If there is a clear preference for one over the other, I may drop
> support for the other.

I'm partial to the "B" form (iterable of 2-tuples) -- it 
makes it clear that the two data-sets (x_n and y_n) are paired 
and therefore of the same length.  The "A" form makes it less 
obvious that len(param1) should equal len(param2).

I haven't poked at your code sufficiently to determine whether 
all the functions within can handle streamed data, or whether 
they keep the entire dataset internally, but handing off an 
iterable-of-pairs tends to be a little more straightforward:

   cov(humongous_dataset_iter)

or

   cov(izip(humongous_dataset_iter1, humongous_dataset_iter2))

The "A" form makes doing this a little less obvious than the "B" 
form.
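For what it's worth, here is a minimal sketch of what a streaming 
"B"-form covariance might look like.  It is an assumption about one 
possible approach (a one-pass running-mean update), not a 
description of the actual code; the function name and the choice of 
the sample (n - 1) divisor are mine.  In Python 3 the built-in zip() 
is already lazy, so it plays the role of izip:

```python
def cov(pairs):
    """Sample covariance of an iterable of (x, y) pairs, in one pass.

    Hypothetical sketch: only a few running values are kept, so the
    input can be a generator that is too big to hold in memory.
    """
    n = 0
    mean_x = mean_y = 0.0
    c = 0.0  # running sum of co-deviations
    for x, y in pairs:
        n += 1
        dx = x - mean_x             # deviation from the previous mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        c += dx * (y - mean_y)      # uses the updated mean of y
    if n < 2:
        raise ValueError("need at least two pairs")
    return c / (n - 1)
```

Usage would then be cov(pairs_iter) for data that already comes 
paired, or cov(zip(xs_iter, ys_iter)) for two separate streams.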

> (2) Statistics text books often give formulae in terms of sums and
> differences such as
>
> Sxx = n*Σ(x**2) - (Σx)**2
>
> There are quite a few of these: I count at least six common ones,

When you take this count, is it across multiple text-books, or 
are they common in just a small sampling of texts?  (I confess 
it's been a decade and a half since I last suffered a stats class.)

> all closely related and confusingly named:
>
> Sxx, Syy, Sxy, SSx, SSy, SPxy
>
> (the x and y should all be subscript).
>
> Are they useful, or would they just add unnecessary complexity?

I think it depends on your audience:  amateur statisticians or 
pros?  I suspect that pros wouldn't blink at the distinctions 
while weekenders like myself would get a little bleary-eyed 
without at least a module docstring to clearly spell out the 
distinctions and the formulae used for determining them.
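As a rough idea of the kind of docstring I mean, here is a sketch.  
Only the Sxx formula comes from the quoted text; the Syy and Sxy 
formulae are my assumption by analogy, and the function name is 
invented.  SSx, SSy and SPxy are left out because no formula is 
given for them above:

```python
def sum_stats(pairs):
    """Return (Sxx, Syy, Sxy) for an iterable of (x, y) pairs.

    Formulae used (n is the number of pairs):

        Sxx = n*Σ(x**2) - (Σx)**2
        Syy = n*Σ(y**2) - (Σy)**2     (assumed, by analogy with Sxx)
        Sxy = n*Σ(x*y) - (Σx)*(Σy)    (assumed, by analogy with Sxx)
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
    sxx = n * sum_xx - sum_x ** 2
    syy = n * sum_yy - sum_y ** 2
    sxy = n * sum_xy - sum_x * sum_y
    return sxx, syy, sxy
```

With the formulae written out like that, a weekender can at least 
check which definition a given name refers to.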

Just my from-the-hip thoughts for whatever little they may be worth.

-tkc





