[SciPy-dev] Guidelines for documenting parameter types

Mon Aug 18 10:56:37 EDT 2008

Neil Crighton wrote:
> (1) When we mention types in the parameters, we are mostly using the
> following abbreviations:
>
> integer : int
> float : float
> boolean : bool
> complex : complex
> list : list
> tuple : tuple
>
> i.e. the same as the python function names for each type.  It would be
> nice to say in the guidelines that these should be followed where
> possible.
>   
I agree with stating the default precision, because NumPy supports
multiple numerical precisions. At the very least, the output description
or the notes section must indicate when NumPy changes the numerical
precision:
 >>> a=np.array([1,2,3], dtype=np.int8)
 >>> type(np.mean(a))
<type 'numpy.float64'>
 >>> a=np.array([1,2,3], dtype=np.float32)
 >>> type(np.mean(a))
<type 'numpy.float64'>
 >>> a=np.array([1,2,3], dtype=np.float128)
 >>> type(np.mean(a))
<type 'numpy.float128'>

> (2) Often it's useful to state the type of an input or returned array.
> If we want to say the array returned by np.all is of type bool, what
> should we say? Possibilities used so far are
>
> int array
> array of int
> array of ints
>
> I prefer 'array of ints', because it is also suitable for tuples and
> lists ('tuple of ints', or 'list of dtypes'). 'int tuple' is just bad
> :) .
>   
As you indicate in the next point, most functions accept multiple input
types, so this really applies to the output of a function. Depending on
the function, the shape either does not change (logical functions) or
changes in a specified way (a reduction over the whole array or over a
given axis), which the user needs to know. Most functions I know of
either maintain the dtype (such as sum) or make a logical change (mean
may change the input type to float64, logical functions change it to
boolean). So while I probably have not been consistent, I prefer using
something like 'scalar' or 'array' of the (input) shape and dtype.
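
To make the shape and dtype cases above concrete, here is a minimal sketch. The array values are made up for illustration, and float32 is used for the sum on the assumption that small integer types may be promoted when summed:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
ints = np.array([[1, 2], [3, 4]], dtype=np.int8)

total = np.sum(a)                # reduction over the whole array
col_sums = np.sum(a, axis=0)     # reduction over a given axis
col_means = np.mean(ints, axis=0)  # integer input, floating-point output
mask = np.greater(a, 2.0)        # logical function, shape is unchanged

for name, result in [("total", total), ("col_sums", col_sums),
                     ("col_means", col_means), ("mask", mask)]:
    # Report what a docstring would have to describe: shape and dtype.
    print(name, np.shape(result), result.dtype)
```

Here sum keeps float32, mean of the int8 input comes back as float64, and the comparison returns a boolean array of the input shape.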

> (3) Many functions accept either sequences or scalars as input, and
> then return arrays if the input was a sequence, or an array scalar if
> the input was a scalar.  For example:
>
> >>> a = np.sin(np.pi/2)
> >>> type(a)
> <type 'numpy.float64'>
>
> >>> a = np.sin([np.pi/2,-np.pi/2])
> >>> type(a)
> <type 'numpy.ndarray'>
>
> There was some discussion about the best way to handle this:
>
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arcsin/#discussion-sec
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arctan/#discussion-sec
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater_equal/#discussion-sec
>
> Stefan proposed that for these functions we just refer to the input
> parameter type as array_like, and the return type as ndarray, since
> these are both described as including scalars in the glossary,
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.doc.reference.glossary/.
> I think this is a good rule. (Note that there is at least one proofed
> docstring that breaks this rule
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater/)
>   
I think that the input must be treated differently than the output.

Although I have used 'array_like', it is not really correct, because
the input must be compatible with NumPy array creation (ndarray
compatible?). Dictionaries and sparse matrix representations do not
work (as expected), even though both are array-like in a loose sense.
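
A small sketch of the dictionary case, using np.asarray as a stand-in for whatever conversion a given function performs internally (that choice is an assumption, and the sparse matrix case is left out because it needs SciPy):

```python
import numpy as np

# A nested list converts to a numeric 2-d array.
from_list = np.asarray([[1, 2], [3, 4]])

# A dict is not rejected, but it is wrapped whole in a 0-d object array,
# so numerical functions cannot do anything useful with it.
from_dict = np.asarray({"a": 1, "b": 2})

print(from_list.shape, from_list.dtype.kind)  # (2, 2) i
print(from_dict.shape, from_dict.dtype)       # () object
```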

It is not sufficient to say that the output is an ndarray, because that
does not describe the shape. It is essential to know whether you get
back the same shape as the input, or a scalar, 0-d array, 1-d array,
etc. The same goes for dtype changes: for example, logical functions
return boolean and mean returns float64 even if the input was integer.
Consequently, I tried to be consistent by splitting the output
description into scalar (probably 0-d array) and array.
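
As a rough illustration of why the scalar and array cases are worth separating, the sketch below uses a hypothetical helper, describe(), which is not part of NumPy, to summarise what comes back from a few calls:

```python
import numpy as np

def describe(result):
    """Hypothetical helper: return (ndim, shape, dtype name) of a result."""
    return np.ndim(result), np.shape(result), np.asarray(result).dtype.name

scalar_in = describe(np.sin(np.pi / 2))                # scalar input
seq_in = describe(np.sin([np.pi / 2, -np.pi / 2]))     # sequence input
logical = describe(np.greater([1, 2, 3], 2))           # logical function
reduced = describe(np.mean([1, 2, 3]))                 # reduction of ints

print(scalar_in)  # (0, (), 'float64')
print(seq_in)     # (1, (2,), 'float64')
print(logical)    # (1, (3,), 'bool')
print(reduced)    # (0, (), 'float64')
```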
> (4) Sometimes we need to specify more than one kind of type.  For
> example, the shape parameter of zeros can be either an int or a
> sequence of ints (but is not array_like, since it doesn't accept
> nested sequences). How should we write this? Some possibilities are:
>
> int or sequence of ints
> {int, sequence of ints}
>
> I much prefer 'int or sequence of ints' as to me it's clearer and
> looks nicer. Also the curly brackets are used when a parameter can
> assume one of a set of fixed values (e.g. the kind keyword of argsort,
> which can be one of {'quicksort','mergesort','heapsort'}), so I think
> it is confusing to also use them in this case.
>   
I do not like using {}'s, because I start to read them as a dictionary,
and the second usage really means one element of a list or tuple.

In the case of shape, 'np.zeros(3)' is equivalent to 'np.zeros((3))'
but different from 'np.zeros((3,3))'. For consistency, it should be
clear that the shape is a tuple and behaves like Python tuples:
type((1)) is int, so NumPy automatically treats an int argument as a
1-d shape, i.e. as the one-element tuple (int,).
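
The shape cases above can be checked directly; this is only a sketch of the behaviour being described, not a proposal for the docstring wording:

```python
import numpy as np

# Parentheses alone do not make a tuple; the trailing comma does.
print(type((3)))    # <class 'int'>
print(type((3,)))   # <class 'tuple'>

shape_int = np.zeros(3).shape           # plain int
shape_paren = np.zeros((3)).shape       # still a plain int
shape_tuple = np.zeros((3,)).shape      # explicit one-element tuple
shape_2d = np.zeros((3, 3)).shape       # two-element tuple

print(shape_int, shape_paren, shape_tuple, shape_2d)
# (3,) (3,) (3,) (3, 3)
```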

> (5) For keyword arguments, the default value is often None. In this
> case we've been omitting None from the parameter types. However,
> sometimes None is a valid input type but is not the default (e.g. axis
> keyword for argsort). In this case I think it's a good idea to include
> None as an explicit parameter.
>   
The Zen of Python (http://www.python.org/dev/peps/pep-0020/ or at the 
Python prompt type: import this):

"Explicit is better than implicit."
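
For the argsort case, one way the parameter line could read is sketched below. The wrapper sorted_indices() and its docstring wording are hypothetical and only there to show None listed explicitly when it is valid but not the default:

```python
import numpy as np

def sorted_indices(a, axis=-1):
    """Hypothetical wrapper around np.argsort, for illustration only.

    Parameters
    ----------
    a : array_like
        Array to sort.
    axis : int or None, optional
        Axis along which to sort. Default is -1 (the last axis).
        If None, the flattened array is used.
    """
    return np.argsort(a, axis=axis)

a = np.array([[3, 1], [2, 0]])
by_row = sorted_indices(a)           # sorts each row separately
flat = sorted_indices(a, axis=None)  # sorts the flattened array

print(by_row.tolist())  # [[1, 0], [1, 0]]
print(flat.tolist())    # [3, 1, 2, 0]
```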

> I've posted to both the scipy-dev and numpy lists - I wasn't sure
> which best for this.
>
> Neil
>   
Regards
Bruce