[SciPy-Dev] docstring standard: parameter shape description

Mon Jan 28 16:47:26 EST 2013

On Mon, Jan 28, 2013 at 1:21 PM, Joe Harrington <jh at physics.ucf.edu> wrote:
> On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> Hi,
>>
>> When merging the doc wiki edits there were a large number of changes to the
>> shape description of parameters/returns. This is not yet described in the
>> docstring standard
>> (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and
>> currently is done in various ways:
>>
>> param1 : ndarray, shape (N,)
>
> I think it should be consistent between all cases, start with the class
> and then the shape, and solve the general problem.
>
> Initially, I agreed with Josef about being terse, but it is hard to read
> that way, and if you're a newbie you might wonder what the numbers in parens
> are.  The word "shape" does not add an extra line, and the comma makes
> sense as an appositive in English.

+1, the word 'shape' is a pretty critical clue the first time you see this.

> So, I prefer:
>
> param1 : ndarray, shape XXXXX
>
> For XXXXX, we need to specify:
>
> ranges of allowed numbers of dimensions
> ranges of allowed sizes within each dimension
> low- or high-side unconstrained sizes in either case
>
> We should accept the output of .shape, and define some range
> conventions.  Of course, there will be pathological cases, particularly
> in specialist packages that adopt the numpy doc standard, where nothing
> but text will adequately describe the allowed dimensions ("If there are
> three dimensions, then the second dimension must...").  A "(see text)"
> should be allowed after the shape spec.
>
> So, this is my counterproposal for inclusion in the standard:
>
> -------------------------------------------------------------------------------
> param1 : ndarray, shape <shapespec> [(see text)]
> as in
> param1 : ndarray, shape (2, 2+, dim(any), 4-, 4-6, any) (see text)
>
> in <shapespec>:
>   the spec reads from the slowest-varying to the fastest-varying dimension
>   a number means exactly that number of items on that axis
>   a number followed by a "+" ("-") means that number or more (fewer) items
>   a-b means between a and b items, INCLUSIVE
>   "any" means any number of items on that axis
>   dim(dimspec) means the conventions above apply for dimensions instead of items
>
> The example would mean an array with dimensions, from slowest to
> fastest-varying, of size:
> 2
> 2 or more
> (0 or more axes can be inserted here)
> 0 to 4
> 4 to 6
> any size, including absent (use 1+ to require a dimension)

"any size" should mean 0+. "absent" is not a size. If a function does
accept an optional final dimension, can we write that like 'shape (N,
D) or shape (N,)'?

For inserting axes, "..." is clearer than the rather opaque
"dim(any)", and matches the existing Python convention for indexing.

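To make the comparison concrete, here is a minimal sketch of a checker
for the range conventions quoted above, with Python's built-in "..."
(Ellipsis) marking where extra axes may be inserted. The names
shape_matches and _axis_ok, and the encoding of a spec as a tuple of
ints and strings, are hypothetical and are not part of numpy or of any
proposal in this thread. "any" is read as 0+, as suggested above.

```python
import re


def _axis_ok(size, item):
    """Check one axis length against one spec item."""
    if isinstance(item, int):
        return size == item                      # a number: exactly that many items
    if item == "any":
        return size >= 0                         # "any" read as 0+
    m = re.fullmatch(r"(\d+)\+", item)
    if m:
        return size >= int(m.group(1))           # "2+": that number or more
    m = re.fullmatch(r"(\d+)-", item)
    if m:
        return 0 <= size <= int(m.group(1))      # "4-": that number or fewer
    m = re.fullmatch(r"(\d+)-(\d+)", item)
    if m:
        return int(m.group(1)) <= size <= int(m.group(2))  # "4-6": inclusive
    raise ValueError(f"unrecognized spec item: {item!r}")


def shape_matches(shape, spec):
    """Return True if `shape` (a tuple of ints) satisfies `spec`.

    `spec` lists axes from slowest- to fastest-varying. At most one
    `...` is allowed; it stands for zero or more unconstrained axes.
    """
    shape, spec = tuple(shape), tuple(spec)
    if spec.count(...) > 1:
        raise ValueError("at most one ... is supported")
    if ... in spec:
        i = spec.index(...)
        head, tail = spec[:i], spec[i + 1:]
        if len(shape) < len(head) + len(tail):
            return False
        # Match leading axes against head and trailing axes against tail.
        pairs = list(zip(shape[:len(head)], head))
        pairs += zip(shape[len(shape) - len(tail):], tail)
    else:
        if len(shape) != len(spec):
            return False
        pairs = list(zip(shape, spec))
    return all(_axis_ok(size, item) for size, item in pairs)


spec = (2, "2+", ..., "4-", "4-6", "any")
print(shape_matches((2, 3, 4, 5, 7), spec))        # True
print(shape_matches((2, 3, 9, 9, 4, 5, 0), spec))  # True
print(shape_matches((2, 1, 4, 5, 7), spec))        # False
```

Under this reading the final "any" axis must be present, though it may
have length 0. An optional trailing axis would need a separate
alternative such as 'shape (N, D) or shape (N,)'.
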
Generally, though, for input parameters it's usually best to specify
the size as a variable rather than a numeric range so it can be
referred back to later, right? And for output parameters there's no
need to specify ranges, since the shape should be determined by the
input?  For example:

  in1 : ndarray, shape (N, M)
  in2 : ndarray, shape (M, K)
  out : ndarray, shape (N, K)

A spec this complex seems in danger of being overengineered. Do we
have examples of when these more elaborate specifiers would be useful?
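
As an illustration of the variable-based style, the sketch below shows
a docstring that names its dimensions and a small check that repeated
names agree. The helpers bind_dims and result_shape are hypothetical,
written only for this example. SimpleNamespace is a stand-in for an
array, since the code reads nothing but the .shape attribute.

```python
from types import SimpleNamespace


def bind_dims(named_shapes):
    """Bind dimension names to sizes, checking that repeated names agree.

    `named_shapes` is a list of (names, shape) pairs, for example
    (("N", "M"), (3, 4)). Returns a dict such as {"N": 3, "M": 4}.
    """
    bound = {}
    for names, shape in named_shapes:
        if len(names) != len(shape):
            raise ValueError(
                f"expected {len(names)} dimensions, got {len(shape)}")
        for name, size in zip(names, shape):
            # setdefault stores the first size seen and returns it afterwards.
            if bound.setdefault(name, size) != size:
                raise ValueError(
                    f"dimension {name} is both {bound[name]} and {size}")
    return bound


def result_shape(in1, in2):
    """Return the output shape for a matrix-product-like operation.

    Parameters
    ----------
    in1 : ndarray, shape (N, M)
    in2 : ndarray, shape (M, K)

    Returns
    -------
    out_shape : tuple
        (N, K), determined entirely by the inputs.
    """
    dims = bind_dims([(("N", "M"), in1.shape), (("M", "K"), in2.shape)])
    return (dims["N"], dims["K"])


a = SimpleNamespace(shape=(3, 4))   # stand-in for an array of shape (3, 4)
b = SimpleNamespace(shape=(4, 5))   # stand-in for an array of shape (4, 5)
print(result_shape(a, b))           # (3, 5)
```

Because M appears in both inputs, a mismatch such as (3, 4) with (6, 5)
raises ValueError. That cross-reference is what a numeric range alone
cannot express.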

-n