[Numpy-discussion] array, asarray as contiguous and friends

Tim Hochberg tim.hochberg at cox.net
Fri Mar 24 11:36:06 EST 2006


Colin J. Williams wrote:

> Tim Hochberg wrote:
>
>> Colin J. Williams wrote:
>>
>>> Tim Hochberg wrote:
>>>
>>>> Colin J. Williams wrote:
>>>>
>>>>> Tim Hochberg wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> I was just looking at the interface for array and asarray to see 
>>>>>> what other stuff should go in the interface of the hypothetical 
>>>>>> ascontiguous.  There's 'dtype', which I knew about, and 
>>>>>> 'fortran', which I didn't, but which makes sense. However, 
>>>>>> there's also 'ndmin'. First off, it's not described in the docstring 
>>>>>> for asarray, but I was able to find it in the docstring for array 
>>>>>> without a problem. Second, is it really necessary? It seems to be 
>>>>>> useful in an awfully narrow set of circumstances, particularly 
>>>>>> since when you are padding axes not everyone wants to pad to the 
>>>>>> left.
>>>>>>
>>>>>> It would seem to be more useful to ditch the ndmin and have some 
>>>>>> sort of paddims function that was more full featured (padding to 
>>>>>> either the left or the right at a minimum). I'm not entirely sure 
>>>>>> what the best interface to such a beast would look like, but a 
>>>>>> simple tactic would be to just provide leftpaddims and rightpaddims.
>>>>>>
>>>>>> If it's not already clear by now (;), I prefer several narrow 
>>>>>> interfaces to one broad one.
>>>>>>
>>>>>> -tim
>>>>>>   
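The two narrow functions proposed above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `leftpaddims` and `rightpaddims` are the hypothetical names from the proposal, not NumPy functions, and giving them an `ndim` parameter (the minimum number of dimensions wanted) is our own guess at their interface. For comparison, the existing `ndmin` argument of `array` only ever prepends axes.

```python
import numpy as np


def leftpaddims(a, ndim):
    """Hypothetical helper: prepend length-1 axes until `a` has at least `ndim` dims."""
    a = np.asarray(a)
    missing = max(ndim - a.ndim, 0)
    return a.reshape((1,) * missing + a.shape)


def rightpaddims(a, ndim):
    """Hypothetical helper: append length-1 axes until `a` has at least `ndim` dims."""
    a = np.asarray(a)
    missing = max(ndim - a.ndim, 0)
    return a.reshape(a.shape + (1,) * missing)


x = np.arange(3)
print(leftpaddims(x, 3).shape)      # (1, 1, 3)
print(rightpaddims(x, 3).shape)     # (3, 1, 1)
print(np.array(x, ndmin=3).shape)   # (1, 1, 3): ndmin pads on the left only
```

Splitting the two directions into separate functions keeps each call site unambiguous, which is the "several narrow interfaces" argument in miniature.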
>>>>>
>>>>> What does ascontiguous do that copy doesn't?  
>>>>
>>>> What it doesn't do is always copy the argument. Just like asarray, 
>>>> it returns it unchanged if it's contiguous.
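For reference, current NumPy provides this behaviour as `numpy.ascontiguousarray` (`ascontiguous` is the name proposed in this thread). The sketch below checks the property described here: an array that is already C-contiguous comes back without a copy, while a transposed, non-contiguous view is copied into a fresh contiguous buffer. `np.shares_memory` serves as the no-copy test; the array values are arbitrary.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # freshly created, so C-contiguous
b = np.ascontiguousarray(a)
print(np.shares_memory(a, b))    # True: no copy was made

t = a.T                          # transpose is a strided view, not C-contiguous
print(t.flags.c_contiguous)      # False
c = np.ascontiguousarray(t)
print(c.flags.c_contiguous)      # True
print(np.shares_memory(t, c))    # False: the data was copied
```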
>>>
>>> Fair enough.  I guess that, for some array a, "b= ascontiguous(a)"  
>>> saves a few keystrokes as compared
>>> with "b= a if a.flags.contiguous else a.copy()".  The intent of the 
>>> latter is clearer and probably burns fewer cycles.
>>
>> First, the second expression is not equivalent to ascontiguous. 
>> Second, I disagree strongly that it's clearer. Third, if cycles are 
>> actually a concern, ascontiguous if implemented in C would certainly 
>> be faster. Fourth, this can't be written in any released version of 
>> Python.
>>
> Thanks, this is illuminating.  Fourth is true, but the first release 
> of Python 2.5 is expected in eight days.
>
> We see, in the Help:
> Help on built-in function array in module numpy.core.multiarray:
>
> array(...)
>    array(object, dtype=None, copy=1, fortran=0, subok=0,ndmin=0)
>    will return a new array formed from the given object type given.
>    Object can be anything with an __array__ method, or any object
>    exposing the array interface, or any (nested) sequence.
>    If no type is given, then the type will be determined as the
>    minimum type required to hold the objects in the sequence.
>    If copy is zero and sequence is already an array with the right
>    type, a reference will be returned.  If the sequence is an array,
>    type can be used only to upcast the array.  For downcasting
>    use .astype(t) method.  If subok is true, then subclasses of the
>    array may be returned. Otherwise, a base-class ndarray is returned
>    The ndmin argument specifies how many dimensions the returned
>    array should have as a minimum. 1's will be pre-pended to the
>    shape as needed to meet this requirement.
> [Dbg]>>> help(_n.asarray)
> Help on function asarray in module numpy.core.numeric:
>
> asarray(a, dtype=None, fortran=False, ndmin=0)
>    returns a as an array.  Unlike array(),
>    no copy is performed if a is already an array.  Subclasses are converted
>    to base class ndarray.
>
> It is not clear from this just what asarray accepts for a.

That could probably be clearer, it's true.

>
> From your response, it would appear that "a" in asarray is the same as 
> "object" in array.


Indeed.
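As a quick illustration of that equivalence, `asarray` converts the same kinds of input that `array` does: nested sequences, objects with an `__array__` method, and existing arrays, which it returns without copying. The `Wrapper` class below is a made-up array-like object for this sketch only.

```python
import numpy as np


class Wrapper:
    """Made-up array-like: exposes its data through __array__."""

    def __init__(self, values):
        self.values = values

    def __array__(self, dtype=None, copy=None):
        # NumPy 2 may pass `copy`; accepting it keeps this working on 1.x and 2.x.
        return np.array(self.values, dtype=dtype)


existing = np.arange(4)
print(np.asarray(existing) is existing)    # True: already an ndarray, returned as-is
print(np.asarray([[1, 2], [3, 4]]).shape)  # (2, 2): nested sequence
print(np.asarray(Wrapper([1.5, 2.5])))     # [1.5 2.5]: converted via __array__
```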

>
>> Let me go into a little more detail on the first and second points. 
>> The common use case for asarray and ascontiguous is to adapt objects 
>> that you don't know much about into arrays. A typical pattern is:
>>
>> def func(a, b, c):
>>    a = asarray(a)
>>    b = ascontiguous(b)
>>    c = asarray(c)
>>    # Code that requires arrays for a, c and a contiguous array for b.
>>
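A runnable version of that pattern, with `np.ascontiguousarray` standing in for the proposed `ascontiguous`. The body of `func` is a placeholder of our own that just checks and sums its inputs, since the original leaves the real work unspecified.

```python
import numpy as np


def func(a, b, c):
    # Adapt whatever the caller passed into the forms the body needs.
    a = np.asarray(a)
    b = np.ascontiguousarray(b)   # stand-in for the proposed ascontiguous
    c = np.asarray(c)
    # Placeholder body: real code would rely on b's memory layout here.
    assert b.flags.c_contiguous
    return a.sum() + b.sum() + c.sum()


m = np.arange(6).reshape(2, 3).T  # non-contiguous view
print(func([1, 2], m, (3, 4)))    # 25
```

Callers can pass lists, tuples, or strided views; the adaptation lines at the top absorb the difference.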
>> For this reason, ascontiguous needs to take any object and attempt
>> to turn it into a contiguous array if it isn't one already. So "b =
>> ascontiguous(a)" is really equivalent to "b = asarray(a); b = b if
>> b.flags.contiguous else b.copy()". Even if we could assume all the
>> inputs were arrays, or we separately used asarray first, the pattern
>> you propose is error prone since it violates DRY. It would be all too
>> easy to write "b= a if c.flags.contiguous else a.copy()" or some
>> such. To verify that such expressions are correct, or even to tell
>> what they actually do, you have to parse through the whole
>> expression. "ascontiguous", on the other hand, is pretty much bomb
>> proof once you know what it does.
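To make the first point concrete: the one-liner proposed earlier assumes its input is already an array, so it fails on a plain list, whereas the longer equivalent given here works for any array-like input. Both helper names are invented for this sketch, and the flag is spelled `c_contiguous` as in current NumPy.

```python
import numpy as np


def copy_if_needed(a):
    # The earlier one-liner: only valid when `a` is already an ndarray.
    return a if a.flags.c_contiguous else a.copy()


def ascontiguous_equivalent(b):
    # The spelled-out equivalent: adapt first, then copy only if needed.
    b = np.asarray(b)
    return b if b.flags.c_contiguous else b.copy()  # ndarray.copy() defaults to C order


data = [[1, 2], [3, 4]]
try:
    copy_if_needed(data)
except AttributeError as exc:
    print("one-liner fails:", exc)  # lists have no .flags attribute
print(ascontiguous_equivalent(data).flags.c_contiguous)  # True
```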
>>
>>
> I would be grateful if you could clarify DRY.


Don't Repeat Yourself. Basically, it's an exhortation not to write out the 
same expression over and over again. It was one of the main motivators 
for getting += into Python, since:

    a.foo.bar[squiggly].retrieveArrayFor("Colin") =
    a.foo.bar[squiggly].retrieveArrayFor("Colin") +
    rubber_ducky.frobinate["NOW!"]

Is much easier to mess up than:

    a.foo.bar[squiggly].retrieveArrayFor("Colin") +=
    rubber_ducky.frobinate["NOW!"]

In the former, it's easy to miss small differences between the first and 
second quantities. These creep in easily when cutting and pasting. 
Anyway, in the case in question:

    b= a if a.flags.contiguous else a.copy()

'a' is repeated twice. If this is getting cut and pasted all over the 
place it'll frequently end up that one of the a's doesn't get changed 
properly. For:

    b = ascontiguous(a)

The chance of getting it wrong is at least halved based on the number of 
opportunities alone, and it is probably much, much better than that in 
practice, since a isn't buried in an expression but is nicely set off 
by parentheses.
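A small illustration of that copy-and-paste hazard, using the mismatched expression mentioned earlier in the thread (`c` checked instead of `a`). The arrays are made up for the example: because `c` happens to be contiguous, the buggy line silently returns the non-contiguous `a` unchanged, while the single-argument call has no second copy of the name to get out of step.

```python
import numpy as np

a = np.arange(6).reshape(2, 3).T  # non-contiguous (transposed view)
c = np.zeros(3)                   # unrelated, contiguous array

b_buggy = a if c.flags.c_contiguous else a.copy()  # 'c' pasted in by mistake
b_good = np.ascontiguousarray(a)

print(b_buggy.flags.c_contiguous)  # False: the slip goes unnoticed
print(b_good.flags.c_contiguous)   # True
```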

See also: http://en.wikipedia.org/wiki/Don%27t_repeat_yourself

>
>>
>>>
>>> numarray has an iscontiguous method, which is a bit simpler than 
>>> concerning the Python user with flags.
>>>
>>>>
>>>>> In numarray, transpose created a discontiguous array.  copy() made 
>>>>> the array contiguous.  Is there sufficient benefit to justify 
>>>>> another method?
>>>>
>>>> I was proposing a function, not a method. The array object has 
>>>> plenty of methods already.
>>>
>>> Since the function would only operate on ArrayType instances, would 
>>> it not be better as a method?
>>
>> Your premise is wrong. It would operate on any object, so it has to 
>> be a function. It could also be a method, but that would be superfluous.
>>
>> Regards,
>>
>> -tim
>>
> It would appear that you are right, ascontiguous will be permitted to 
> operate on any object which is acceptable to array, although I can't 
> think of a case where any object other than an array, or a sub-class 
> could be discontiguous.

No, that is correct. The point is I have an object I obtained from 
somewhere. I need a contiguous array. ascontiguous applied to that 
object will either give me a contiguous array or raise an exception. 
Thus it can be used to adapt any thing-that-can-be-turned-into-an-array 
into a contiguous array.
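A short sketch of that adapter behaviour, again with `np.ascontiguousarray` standing in for the proposed name: inputs that can be turned into an array come back as a C-contiguous ndarray, and inputs that cannot raise an exception rather than passing through unchanged. The sample inputs are arbitrary.

```python
import numpy as np

samples = ([1, 2, 3], (4.0, 5.0), np.arange(6).reshape(2, 3)[:, ::2])
for obj in samples:
    arr = np.ascontiguousarray(obj)
    print(type(arr).__name__, arr.flags.c_contiguous)  # ndarray True

try:
    np.ascontiguousarray("not a number", dtype=float)
except ValueError as exc:
    print("cannot adapt:", exc)
```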

>
> Thanks for clarifying things.

Not a problem.

Regards,

Tim




