[SciPy-User] Uniquely identify array

Sebastian Haase seb.haase at gmail.com
Tue Jul 19 16:04:13 EDT 2011


To do both operations in one step, without an intermediate temporary,
you would probably benefit from numexpr.
There is not much talk about http://code.google.com/p/numexpr anymore,
but it grew out of discussions on this list.
Since then it has also gained support for float32 (not only float64),
which is what you want for large image data sets.
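
A minimal sketch of what that could look like, assuming numexpr is
installed and using the array names D, M, O and output from the quoted
messages below. The helper name scale_and_offset and the tiny test data
are made up for illustration, and the plain NumPy fallback is only
there so the snippet still runs without numexpr.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency (assumption: it is installed)
except ImportError:
    ne = None


def scale_and_offset(D, M, O, output):
    """Store D * M - O into the existing integer buffer `output`."""
    if ne is not None:
        # numexpr evaluates the whole expression chunk by chunk, so no
        # full-size temporary holding D * M is created.
        result = ne.evaluate("D * M - O", local_dict={"D": D, "M": M, "O": O})
    else:
        # Plain NumPy fallback: allocates a temporary for D * M.
        result = D * M - O
    # Assigning into the buffer casts the floats to its integer dtype.
    output[...] = result
    return output


# Hypothetical small data set standing in for the 512x512 image.
D = np.arange(16, dtype=np.int32).reshape(4, 4)
M = np.full((4, 4), 2.0, dtype=np.float32)
O = np.full((4, 4), 1.0, dtype=np.float32)
output = np.empty((4, 4), dtype=np.int32)
scale_and_offset(D, M, O, output)
```

The result array is still allocated once before being copied into
output; only the intermediate product is avoided.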

I always meant to use it myself ....

Cheers,
Sebastian Haase



On Tue, Jul 19, 2011 at 6:37 PM, Anne Archibald
<aarchiba at physics.mcgill.ca> wrote:
> I know you were looking for tools to answer the question and not
> answers to the question but:
>
> The easiest way to do what you want is:
>
> output[...] = D*M-O
>
> This will convert D to floats, multiply it by M, subtract O, then
> store the result into output, converting to ints on the fly. I'm not
> sure whether a floatified version of D is allocated, but I think so.
> You could do all this in-place at the cost of extra roundings by using
> the np.multiply(a,b,out) forms of ufuncs.
>
> Anne
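
To make the trade-off above concrete, here is a small sketch comparing
the one-liner with the out= forms of the ufuncs. The sample values and
the names direct, scratch, staged and in_place are invented for the
example; casting="unsafe" is needed because the ufuncs otherwise refuse
to write float results into an integer buffer.

```python
import numpy as np

D = np.array([3, 10], dtype=np.int32)   # integer data
M = np.array([0.5, 1.5])                # multiplicative modifiers
O = np.array([0.5, 0.5])                # additive offsets

# One-liner: float temporaries are allocated, then one truncating cast.
direct = np.empty(D.shape, dtype=np.int32)
direct[...] = D * M - O                 # [1, 14]

# Staged through a reusable float scratch buffer: no new allocations
# per call, and still only one cast to int at the end.
scratch = np.empty(D.shape, dtype=np.float64)
np.multiply(D, M, out=scratch)
np.subtract(scratch, O, out=scratch)
staged = np.empty(D.shape, dtype=np.int32)
staged[...] = scratch                   # [1, 14]

# Fully in place in the integer buffer: the product is truncated to int
# before the subtraction, which is the "extra rounding".
in_place = np.empty(D.shape, dtype=np.int32)
np.multiply(D, M, out=in_place, casting="unsafe")         # [1, 15]
np.subtract(in_place, O, out=in_place, casting="unsafe")  # [0, 14]
```

The first element differs (1 versus 0) because 1.5 was cut down to 1
before the offset was subtracted.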
>
> On 19 July 2011 12:30, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
>> Thanks for the detailed response, both you and Robert Kern. My
>> immediate problem is not especially significant; I have three arrays:
>> one of data D, one of additive offsets O, and one of multiplicative
>> modifiers M. The first is of ints and the latter two of floats, and I
>> want to get D * M - O as ints and shove them into an existing buffer.
>> This is not an especially expensive operation (the dataset is
>> 512x512), but I found myself curious about what it's doing behind the
>> scenes, and the most straightforward way I know of to track that kind
>> of thing is to track allocations. I don't expect it would make a big
>> difference to hyper-optimize this problem, but in the future I may
>> need tighter code in some other application, and I'd rather know now
>> than potentially go down a wrong path later.
>>
>> I know more now about what Numpy's doing than I did before this
>> thread. Thanks for the prompt and detailed responses. :)
>>
>> -Chris
>>
>> On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald
>> <aarchiba at physics.mcgill.ca> wrote:
>>> This is a little more subtle than it sounds. Most python objects can
>>> be compared for identity with "is" (e.g. "if x is None:"). This tests
>>> for pointer equality, that is, it confirms that you have the same
>>> dynamically-allocated heap object. This will work for arrays, but it
>>> might be too specific for what you want: a numpy array actually
>>> consists of two heap objects, a python object that describes the
>>> array, and a memory arena. Slicing operations like A[::-1] are fast
>>> because while they create a new python object, the memory arena is
>>> untouched. So you need to decide whether what you care about is any
>>> change at all to the array, or whether what you care about is whether
>>> a new memory arena has been allocated.
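
A short illustration of the distinction drawn above, using a throwaway
array. The variable names are arbitrary; .base is the attribute NumPy
uses to record which object a view borrows its memory from.

```python
import numpy as np

A = np.arange(6)
alias = A            # same Python object
view = A[::-1]       # new Python object, same memory arena
copy = A.copy()      # new Python object, new memory arena

assert alias is A        # "is" sees the identical heap object
assert view is not A     # "is" says different, although memory is shared
assert view.base is A    # the view borrows A's arena
assert copy.base is None # the copy owns its own arena
```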
>>>
>>> A brief aside: people often think they care about allocation of new
>>> arrays, but in most cases they're mistaken. malloc() is an extremely
>>> fast operation, especially for large arrays, in which case it's
>>> usually a direct call to the OS's mmap (and free really does free the
>>> memory back to the system). If what you're worried about is that your
>>> code is slower than it should be, making sure there are no extra
>>> allocations is not the best place to look. In-place operations have
>>> their own limitations, things like cache-coherency issues and cache
>>> efficiency of strided memory access. This is not theoretical: I had
>>> some code, a few years ago, that manipulated large arrays and was
>>> slow. So I painstakingly went through and made it use in-place
>>> operations where possible and avoid malloc()ing new arrays. Not only
>>> did it get slower, the memory usage increased.
>>>
>>> On the other hand, if you want to know whether you're getting slices
>>> that allow you to modify the original array or freshly-allocated
>>> arenas, the bluntest available instrument is to write to the one and
>>> see if the other changes. There are some more subtle approaches that
>>> are a little approximate, things like checking the address of the
>>> memory arena, or the identity of the base numpy array object (be
>>> warned that you have to follow a chain of .base references up to
>>> the owning array to get this last). I say approximate because while
>>> A[::2] and A[1::2] share a
>>> memory arena, and even have overlapping extents, you can modify them
>>> independently of each other.
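
The "more subtle approaches" above could be sketched as follows. The
helpers data_address and arena_owner are hypothetical names written for
this example, not NumPy functions. np.may_share_memory and
np.shares_memory are real, but they are not mentioned in the thread,
and np.shares_memory only exists in NumPy releases newer than this
discussion.

```python
import numpy as np


def data_address(arr):
    """Address of the first element, taken from the array interface."""
    return arr.__array_interface__["data"][0]


def arena_owner(arr):
    """Follow .base references up to the object that owns the memory."""
    while isinstance(arr, np.ndarray) and arr.base is not None:
        arr = arr.base
    return arr


A = np.arange(10)
evens = A[::2]
odds = A[1::2]

# Both slices live in A's arena, one element apart.
same_owner = arena_owner(evens) is arena_owner(odds)
offset = data_address(odds) - data_address(evens)

# The bounds check says "maybe", the exact check says "no common element".
maybe = np.may_share_memory(evens, odds)
exact = np.shares_memory(evens, odds)

# Writing to one slice leaves the other untouched.
evens[:] = -1
odds_after = odds.tolist()
```

Here same_owner and maybe are both true even though no element is
shared, which is exactly why the arena-based checks are only
approximate.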
>>>
>>> In short, you need to think hard about exactly what you're testing
>>> for. But for unit tests I recommend using modifications to test for
>>> memory sharing.
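
One possible shape for such a test, as a sketch only. The helper name
writes_through is invented here, and it assumes numeric arrays whose
values actually change when 1 is added (so no NaN, infinities or huge
floats).

```python
import numpy as np


def writes_through(candidate, original):
    """Return True if writing to `candidate` changes `original`."""
    if candidate.size == 0:
        return False
    snapshot = original.copy()   # what `original` looked like before
    saved = candidate.copy()     # values to restore afterwards
    try:
        candidate[...] = saved + 1
        return not np.array_equal(original, snapshot)
    finally:
        candidate[...] = saved   # undo the modification


A = np.arange(6.0)
sliced = A[::2]         # basic slicing gives a view
picked = A[[0, 2, 4]]   # fancy indexing gives a copy

is_view = writes_through(sliced, A)
is_copy = not writes_through(picked, A)
```

In a unit test these would simply become two assertions on the results.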
>>>
>>> Anne
>>>
>>> On 19 July 2011 12:04, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
>>>> Is there some way in Python to uniquely identify a given Numpy array?
>>>> E.g. to get a pointer to its location in memory or something similar?
>>>> I'm looking for some way to determine which operations will implicitly
>>>> create new arrays, just to verify that I'm not doing anything that
>>>> will seriously hurt my performance -- but this seems like something
>>>> that would be generally useful to know.
>>>>
>>>> Unfortunately ndarrays don't allow arbitrary additions to their
>>>> namespace; no doing "foo.myUniqueIdentifier = 1", for example.
>>>>
>>>> Thanks in advance!
>>>>
>>>> -Chris
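
For the literal question above, a sketch of what is available from
plain Python. The class name TaggedArray is hypothetical; the attribute
name myUniqueIdentifier is taken from the question. Note that id()
identifies the Python object while ctypes.data gives the address of the
memory arena, and that attributes set on a subclass instance are not
copied to its slices unless the subclass arranges for that itself.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass; its instances accept new attributes."""


data = np.zeros((4, 4))

# Plain ndarrays reject arbitrary attributes.
try:
    data.myUniqueIdentifier = 1
    plain_accepts = True
except AttributeError:
    plain_accepts = False

# A view through the subclass shares the memory and accepts the tag.
tagged = data.view(TaggedArray)
tagged.myUniqueIdentifier = 1

object_id = id(data)             # identity of the Python object
arena_address = data.ctypes.data # address of the first element
same_arena = tagged.ctypes.data == arena_address
tag_survives_slicing = hasattr(tagged[:2], "myUniqueIdentifier")
```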


