[SciPy-User] Uniquely identify array

Tue Jul 19 12:37:27 EDT 2011

I know you were looking for tools to answer the question and not
answers to the question, but:

The easiest way to do what you want is:

output[...] = D*M-O

This will convert D to floats, multiply it by M, subtract O, then
store the result into output, converting to ints (by truncation) on
the fly. I'm not sure whether a floatified version of D is allocated,
but I think so. You could do all this in-place at the cost of extra
roundings by using the np.multiply(a, b, out) forms of ufuncs.
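
A minimal sketch of both forms, assuming small invented arrays in place
of the 512x512 data (the names D, M, O and output follow the thread; the
values, and the names stepwise, scratch and via_scratch, are made up for
illustration). One assumption to flag: recent NumPy versions refuse to
write a float result into an integer out array unless casting="unsafe"
is passed, so the sketch passes it explicitly.

```python
import numpy as np

# Invented stand-ins for the thread's arrays (the real ones are 512x512).
D = np.array([[10, 3], [30, 40]], dtype=np.int32)  # integer data
M = np.array([[1.5, 0.5], [2.0, 0.25]])            # multiplicative modifiers
O = np.array([[0.5, 0.5], [0.0, 3.0]])             # additive offsets
output = np.zeros(D.shape, dtype=np.int32)         # pre-existing buffer

# One-liner: D * M - O is evaluated in floating point, then the assignment
# casts to int (truncating toward zero) as it copies into output.
output[...] = D * M - O

# ufunc form writing every intermediate straight into an int buffer.
# Each step is truncated, which is the "extra roundings" cost.
stepwise = np.zeros(D.shape, dtype=np.int32)
np.multiply(D, M, out=stepwise, casting="unsafe")
np.subtract(stepwise, O, out=stepwise, casting="unsafe")

# ufunc form with one float scratch buffer: a single final truncation.
scratch = np.empty(D.shape, dtype=np.float64)
np.multiply(D, M, out=scratch)
np.subtract(scratch, O, out=scratch)
via_scratch = np.zeros(D.shape, dtype=np.int32)
via_scratch[...] = scratch
```

With these values the stepwise form differs from the one-liner in one
element (3 * 0.5 is truncated to 1 before the offset is subtracted),
while the scratch-buffer form matches the one-liner.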

Anne

On 19 July 2011 12:30, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
> Thanks for the detailed response, both you and Robert Kern. My
> immediate problem is not especially significant; I have three arrays:
> one of data D, one of additive offsets O, and one of multiplicative
> modifiers M. The first is of ints and the latter two of floats, and I
> want to get D * M - O as ints and shove them into an existing buffer.
> This is not an especially expensive operation (the dataset is
> 512x512), but I found myself curious about what it's doing behind the
> scenes, and the most straightforward way I know of to track that kind
> of thing is to track allocations. I don't expect it would make a big
> difference to hyper-optimize this problem, but in the future I may
> need tighter code in some other application, and I'd rather know now
> than potentially go down a wrong path later.
>
> I know more now about what Numpy's doing than I did before this
> thread. Thanks for the prompt and detailed responses. :)
>
> -Chris
>
> On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald
> <aarchiba at physics.mcgill.ca> wrote:
>> This is a little more subtle than it sounds. Most Python objects can
>> be compared for identity with "is" (e.g. "if x is None:"). This tests
>> for pointer equality, that is, it confirms that you have the same
>> dynamically-allocated heap object. This will work for arrays, but it
>> might be too specific for what you want: a NumPy array actually
>> consists of two heap objects, a Python object that describes the
>> array, and a memory arena. Slicing operations like A[::-1] are fast
>> because while they create a new Python object, the memory arena is
>> untouched. So you need to decide whether what you care about is any
>> change at all to the array, or whether what you care about is whether
>> a new memory arena has been allocated.
>>
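
A small sketch of the distinction between object identity and a shared
memory arena, assuming an invented example array A. np.shares_memory is
a NumPy function that postdates this thread; the variable names are
hypothetical.

```python
import numpy as np

A = np.arange(6)
B = A          # another name for the same Python object
R = A[::-1]    # new Python object, same memory arena (a view)
C = A.copy()   # new Python object, new memory arena

same_object = B is A                   # identity: the very same heap object
view_is_new_object = R is not A        # slicing made a new descriptor object
view_shares = np.shares_memory(A, R)   # ...but the data buffer is shared
copy_shares = np.shares_memory(A, C)   # a copy has its own buffer
```

Here "is" separates R from A even though they use the same data, which
is why identity alone may be too specific a test.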
>> A brief aside: people often think they care about allocation of new
>> arrays, but in most cases they're mistaken. malloc() is an extremely
>> fast operation, especially for large arrays, in which case it's
>> usually a direct call to the OS's mmap (and free really does free the
>> memory back to the system). If what you're worried about is that your
>> code is slower than it should be, making sure there are no extra
>> allocations is not the best place to look. In-place operations have
>> their own limitations, things like cache-coherency issues and cache
>> efficiency of strided memory access. This is not theoretical: I had
>> some code, a few years ago, that manipulated large arrays and was
>> slow. So I painstakingly went through and made it use in-place
>> operations where possible and avoid malloc()ing new arrays. Not only
>> did it get slower, the memory usage increased.
>>
>> On the other hand, if you want to know whether you're getting slices
>> that allow you to modify the original array or freshly-allocated
>> arenas, the bluntest available instrument is to write to the one and
>> see if the other changes. There are some more subtle approaches that
>> are a little approximate, things like checking the address of the
>> memory arena, or the identity of the base numpy array object (be
>> warned that you have to follow a chain of .base references to get
>> this last). I say approximate because while A[::2] and A[1::2] share
>> a memory arena, and even have overlapping extents, you can modify
>> them independently of each other.
>>
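
The approaches above can be sketched roughly as follows. The helpers
changes_when_written and root_base are hypothetical names, not NumPy
functions, and the write test assumes numeric arrays whose values
actually change when 1 is added. np.may_share_memory and
np.shares_memory are real NumPy functions; the former only compares
memory extents, which is the "approximate" case.

```python
import numpy as np

def changes_when_written(a, b):
    """Blunt test: modify all of `a`, report whether `b` changed, restore `a`."""
    before = b.copy()
    saved = a.copy()
    a[...] = saved + 1            # assumes adding 1 changes every value
    changed = not np.array_equal(b, before)
    a[...] = saved                # undo the modification
    return changed

def root_base(arr):
    """Follow .base references up to the array that owns the memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

A = np.arange(10)
evens = A[::2]
odds = A[1::2]

# Both slices come from the same owner, and their extents overlap...
same_owner = root_base(evens) is root_base(odds)
extents_overlap = np.may_share_memory(evens, odds)

# ...yet no element is common to both, so writes to one never touch the other.
elements_overlap = np.shares_memory(evens, odds)

# Address of the first element of each array's data.
addr_A = A.__array_interface__["data"][0]
addr_odds = odds.__array_interface__["data"][0]
```

For a unit test, changes_when_written(evens, A) reports a change while
changes_when_written(evens, odds) does not, which matches the point that
the two slices can be modified independently.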
>> In short, you need to think hard about exactly what you're testing
>> for. But for unit tests I recommend using modifications to test for
>> memory sharing.
>>
>> Anne
>>
>> On 19 July 2011 12:04, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
>>> Is there some way in Python to uniquely identify a given Numpy array?
>>> E.g. to get a pointer to its location in memory or something similar?
>>> I'm looking for some way to determine which operations will implicitly
>>> create new arrays, just to verify that I'm not doing anything that
>>> will seriously hurt my performance -- but this seems like something
>>> that would be generally useful to know.
>>>
>>> Unfortunately ndarrays don't allow arbitrary additions to their
>>> namespace; no doing "foo.myUniqueIdentifier = 1", for example.
>>>
>>> Thanks in advance!
>>>
>>> -Chris
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>