[Numpy-discussion] object array alignment issues

Thu Oct 15 12:40:01 EDT 2009

I recently committed a regression test and bugfix for object pointers in 
record arrays of unaligned size (meaning the size of each record is not 
a multiple of sizeof(PyObject *)).

For example:

        import numpy as np

        a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
        a2 = np.zeros((10,), 'S10')
        # Before the fix, this assignment would segfault
        a1['o'] = a2

http://projects.scipy.org/numpy/ticket/1198

Unfortunately, this unit test has opened up a whole hornet's nest of 
alignment issues on Solaris.  The various reference counting functions 
(PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, 
for instance.  Interestingly, there are comments in there saying 
"handles misaligned data" (e.g. line 190), but in fact the code doesn't, 
and it doesn't look to me like it would.  But I won't rule out a mistake 
in building it on my part.

So, how to fix this? 

One obvious workaround is for users to pass "align=True" to the dtype 
constructor.  This works if the dtype descriptor is a dictionary or 
comma-separated string.  Is there a reason I'm missing that it couldn't 
be made to work with the list-of-tuples form?  It would be marginally 
more convenient for my application, but that's just a finesse issue.
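
As a rough illustration of the workaround, the sketch below builds the 
same two-field record from a dictionary descriptor, once packed and once 
with align=True, and compares each record size against the size of an 
object pointer.  The variable names are mine, and the actual sizes depend 
on the platform's pointer width.

```python
import numpy as np

# Dictionary-style descriptor for the same two fields as the example above.
spec = {'names': ['o', 'c'], 'formats': ['O', 'c']}

packed = np.dtype(spec)               # fields laid out back to back
aligned = np.dtype(spec, align=True)  # padded the way a C struct would be

ptr_size = np.dtype('O').itemsize     # size of one object pointer

# The packed record is one byte longer than a pointer, so every second
# and later record starts at an address that is not pointer-aligned.
print("packed :", packed.itemsize, "remainder", packed.itemsize % ptr_size)

# The aligned record is padded up to a multiple of the pointer size.
print("aligned:", aligned.itemsize, "remainder", aligned.itemsize % ptr_size)
```

The cost of the workaround is the padding: each record grows to the next 
multiple of the pointer alignment.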

However, perhaps we should try to fix the underlying alignment 
problems?  Unfortunately, it's not clear to me how to resolve them 
without at least some performance penalty.  You either do an alignment 
check of the pointer, and then memcpy if unaligned, or just always use 
memcpy.  Not sure which is faster, as memcpy may have a fast path 
already. These are object arrays anyway, so there's plenty of overhead 
already, and I don't think this would affect regular numerical arrays. 
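
To make the two options concrete, here is a minimal sketch using ctypes 
in place of the C code.  It is not the code in refcnt.c; the function 
names are hypothetical, and ctypes.memmove stands in for memcpy.  One 
function checks the address and copies only when it is unaligned, the 
other always copies into an aligned temporary.

```python
import ctypes

PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)

def is_aligned(address, alignment=PTR_SIZE):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0

def load_pointer_memcpy(address):
    """Always copy the bytes into an aligned temporary, then read it."""
    tmp = ctypes.c_void_p()
    ctypes.memmove(ctypes.byref(tmp), address, PTR_SIZE)
    return tmp.value or 0  # c_void_p reports NULL as None

def load_pointer_checked(address):
    """Read directly when aligned, fall back to the copy otherwise."""
    if is_aligned(address):
        return ctypes.c_void_p.from_address(address).value or 0
    return load_pointer_memcpy(address)

# Store a pointer-sized value at a deliberately misaligned address.
buf = ctypes.create_string_buffer(3 * PTR_SIZE)
base = ctypes.addressof(buf)
odd = base + (-base) % PTR_SIZE + 1
value = ctypes.c_void_p(0x1234)
ctypes.memmove(odd, ctypes.byref(value), PTR_SIZE)

print(is_aligned(odd), hex(load_pointer_checked(odd)))
```

Which variant is cheaper in C would need measuring on the platforms in 
question; the sketch only shows that both read the same value.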

If we choose not to fix it, perhaps we should try to warn when 
creating an unaligned recarray on platforms where it matters?  I do 
worry about having something that works perfectly well on one platform 
fail on another.
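
A warning of that kind could be sketched roughly as follows.  Both 
has_misaligned_objects and checked_zeros are hypothetical helpers, not 
existing numpy functions, and the check only looks at top-level fields; 
nested record types are ignored for brevity.

```python
import warnings
import numpy as np

def has_misaligned_objects(dtype):
    """Return True if an object field can land on an unaligned address."""
    dtype = np.dtype(dtype)
    if dtype.fields is None:
        return False  # not a record type
    ptr_align = np.dtype('O').alignment
    for name in dtype.names:
        field_dtype, offset = dtype.fields[name][:2]
        if not field_dtype.hasobject:
            continue
        # Either a bad field offset or a bad record size misaligns
        # the pointer in some element of the array.
        if offset % ptr_align or dtype.itemsize % ptr_align:
            return True
    return False

def checked_zeros(shape, dtype):
    """np.zeros wrapper that warns about unaligned object fields."""
    if has_misaligned_objects(dtype):
        warnings.warn(
            "record dtype stores object pointers at unaligned offsets; "
            "consider align=True",
            RuntimeWarning,
            stacklevel=2,
        )
    return np.zeros(shape, dtype=dtype)

a1 = checked_zeros((10,), [('o', 'O'), ('c', 'c')])  # emits the warning
```

A real implementation would presumably restrict the warning to platforms 
that fault on unaligned access.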

In the meantime, I'll just mark the new regression test to "skip on 
Solaris".
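
For reference, a platform-conditional skip can be written along these 
lines.  This is a sketch using the standard library's unittest rather 
than the decorator actually used in the numpy test suite, and the class 
and test names are made up.

```python
import sys
import unittest
import numpy as np

# sys.platform starts with 'sunos' on Solaris.
ON_SOLARIS = sys.platform.startswith('sunos')

class TestObjectAlignment(unittest.TestCase):

    @unittest.skipIf(ON_SOLARIS, "unaligned object pointers, see ticket 1198")
    def test_unaligned_object_field_assignment(self):
        a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
        a2 = np.zeros((10,), 'S10')
        # The assignment itself is the regression check.
        a1['o'] = a2
        self.assertEqual(a1['o'].tolist(), a2.tolist())
```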

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA