[Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)

Wed Mar 6 17:05:22 EST 2013

On Wed, 2013-03-06 at 12:42 -0600, Kurt Smith wrote:
> On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith <kwmsmith at gmail.com> wrote:
> > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted <francesc at continuum.io> wrote:
> >>
> >> I would not rush into this.  The example above takes 9 bytes to host the
> >> structure, while `align=True` will take 16 bytes.  I'd rather leave
> >> the default as it is, and in case performance is critical, you can
> >> always copy the unaligned field to a new (homogeneous) array.
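
For reference, the size difference and the copy-out workaround described above can be sketched as follows. This is only an illustration, not code from the thread: the names `fields`, `b_view`, and `b_copy` are made up here, and the 16-byte aligned size assumes a typical 64-bit platform.

```python
import numpy as np

fields = [('a', 'u1'), ('b', 'u8')]
packed_dt = np.dtype(fields, align=False)   # no padding between fields
aligned_dt = np.dtype(fields, align=True)   # padded like a C struct

print(packed_dt.itemsize)    # 9
print(aligned_dt.itemsize)   # 16 on typical 64-bit platforms

packed_arr = np.ones(1000, dtype=packed_dt)
b_view = packed_arr['b']     # strided view into the packed records
b_copy = b_view.copy()       # fresh, contiguous (homogeneous) uint64 array

print(b_view.strides, b_copy.strides)
```

The copy costs extra memory for that one field, but later operations on `b_copy` no longer touch the packed records.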
> >
> > Yes, I can absolutely see the case you're making here, and I made my
> > "vote" with the understanding that `align=False` will almost
> > certainly stay the default.  Adding `align=True` is simple for me to
> > do, so no harm done.
> >
> > My case is based on what's the least surprising behavior: C structs /
> > all C compilers, the builtin `struct` module, and ctypes `Structure`
> > subclasses all use padding to ensure aligned fields by default.  You
> > can turn this off to get packed structures, but the default behavior
> > in these other places is alignment, which is why I was surprised when
> > I first saw that NumPy structured dtypes are packed by default.
> >
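
The defaults being compared here can be checked side by side. The snippet below is a rough sketch under the assumption of a typical 64-bit platform; the `Record` class and the `sizes` dictionary are hypothetical names introduced only for this comparison.

```python
import ctypes
import struct

import numpy as np


class Record(ctypes.Structure):
    # ctypes lays out fields with native padding by default
    _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]


fields = [('a', 'u1'), ('b', 'u8')]

sizes = {
    'struct_native': struct.calcsize('BQ'),     # native alignment (default)
    'struct_standard': struct.calcsize('=BQ'),  # standard sizes, no padding
    'ctypes_default': ctypes.sizeof(Record),
    'numpy_default': np.dtype(fields).itemsize,
    'numpy_aligned': np.dtype(fields, align=True).itemsize,
}

for label, size in sizes.items():
    print(label, size)
```

Only NumPy packs by default; `struct` and ctypes need an explicit opt-out to drop the padding.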
> 
> Some surprises with aligned / unaligned arrays:
> 
> #-----------------------------
> 
> import numpy as np
> 
> packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
> aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)
> 
> packed_arr = np.ones((10**6,), dtype=packed_dt)
> aligned_arr = np.ones((10**6,), dtype=aligned_dt)
> 
> print("all(packed_arr['a'] == aligned_arr['a'])",
>       np.all(packed_arr['a'] == aligned_arr['a']))  # True
> print("all(packed_arr['b'] == aligned_arr['b'])",
>       np.all(packed_arr['b'] == aligned_arr['b']))  # True
> print("all(packed_arr == aligned_arr)",
>       np.all(packed_arr == aligned_arr))  # False (!!)
> 
> #-----------------------------
> 
> I can understand what's likely going on under the covers that makes
> these arrays not compare equal, but I'd expect that if all columns of
> two structured arrays are everywhere equal, then the arrays themselves
> would be everywhere equal.  Bug?
> 

Yes and no... equal for structured types does not seem to be
implemented; you get the same (wrong) False with (packed_arr ==
packed_arr) too. I agree that when the types are equivalent but np.equal
is not implemented, just returning False is a bit dangerous. I'm not
sure exactly what the solution is, but I think the == operator should
probably raise an error instead of silently swallowing all of them.

- Sebastian
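
A field-by-field check sidesteps the question of how `==` treats differing layouts. The helper below is a minimal sketch; `fields_equal` is a hypothetical name and not part of NumPy.

```python
import numpy as np


def fields_equal(x, y):
    """True if x and y share field names and shape, and every field matches."""
    if x.dtype.names is None or x.dtype.names != y.dtype.names:
        return False
    if x.shape != y.shape:
        return False
    return all(np.array_equal(x[name], y[name]) for name in x.dtype.names)


fields = [('a', 'u1'), ('b', 'u8')]
packed_arr = np.ones(1000, dtype=np.dtype(fields, align=False))
aligned_arr = np.ones(1000, dtype=np.dtype(fields, align=True))

print(fields_equal(packed_arr, aligned_arr))   # True

aligned_arr['b'][0] = 2                        # change a single value
print(fields_equal(packed_arr, aligned_arr))   # False
```

Because it compares values per field, the result does not depend on padding or field offsets.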

> And regarding performance, doing simple timings shows a 30%-ish
> slowdown for unaligned operations:
> 
> In [36]: %timeit packed_arr['b']**2
> 100 loops, best of 3: 2.48 ms per loop
> 
> In [37]: %timeit aligned_arr['b']**2
> 1000 loops, best of 3: 1.9 ms per loop
> 
> Whereas summing shows just a 10%-ish slowdown:
> 
> In [38]: %timeit packed_arr['b'].sum()
> 1000 loops, best of 3: 1.29 ms per loop
> 
> In [39]: %timeit aligned_arr['b'].sum()
> 1000 loops, best of 3: 1.14 ms per loop
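
To repeat these measurements outside IPython, something along these lines should work. It is a sketch using the standard `timeit` module; `best_ms` and `timings` are hypothetical names, and the numbers will vary with hardware and NumPy version, so none are quoted here.

```python
import timeit

import numpy as np

fields = [('a', 'u1'), ('b', 'u8')]
packed_arr = np.ones(10**6, dtype=np.dtype(fields, align=False))
aligned_arr = np.ones(10**6, dtype=np.dtype(fields, align=True))


def best_ms(func, number=20, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e3


timings = {
    'packed  b**2': best_ms(lambda: packed_arr['b'] ** 2),
    'aligned b**2': best_ms(lambda: aligned_arr['b'] ** 2),
    'packed  sum ': best_ms(lambda: packed_arr['b'].sum()),
    'aligned sum ': best_ms(lambda: aligned_arr['b'].sum()),
}

for label, ms in timings.items():
    print("%s: %.3f ms per call" % (label, ms))
```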