[Numpy-discussion] Request for review: dynamic_cpu_branch

Fri Dec 26 20:47:24 EST 2008

On Sat, Dec 27, 2008 at 2:38 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau <cournape at gmail.com>
> wrote:
>>
>> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau <cournape at gmail.com>
>> > wrote:
>> >>
>> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris
>> >> <charlesr.harris at gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau
>> >> > <david at ar.media.kyoto-u.ac.jp> wrote:
>> >> >>
>> >> >> David Cournapeau wrote:
>> >> >> > Charles R Harris wrote:
>> >> >> >
>> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau
>> >> >> >> <cournape at gmail.com
>> >> >> >> <mailto:cournape at gmail.com>> wrote:
>> >> >> >>
>> >> >> >>     On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern
>> >> >> >>     <robert.kern at gmail.com <mailto:robert.kern at gmail.com>> wrote:
>> >> >> >>
>> >> >> >>     >
>> >> >> >>     > I think he meant that it can be discovered at runtime in
>> >> >> >>     general, not
>> >> >> >>     > at numpy-run-time, so we can write a small C program that
>> >> >> >> can
>> >> >> >> be
>> >> >> >> run
>> >> >> >>     > at numpy-build-time to add another entry to config.h.
>> >> >> >>
>> >> >> >>     But then we only move the problem: people who want to build
>> >> >> >> universal
>> >> >> >>     numpy extensions will have the wrong value, no ? The
>> >> >> >> fundamental
>> >> >> >> point
>> >> >> >>     of my patch is that the value is set whenever ndarrayobject.h
>> >> >> >> is
>> >> >> >>     included. So even if I build numpy on PPC, NPY_BIGENDIAN will
>> >> >> >> not
>> >> >> >> be
>> >> >> >>     defined when the header is included for a file build with gcc
>> >> >> >> -arch
>> >> >> >>     i386.
>> >> >> >>
>> >> >> >>
>> >> >> >> We can probably set things up so the determination is at run time
>> >> >> >> --
>> >> >> >> but we need to be sure that the ABI isn't affected. I did that
>> >> >> >> once
>> >> >> >> for an old project that needed data portability. In any case, it
>> >> >> >> sounds like a project for a later release.
>> >> >> >>
>> >> >> >
>> >> >> > It cannot work for numpy without breaking backward compatibility,
>> >> >> > because of the following lines:
>> >> >> >
>> >> >>
>> >> >> Actually, you could, by making the macro point to actual functions,
>> >> >> but
>> >> >> that would add function call cost. I don't know if the function call
>> >> >> cost is significant or not in the cases where those macro are used,
>> >> >
>> >> > Exactly. Function calls are pretty cheap on modern hardware with good
>> >> > compilers, nor would I expect the calls to be the bottleneck in most
>> >> > applications. The functions would need to be visible to third party
>> >> > applications, however...
>> >>
>> >> Would it be a problem ? Adding "true" functions to the array api,
>> >> while keeping the macro for backward compatibility should be ok, no ?
>> >
>> > I don't think it's a problem, just that the macros generate code that is
>> > compiled, so they need to call an api function. A decent compiler will
>> > probably load the function pointer somewhere fast if it is called in a
>> > loop,
>> > a const keyword somewhere will help with that. We might want something
>> > more
>> > convenient for our own code.
>> >
>> >>
>> >> I also updated my patch, with another function PyArray_GetEndianness
>> >> which detects the runtime endianness (using an union int/char[4]). The
>> >> point is to detect any mismatch between the configuration endianness
>> >> and the "true" one, and I put the detection in import_array. The
>> >> function is in the numpy array API, but it does not really need to be
>> >> either .
>> >
>> > That sounds like a good start. It might be a good idea to use something
>> > like
>> > npy_int32 instead of a plain old integer. Likewise, it would probably be
>> > good to define the union as an anonymous constant. Hmm...
>> > something like:
>>
>> What I did was a bit more heavyweight, because I added it as function,
>> but the idea is similar:
>>
>> static int compute_endianness()
>> {
>>        union {
>>                char c[4];
>>                npy_uint32 i;
>>        } bint;
>>        int st;
>>
>>        bint.i = 'ABCD';
>>
>>        st = bintstrcmp(bint.c, "ABCD");
>>        if (st == 0) {
>>                return NPY_CPU_BIG;
>>        }
>>        st = bintstrcmp(bint.c, "DCBA");
>>        if (st == 0) {
>>                return NPY_CPU_LITTLE;
>>        }
>>        return NPY_CPU_UNKNOWN_ENDIAN;
>> }
>>
>> Now that I think about it, I don't know if setting an integer to
>> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of
>> C expert.
>
> Try
>
>      const  union {
>                char c[4];
>                npy_uint32 i;
>        } bint = {1,2,3,4};
>
> And just compare the resulting value of bint.i to predefined values, i.e.,
> 67305985 for big endian. The initializer is needed for the const union and
> initializes the first variable. The result will be more efficient as the
> union is initialized at compile time.

I can do that, indeed.

>
>>
>>
>> >
>> >
>> > I've done it two ways here. They both require the -fno-strict-aliasing
>> > flag
>> > in gcc, but numpy is compiled with that flag. Both methods generate the
>> > same
>> > assembly with -O2 on my intel core2.
>>
>> I don't need this flag in my case, and I don't think we should require
>> it if we can avoid it.
>
> You can't avoid it, there will be a compile time error if you are lucky or
> buggy code if you aren't.

I don't understand why. The whole point of union in that case is to
make it clear that type punning is ok, since by definition the two
values start at the same address. If this broke aliasing, any union
would. Here is what man gcc tells me under the -fstrict-aliasing:

"""
 Allows the compiler to assume the strictest aliasing rules applicable
to the language being compiled.  For C (and C++), this activates
optimizations based on the type of expressions.  In particular, an
object of one type is assumed never to reside at the same address as
anbobject of a different type, unless the types are almost the same.
For example, an "unsigned int" can alias an "int", but not a "void*"
or ab"double".  A character type may alias any other type.

Pay special attention to code like this:

                   union a_union {
                     int i;
                     double d;
                   };

                   int f() {
                     a_union t;
                     t.d = 3.0;
                     return t.i;
                   }

The practice of reading from a different union member than the one
most recently written to (called ``type-punning'') is common.  Even
withb-fstrict-aliasing, type-punning is allowed, provided the memory
is accessed through the union type.  So, the code above will work as
expected.

However, this code might not:

                   int f() {
                     a_union t;
                     int* ip;
                     t.d = 3.0;
                     ip = &t.i;
                     return *ip;
                   }
"""

David