[Numpy-discussion] Request for review: dynamic_cpu_branch

David Cournapeau cournape at gmail.com
Fri Dec 26 04:20:26 EST 2008


On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau <cournape at gmail.com>
> wrote:
>>
>> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau
>> > <david at ar.media.kyoto-u.ac.jp> wrote:
>> >>
>> >> David Cournapeau wrote:
>> >> > Charles R Harris wrote:
>> >> >
>> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau
>> >> >> <cournape at gmail.com
>> >> >> <mailto:cournape at gmail.com>> wrote:
>> >> >>
>> >> >>     On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern
>> >> >>     <robert.kern at gmail.com <mailto:robert.kern at gmail.com>> wrote:
>> >> >>
>> >> >>     >
>> >> >>     > I think he meant that it can be discovered at runtime in
>> >> >>     general, not
>> >> >>     > at numpy-run-time, so we can write a small C program that can
>> >> >> be
>> >> >> run
>> >> >>     > at numpy-build-time to add another entry to config.h.
>> >> >>
>> >> >>     But then we only move the problem: people who want to build
>> >> >> universal
>> >> >>     numpy extensions will have the wrong value, no ? The fundamental
>> >> >> point
>> >> >>     of my patch is that the value is set whenever ndarrayobject.h is
>> >> >>     included. So even if I build numpy on PPC, NPY_BIGENDIAN will
>> >> >> not
>> >> >> be
>> >> >>     defined when the header is included for a file build with gcc
>> >> >> -arch
>> >> >>     i386.
>> >> >>
>> >> >>
>> >> >> We can probably set things up so the determination is at run time --
>> >> >> but we need to be sure that the ABI isn't affected. I did that once
>> >> >> for an old project that needed data portability. In any case, it
>> >> >> sounds like a project for a later release.
>> >> >>
>> >> >
>> >> > It cannot work for numpy without breaking backward compatibility,
>> >> > because of the following lines:
>> >> >
>> >>
>> >> Actually, you could, by making the macro point to actual functions, but
>> >> that would add function call cost. I don't know whether the function
>> >> call cost is significant in the cases where those macros are used,
>> >
>> > Exactly. Function calls are pretty cheap on modern hardware with good
>> > compilers, and I wouldn't expect the calls to be the bottleneck in most
>> > applications. The functions would need to be visible to third party
>> > applications, however...
>>
>> Would it be a problem? Adding "true" functions to the array API,
>> while keeping the macro for backward compatibility, should be OK, no?
>
> I don't think it's a problem, just that the macros generate code that is
> compiled, so they need to call an API function. A decent compiler will
> probably load the function pointer somewhere fast if it is called in a loop,
> and a const keyword somewhere will help with that. We might want something
> more convenient for our own code.
>
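The function-backed-macro idea being discussed could be sketched like
this. All sketch_* names are hypothetical, not the actual numpy API; the
point is only that the old macro spelling survives while the work moves
behind a function pointer in an API table:

```c
#include <stdint.h>

/* Hypothetical implementation that would live inside numpy itself:
 * store a known 32-bit value and look at its first byte. */
static int sketch_endianness_impl(void)
{
    union {
        uint32_t i;
        unsigned char c[4];
    } u;

    u.i = 1U;
    return u.c[0];  /* 1 on little-endian, 0 on big-endian */
}

/* The slot an extension module would import from the array API table;
 * here it is just a file-scope function pointer. */
static int (*SketchAPI_GetEndianness)(void) = sketch_endianness_impl;

/* Backward-compatible macro spelling, now expanding to a call through
 * the API table. */
#define SKETCH_ISLITTLEENDIAN() (SketchAPI_GetEndianness() != 0)
```

Because the macro expands to a call, the answer is computed on the
machine that runs the code, not the one that built it, which is what a
universal (multi-arch) build needs.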
>>
>> I also updated my patch with another function, PyArray_GetEndianness,
>> which detects the runtime endianness (using a union of int/char[4]). The
>> point is to detect any mismatch between the configuration endianness
>> and the "true" one, and I put the detection in import_array. The
>> function is in the numpy array API, but it does not really need to be.
>
> That sounds like a good start. It might be a good idea to use something like
> npy_int32 instead of a plain old integer. Likewise, it would probably be
> good to define the union as an anonymous constant. Hmm...
> something like:

What I did was a bit more heavyweight, because I added it as a function,
but the idea is similar:

static int compute_endianness(void)
{
        union {
                char c[4];
                npy_uint32 i;
        } bint;
        int st;

        /* Multi-character constant: the bytes of 'ABCD' land in the
         * union in an implementation-defined order. */
        bint.i = 'ABCD';

        /* bintstrcmp compares exactly 4 bytes; bint.c is not
         * NUL-terminated, so plain strcmp would read past the end. */
        st = bintstrcmp(bint.c, "ABCD");
        if (st == 0) {
                return NPY_CPU_BIG;
        }
        st = bintstrcmp(bint.c, "DCBA");
        if (st == 0) {
                return NPY_CPU_LITTLE;
        }
        return NPY_CPU_UNKNOWN_ENDIAN;
}

Now that I think about it, I don't know if setting an integer to
'ABCD' is legal C. I think it is, but I don't claim to be any kind of
C expert.
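For the record: a multi-character constant such as 'ABCD' is legal C,
but its value is implementation-defined (C99 6.4.4.4), so the byte
pattern it leaves in the union is not guaranteed. A sketch of a fully
portable variant that stores a known numeric value instead; the
detect_* names are illustrative, not from the patch:

```c
#include <stdint.h>

enum { DETECT_LITTLE, DETECT_BIG, DETECT_UNKNOWN };

static int detect_endianness(void)
{
    union {
        uint32_t i;
        unsigned char c[4];
    } u;

    /* 0x41424344 is the bytes 'A','B','C','D' when read big-endian. */
    u.i = 0x41424344u;

    if (u.c[0] == 0x41 && u.c[1] == 0x42 &&
        u.c[2] == 0x43 && u.c[3] == 0x44) {
        return DETECT_BIG;
    }
    if (u.c[0] == 0x44 && u.c[1] == 0x43 &&
        u.c[2] == 0x42 && u.c[3] == 0x41) {
        return DETECT_LITTLE;
    }
    return DETECT_UNKNOWN;  /* e.g. a middle-endian byte order */
}
```

Reading a union member other than the one last written is also the
type-punning idiom that gcc explicitly documents as well-defined, which
sidesteps the aliasing question below.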

>
>
> I've done it two ways here. They both require the -fno-strict-aliasing flag
> in gcc, but numpy is compiled with that flag. Both methods generate the same
> assembly with -O2 on my intel core2.

I don't need this flag in my case, and I don't think we should require
it if we can avoid it.

I also compare the whole string, not just the first character, because
in theory there can be middle-endian byte orders, though I doubt they
have any real use. If we ignore that case, the function bintstrcmp can
be dropped.
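Ignoring middle-endian orders, a version without bintstrcmp might look
like this sketch (names are illustrative, not from the patch): with only
two possible byte orders, the first byte of a known 32-bit value decides
everything.

```c
#include <stdint.h>

enum { FB_LITTLE, FB_BIG };

static int endianness_first_byte(void)
{
    union {
        uint32_t i;
        unsigned char c[4];
    } u;

    u.i = 1U;
    /* Little-endian stores the low-order byte first. */
    return (u.c[0] == 1) ? FB_LITTLE : FB_BIG;
}
```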

David


