[Numpy-discussion] Request for review: dynamic_cpu_branch

Fri Dec 26 02:17:16 EST 2008

On Fri, Dec 26, 2008 at 12:05 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
> On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau <cournape at gmail.com> wrote:
>
>> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau
>> > <david at ar.media.kyoto-u.ac.jp> wrote:
>> >>
>> >> David Cournapeau wrote:
>> >> > Charles R Harris wrote:
>> >> >
>> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau <cournape at gmail.com> wrote:
>> >> >>
>> >> >>     On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern
>> >> >>     <robert.kern at gmail.com> wrote:
>> >> >>
>> >> >>     >
>> >> >>     > I think he meant that it can be discovered at runtime in general, not
>> >> >>     > at numpy-run-time, so we can write a small C program that can be run
>> >> >>     > at numpy-build-time to add another entry to config.h.
>> >> >>
>> >> >>     But then we only move the problem: people who want to build universal
>> >> >>     numpy extensions will have the wrong value, no? The fundamental point
>> >> >>     of my patch is that the value is set whenever ndarrayobject.h is
>> >> >>     included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be
>> >> >>     defined when the header is included for a file built with gcc -arch
>> >> >>     i386.
>> >> >>
>> >> >>
>> >> >> We can probably set things up so the determination is at run time --
>> >> >> but we need to be sure that the ABI isn't affected. I did that once
>> >> >> for an old project that needed data portability. In any case, it
>> >> >> sounds like a project for a later release.
>> >> >>
>> >> >
>> >> > It cannot work for numpy without breaking backward compatibility,
>> >> > because of the following lines:
>> >> >
>> >>
>> >> Actually, you could, by making the macro point to actual functions, but
>> >> that would add function call cost. I don't know if the function call
>> >> cost is significant or not in the cases where those macros are used,
>> >
>> > Exactly. Function calls are pretty cheap on modern hardware with good
>> > compilers, and I wouldn't expect the calls to be the bottleneck in most
>> > applications. The functions would need to be visible to third party
>> > applications, however...
>>
>> Would it be a problem ? Adding "true" functions to the array api,
>> while keeping the macro for backward compatibility should be ok, no ?
>>
>
> I don't think it's a problem, just that the macros generate code that is
> compiled into the extension, so they need to call an api function. A decent
> compiler will probably keep the function pointer somewhere fast if it is
> called in a loop, and a const keyword somewhere will help with that. We might
> want something more convenient for our own code.
>
>
>>
>> I also updated my patch, with another function PyArray_GetEndianness
>> which detects the runtime endianness (using a union int/char[4]). The
>> point is to detect any mismatch between the configuration endianness
>> and the "true" one, and I put the detection in import_array. The
>> function is in the numpy array API, but it does not really need to be
>> either.
>>
>
> That sounds like a good start. It might be a good idea to use something
> like npy_int32 instead of a plain old integer. Likewise, it would probably
> be good to define the union as an anonymous constant. Hmm...
> something like:
>
> #include <stdio.h>
>
> const union {
>     int i;
>     char c[4];
> } order = {1};
>
> const int i = 1;
>
> int main(int argc, char **argv)
> {
>     if (order.c[0])
>         printf("little endian\n");
>     else
>         printf("big endian\n");
>
>     if (*(char*)&i)
>         printf("little endian\n");
>     else
>         printf("big endian\n");
>
>     return 0;
> }
>
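> I've done it two ways here. Strictly speaking neither should need the
> -fno-strict-aliasing flag in gcc (access through a char pointer is always
> allowed, and gcc documents reading the other member of a union as
> supported), but numpy is compiled with that flag anyway. Both methods
> generate the same assembly with -O2 on my intel core2.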
>
>     ...
>     cmpb    $0, order
>     ...
>     cmpb    $0, i
>

I suppose we could also mark one of those variables as static, make the name
more unique, and stick it in the include file, thus avoiding the need to add
anything to the api. Not the cleanest solution, but maybe better...
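
A rough sketch of what that header-only variant might look like, assuming a
C99 compiler with <stdint.h>. The names example_endian_probe,
example_is_little_endian and example_endianness_matches_config are made up
for illustration and are not part of the numpy API; int32_t stands in for
npy_int32.

```c
#include <stdint.h>

/* Hypothetical header fragment. "static" gives every translation unit
 * that includes it a private copy, so nothing has to be exported. */
static const union {
    int32_t i;   /* fixed-width stand-in for npy_int32 */
    char c[4];
} example_endian_probe = {1};   /* initializes the first member, i */

/* Returns 1 if the low-order byte is stored first (little endian),
 * 0 otherwise. */
static int example_is_little_endian(void)
{
    return example_endian_probe.c[0] == 1;
}

/* Compares the runtime result with what the build configuration claimed.
 * configured_big_endian is an assumed input: nonzero if the build was
 * configured as big endian. Returns 1 when the two agree. */
static int example_endianness_matches_config(int configured_big_endian)
{
    return example_is_little_endian() == !configured_big_endian;
}
```

A caller could run example_endianness_matches_config() once at import time
and raise an error on mismatch. One cost of this approach is that files which
include the header without using the probe may draw an unused-variable or
unused-function warning from some compilers.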

Chuck