[Numpy-discussion] Profiling numpy ? (parts written in C)

Charles R Harris charlesr.harris at gmail.com
Wed Dec 20 17:22:54 EST 2006


On 12/20/06, Francesc Altet <faltet at carabos.com> wrote:
>
> On Wednesday 20 December 2006 03:36, David Cournapeau wrote:
> > Francesc Altet wrote:
> > > On Tuesday 19 December 2006 08:12, David Cournapeau wrote:
> > >> Hi,
> > >>


<snip>

> @fname@_copyswap (void *dst, void *src, int swap, void *arr)
> {
>
>          if (src != NULL) /* copy first if needed */
>                 memcpy(dst, src, sizeof(@type@));
>
> [where the numpy code generator is replacing @fname@ by DOUBLE]
>
> we see that memcpy is called under the hood (I don't know why oprofile
> is not able to detect this call anymore).
>
> After looking at the function, and remembering what Charles Harris
> said in a previous message about the convenience of using a simple
> type-specific assignment, I ended up replacing the memcpy. Here is the
> patch:
>
> --- numpy/core/src/arraytypes.inc.src   (revision 3487)
> +++ numpy/core/src/arraytypes.inc.src   (working copy)
> @@ -997,11 +997,11 @@
> }
>
> static void
> -@fname@_copyswap (void *dst, void *src, int swap, void *arr)
> +@fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr)
> {
>
>          if (src != NULL) /* copy first if needed */
> -                memcpy(dst, src, sizeof(@type@));
> +                *dst = *src;
>
>          if (swap) {
>                  register char *a, *b, c;


We could get rid of the register keyword too; it is considered obsolete
these days.  Also, on most architectures

#if SIZEOF_@fsize@ == 4
                b = a + 3;
                c = *a; *a++ = *b; *b-- = c;
                c = *a; *a++ = *b; *b   = c;

will be notably slower than

#if SIZEOF_@fsize@ == 4
                c = a[0]; a[0] = a[3]; a[3] = c;
                c = a[1]; a[1] = a[2]; a[2] = c;

because loading the indexed addresses is a single instruction if a is in a
register.
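
To make the comparison concrete, here is a minimal standalone sketch of the
two 4-byte swaps. The function names swap4_pointer and swap4_indexed are
made up for illustration and are not in numpy. It only shows that both forms
produce the same result; which one is faster depends on the compiler and the
architecture and would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-walking form, as in the current code: a and b are moved
 * toward each other while the bytes are exchanged. */
static void swap4_pointer(char *a)
{
    assert(a != NULL);
    char *b = a + 3;
    char c;
    c = *a; *a++ = *b; *b-- = c;
    c = *a; *a++ = *b; *b   = c;
}

/* Indexed form: a stays fixed and each byte is addressed as a
 * constant offset from it. */
static void swap4_indexed(char *a)
{
    assert(a != NULL);
    char c;
    c = a[0]; a[0] = a[3]; a[3] = c;
    c = a[1]; a[1] = a[2]; a[2] = c;
}
```

Both functions reverse the four bytes starting at a in place, so either can
stand in for the other in a quick timing test.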

Inlining would also be good, but can be tricky and compiler dependent. If
all the code is in one big chunk, things aren't so bad and a simple inline
directive should do the trick. We would also want to break the subroutine up
into smaller pieces so that the common case is inlined and the more
complicated cases remain function calls.
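
A rough sketch of that split, for the double case only. The names
double_byteswap and double_copyswap are hypothetical, the unused arr argument
is dropped, and C99 "static inline" is assumed to be available; this is not
the actual numpy code. It also assumes dst and src are suitably aligned for a
double, which the typed assignment in the patch requires.

```c
#include <assert.h>
#include <stddef.h>

/* Less common case, kept as an ordinary function call:
 * reverse the bytes of one double in place. */
static void double_byteswap(double *p)
{
    unsigned char *a = (unsigned char *)p;
    size_t i, n = sizeof(double);
    for (i = 0; i < n / 2; i++) {
        unsigned char c = a[i];
        a[i] = a[n - 1 - i];
        a[n - 1 - i] = c;
    }
}

/* Common case, small enough to inline: a plain typed copy.
 * Assumes dst and src (when not NULL) are aligned for double. */
static inline void double_copyswap(double *dst, const double *src, int swap)
{
    assert(dst != NULL);
    if (src != NULL)      /* copy first if needed */
        *dst = *src;
    if (swap)             /* rare path stays out of line */
        double_byteswap(dst);
}
```

With swap == 0 the call reduces to a single assignment once inlined; only
the byte-swapping path pays for a function call.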

Chuck