[Numpy-discussion] Aligning an array on Windows

Francesc Altet faltet at carabos.com
Fri May 25 14:39:13 EDT 2007


On Thursday 24 May 2007 20:33, Francesc Altet wrote:
> Hi,
>
> Some time ago I made an improvement in speed on the numexpr version of
> PyTables so as to accelerate the operations with unaligned arrays
> (objects that can appear quite commonly when dealing with columns of
> recarrays, as PyTables does).
>
> This improvement has proven to work correctly and flawlessly on
> Linux machines (using GCC 4.x, on both 32-bit and 64-bit Linux boxes)
> over several weeks of intensive testing.  Moreover, the speed-up
> ranges from 40% on modern processors up to 70% on older ones,
> so I'd like to keep it.
>
> The surprise came today when I tried to compile the same code on a
> Windows box (Win XP Service Pack 2) using MSVC 7.1, through the free (as
> in beer) Toolkit 2003.  The compilation process went fine, but the
> problem is that I'm getting crashes from time to time when running the
> numexpr test suite.
>
> After some in-depth investigation, I'm pretty sure that the problem is
> in a specific part of the code that I modified for this improvement.
> IMO, the affected code is in numexpr/interp_body.c and reads:
>
>     case OP_COPY_II: VEC_ARG1(memcpy(dest, x1, sizeof(int));
>                               dest += sizeof(int); x1 += stride1);
>     case OP_COPY_LL: VEC_ARG1(memcpy(dest, x1, sizeof(long long));
>                               dest += sizeof(long long); x1 += stride1);
>     case OP_COPY_FF: VEC_ARG1(memcpy(dest, x1, sizeof(double));
>                               dest += sizeof(double); x1 += stride1);
>     case OP_COPY_CC: VEC_ARG1(memcpy(dest, x1, sizeof(double)*2);
>                               dest += sizeof(double)*2; x1 += stride1);
[snip]

Just for the record: I've found the culprit.  The problem here was the use of 
the stride1 variable that was declared just above the main switch for opcodes 
as:

intp stride1 = params.memsteps[arg1]; 

Unfortunately, this assignment caused problems because arg1 can take values 
outside the range of the memsteps array.  The solution was to use another 
variable, initialized as:

intp sb1 = params.memsteps[arg1];

but inside the VEC_ARG* macros, just after the BOUNDS_CHECK(arg1) call, so 
that arg1 is verified to be in range before it is used as an index.  All in 
all, a very subtle bug that would have been evident to the main Numexpr 
authors, but not to me.  Anyway, you can find the details of the fix in:
http://www.pytables.org/trac/changeset/2939
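
To make the ordering issue concrete, here is a minimal sketch in C.  It is 
not the actual numexpr code: N_REGS, params_t, in_bounds() and get_stride() 
are hypothetical names made up for illustration, intp is replaced by a plain 
typedef, and the real BOUNDS_CHECK macro may well behave differently.  The 
only point is that the index has to be validated before the array is read.

```c
#include <stddef.h>

#define N_REGS 4              /* hypothetical size of the memsteps table */

typedef ptrdiff_t intp;       /* stand-in for NumPy's intp (assumption) */

struct params_t {
    intp memsteps[N_REGS];    /* one byte stride per register */
};

/* Hypothetical range check: 1 if idx is a valid memsteps index, else 0. */
static int in_bounds(int idx)
{
    return idx >= 0 && idx < N_REGS;
}

/*
 * Buggy ordering (shown only as a comment, never executed here):
 *
 *     intp stride1 = p->memsteps[arg1];    read first ...
 *     if (!in_bounds(arg1)) return -1;     ... check too late
 *
 * The read is already out of range when arg1 is invalid, which is
 * undefined behaviour in C and may or may not crash.
 */

/*
 * Fixed ordering: check first, read second.
 * Returns 0 and stores the stride on success; returns -1 and leaves
 * *stride_out untouched when arg1 is out of range.
 */
static int get_stride(const struct params_t *p, int arg1, intp *stride_out)
{
    if (!in_bounds(arg1))
        return -1;
    *stride_out = p->memsteps[arg1];
    return 0;
}
```

With a table such as {4, 8, 16, 24}, get_stride(&p, 2, &s) stores 16 in s, 
while an index of 7 or -1 is rejected without touching the array.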

I don't know exactly why this wasn't causing problems on Linux boxes.  
Fortunately, the Windows platform is much more finicky about memory 
problems and brought this bug to my attention.  Oh, thank God for letting 
Windows be! ;)

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


