[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

Nathaniel Smith njs at pobox.com
Tue Jul 16 11:55:58 EDT 2013


On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma <arinkverma at gmail.com> wrote:

> >Each ndarray does two mallocs, for the obj and buffer. These could be
> > combined into 1 - just allocate the total size and do some pointer
> >arithmetic, then set OWNDATA to false.
> So, the two mallocs were mentioned in the project introduction. I got
> that wrong.
>

On further thought/reading the code, it appears to be more complicated than
that, actually.

It looks like (for a non-scalar array) we have two calls to PyMem_Malloc:
one for the array object itself, and one for the shapes + strides. There
is also one call to regular old malloc, for the data buffer.

(Mysteriously, shapes + strides together have 2*ndim elements, but to hold
them we allocate a memory region sized to hold 3*ndim elements. I'm not
sure why.)
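To make the layout concrete, here is a small Python model of the shapes + strides region. It assumes npy_intp has the same size as ctypes.c_ssize_t (true on common 64-bit builds, but an assumption here) and mirrors what the attached disassembly shows: a single PyDimMem_NEW(3*nd) allocation, dimensions at the start, strides beginning nd slots later, and the last nd slots never written. The function name is hypothetical, and the stride loop is a simplified C-order fill rather than NumPy's actual _array_fill_strides.

```python
import ctypes

def model_dims_strides(dims, itemsize):
    """Hypothetical model of the shapes+strides block of a C-contiguous array."""
    nd = len(dims)
    # One region with room for 3*nd npy_intp values (assumed == c_ssize_t).
    block = (ctypes.c_ssize_t * (3 * nd))()
    # Slots [0, nd) hold the dimensions.
    for i, d in enumerate(dims):
        block[i] = d
    # Slots [nd, 2*nd) hold the strides, filled from the last axis backwards.
    stride = itemsize
    for i in range(nd - 1, -1, -1):
        block[nd + i] = stride
        stride *= dims[i] if dims[i] else 1
    # Slots [2*nd, 3*nd) are never written: the unexplained extra space.
    return list(block[:nd]), list(block[nd:2 * nd]), ctypes.sizeof(block)

shape, strides, nbytes = model_dims_strides((2, 3, 4), 8)
print(shape, strides, nbytes)
```

For shape (2, 3, 4) with 8-byte items this gives strides [96, 32, 8], the same as np.empty((2, 3, 4)).strides; on a 64-bit build the region is 3*3*8 = 72 bytes, of which only 48 are used.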

And contrary to what I said earlier, this is about as optimized as it can
be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc,
because the shapes+strides may need to be resized without affecting the
much larger data area. But it's tempting to allocate the array object and
the data buffer in a single memory region, like I suggested earlier. And
this would ALMOST work. But, it turns out there is code out there which
assumes (whether wisely or not) that you can swap around which data buffer
a given PyArrayObject refers to (hi Theano!). And supporting this means
that data buffers and PyArrayObjects need to be in separate memory regions.
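The ownership bookkeeping this relies on is visible from Python: every array records whether it owns its buffer (the OWNDATA flag from the quoted suggestion), and an array object can just as well point at memory owned by something else. A minimal check using only public NumPy attributes:

```python
import numpy as np

a = np.empty(4)                            # NumPy allocates a fresh data buffer
v = a[::2]                                 # a view: new array object, same buffer
buf = bytearray(16)
w = np.frombuffer(buf, dtype=np.float64)   # wraps memory owned by a bytearray

print(a.flags['OWNDATA'], v.flags['OWNDATA'], w.flags['OWNDATA'])
print(v.base is a)                         # the view keeps its owner alive via .base
```

Because the array object and the memory it describes are separate allocations, the same kind of object can describe all three cases; fusing the object and its buffer into one region would tie the buffer's lifetime to one particular array object.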

> >magnitude more time in inefficient loop selection and unnecessary writes
> > to the FP control word?
> Loop selection contributes around 2-3% of the time. I implemented a cache
> with PyThreadState_GetDict(), but it didn't help.
> Even generating a prepopulated dict/list in code_generator/generate_umath.py
> does not help.
>
>
> Here is the distribution of time for addition operations. All memory-related
> and BuildValue operations cost more than 7%; the remaining looping ones are
> around 2-3%:
>
>    - PyUFunc_AdditionTypeResolver (7.6%)
>    - *SimpleBinaryOperationTypeResolver (6.2%)*
>
>
>    - *execute_legacy_ufunc_loop (20.7%)*
>    - trivial_three_operand_loop (8.6%); this will be around 3.4% when PR
>      #3521 <https://github.com/numpy/numpy/pull/3521> gets merged
>       - *PyArray_NewFromDescr (7.3%)*
>       - PyUFunc_DefaultLegacyInnerLoopSelector (2.5%)
>
>
>    - PyUFunc_GetPyValues (12.0%)
>    - *_extract_pyvals (9.2%)*
>    - *PyArray_Return (14.3%)*
>
> Hmm, you prodded me into running those numbers again to see :-)

At http://www.arinkverma.in/2013/06/finding-bottleneck-in-pythonnumpy.html you
say that you're using a Python compiled with --with-pydebug. Is this
true? If so then stop! You want numpy compiled with generic debugging
information ("-g" on gcc), and maybe it helps to have Python compiled with
"-g" as well. But --with-pydebug goes much further -- it actually changes
the Python interpreter in many ways to add lots of expensive self-checks.
On my machine simple operations like "[]" (allocate a list) or "1.0 + 1.0"
go about 4x slower when I use Ubuntu's python-dbg package (which is
compiled with --with-pydebug). You can't trust speed measurements you get
from a --with-pydebug build.
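A quick way to check whether the interpreter you are benchmarking with is a --with-pydebug build: such builds enable reference-count debugging, which exposes sys.gettotalrefcount(), and the Py_DEBUG build variable is recorded by sysconfig. A small sketch (the helper name is made up):

```python
import sys
import sysconfig

def is_pydebug_build():
    """Return True if this interpreter looks like a --with-pydebug build."""
    # sys.gettotalrefcount only exists when refcount debugging is compiled in,
    # which --with-pydebug turns on.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Fall back to the build configuration (0 or None on release builds).
    return bool(sysconfig.get_config_var("Py_DEBUG"))

print("pydebug build:", is_pydebug_build())
```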

Anyway, I'm using 64-bit python2.7 from Ubuntu's repo, self-compiled numpy
master, with this measurement code:

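import ctypes
# gperftools' CPU profiler; ProfilerStart/ProfilerStop bracket the region
# whose samples are written to the .prof file (inspect with google-pprof).
profiler = ctypes.CDLL("libprofiler.so.0")
def loop(n):
    import numpy as np
    print "Numpy:", np.__version__
    # A 2-element float64 array, so the scalar-return path is not exercised.
    x = np.asarray([1.0, 2.0])
    for i in xrange(n):
        x + x  # result is discarded; we only want the per-call overhead
profiler.ProfilerStart("/tmp/master-array-float64-add.prof")
loop(10000000)
profiler.ProfilerStop()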
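If gperftools' libprofiler is not available, a rough per-operation timing of the same loop can be had with the standard timeit module. This is a sketch in Python 3 (unlike the Python 2 script above). It only reports wall-clock time per x + x and does not attribute time to C functions the way the pprof profile does. It also times a 0-d array for comparison, since that input takes the scalar-return path.

```python
import timeit
import numpy as np

def seconds_per_add(x, number=20_000, repeat=3):
    """Best-of-`repeat` wall-clock seconds for one `x + x` call."""
    timer = timeit.Timer("x + x", globals={"x": x})
    return min(timer.repeat(repeat=repeat, number=number)) / number

vec = np.asarray([1.0, 2.0])   # same 2-element input as the profiling script
zero_d = np.asarray(1.0)       # 0-d array: the result comes back as a scalar

for label, arr in [("2-element", vec), ("0-d", zero_d)]:
    print(f"{label:>9}: {seconds_per_add(arr) * 1e9:.0f} ns per add")
```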

Graph attached.

Notice:
- Because my benchmark has a 2-element array instead of a scalar array, the
special-case scalar return logic (PyArray_Return etc.) disappears. This
makes all percentages a bit higher in my graph, because the operation is
overall faster.
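The scalar special case is easy to see from Python: a ufunc applied to 0-d arrays hands back a NumPy scalar rather than an array. In the code profiled here that conversion is the PyArray_Return step, which the 2-element benchmark sidesteps. A minimal check:

```python
import numpy as np

zero_d = np.asarray(1.0)            # 0-d float64 array
vec = np.asarray([1.0, 2.0])        # 1-d, 2 elements

scalar_result = zero_d + zero_d     # 0-d result is returned as np.float64
array_result = vec + vec            # stays an ndarray, no scalar conversion

print(type(scalar_result).__name__, type(array_result).__name__)
```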

- PyArray_NewFromDescr does indeed take 11.6% of the time, but it's not
clear why. Half that time is directly inside PyArray_NewFromDescr, not in
any sub-calls to malloc-related functions. Also, you see a lot more time in
array_alloc than I do, which may be caused by --with-pydebug.

Taking a closer look with google-pprof --disasm=PyArray_NewFromDescr (also
attached), it looks like the major cost here is, bizarrely enough, the
calculation of the array size?! Out of 338 cumulative samples in this
function, I count 175 that are associated with various div/mul
instructions, while all the mallocs together take only 164 (= 5.6% of total
time). Those div/mul instructions come from the overflow check on the
array size: one division of NPY_MAX_INTP by the element size, then a
multiply and a divide for every dimension.

This is pretty bizarre for a bunch of 1-dimensional 2-element arrays!?
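For reference, here is a Python transcription of the size check those samples land in (ctors.c lines 872-924 in the attachment). Python integers do not overflow, so max_intp is passed in explicitly to mimic the C limit. The function name is made up for illustration, and the C code's special case for zero-sized string types is left out.

```python
def overflow_checked_size(dims, elsize, max_intp=2**63 - 1):
    """Hypothetical transcription of the size check in PyArray_NewFromDescr.

    Returns the product of the non-zero dimensions (what the C loop keeps in
    `size`), or raises ValueError if elsize * size would exceed max_intp.
    """
    if elsize <= 0:
        # The C code special-cases zero-sized (string) types; omitted here.
        raise ValueError("elsize must be positive in this sketch")
    largest = max_intp // elsize       # one division before the loop
    size = 1
    for dim in dims:
        if dim == 0:
            continue                   # zero-length axes are skipped
        if dim < 0:
            raise ValueError("negative dimensions are not allowed")
        if dim > largest:
            raise ValueError("array is too big.")
        size *= dim                    # one multiply per dimension...
        largest //= dim                # ...and one divide per dimension
    return size

print(overflow_checked_size((2,), 8))
```

Even for a 1-dimensional, 2-element array this means two 64-bit integer divisions, which are among the slowest common integer instructions on x86-64.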

- PyUFunc_AdditionTypeResolver takes 10.9% of the time, and
PyUFunc_DefaultLegacyInnerLoopSelector takes another 4.2% of the time.
This is pretty absurd considering that we're talking about locating the
float64 + float64 loop, which should not require any complicated logic.
This should be like 0.1% or something. I'm not surprised that
PyThreadState_GetDict() doesn't help -- doing dict lookups was probably
more expensive than the thing you replaced! But some sort of simple table
lookup scheme that reduces loop lookup to chasing a few pointers should be
totally doable.
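A rough illustration of such a table, in Python: precompute, once, a 2-D table indexed by the type numbers of the two inputs, so that finding the float64 + float64 loop is two list indexings. This is a hypothetical sketch built from the public ufunc.types list. It is not how NumPy's type resolver works, and it only handles exact type matches (no casting or promotion).

```python
import numpy as np

def build_binary_loop_table(ufunc):
    """Hypothetical: map (input1 type number, input2 type number) -> signature."""
    entries = []
    for sig in ufunc.types:                 # strings like 'dd->d'
        inputs, _outputs = sig.split("->")
        if len(inputs) != 2 or any(c not in np.typecodes["All"] for c in inputs):
            continue
        nums = tuple(np.dtype(c).num for c in inputs)
        entries.append((nums, sig))
    size = max(max(nums) for nums, _ in entries) + 1
    table = [[None] * size for _ in range(size)]
    for (a, b), sig in entries:
        if table[a][b] is None:             # keep the first registered loop
            table[a][b] = sig
    return table

ADD_LOOPS = build_binary_loop_table(np.add)   # built once, up front

def find_add_loop(dt1, dt2):
    """Exact-match lookup: two list indexings, no search, no casting rules."""
    return ADD_LOOPS[np.dtype(dt1).num][np.dtype(dt2).num]

print(find_add_loop(np.float64, np.float64))
```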

- We're spending 13.6% of the time in PyUFunc_getfperr. I'm pretty sure
that a lot of this is totally wasted time, because we implement both 'set'
and 'clear' operations as 'set+clear', making them twice as costly as
necessary.

(Eventually it would be even better if we could disable this logic entirely
for integer arrays, and for when the user has turned off fp error
reporting. But neither of these would help for this simple float+float
benchmark.)
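The user-facing side of "fp error reporting turned off" is NumPy's error-state API. Below is a hypothetical helper that decides, from np.geterr(), whether any floating-point status check could have a visible effect. Having the ufunc machinery consult such a summary and skip the check is the suggestion in the paragraph above, not existing NumPy behaviour.

```python
import numpy as np

def fp_checks_needed():
    """Hypothetical: True unless every FP error category is set to 'ignore'."""
    return any(mode != "ignore" for mode in np.geterr().values())

with np.errstate(all="ignore"):
    print(fp_checks_needed())       # every category ignored
with np.errstate(divide="raise"):
    print(fp_checks_needed())       # at least one category is active
```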

- _extract_pyvals and PyUFunc_GetPyValues (not sure why they aren't linked
in my graph, but they seem to be the same code) together use >11% of time.
This is also completely silly -- all this time is spent on doing elaborate
stuff to look up entries in a python dict, extract them, and convert them
into, like, some C level bitmasks. And then doing that again and again on
every operation. Instead we should convert this stuff to C values once,
when they're set in the first place, and stash those C values directly into
a thread-local variable. See PyThread_*_key in pythread.h for a raw TLS
implementation that's always available (and which is what
PyThreadState_GetDict() is built on top of). The documentation is in the
Python source distribution in comments in Python/thread.c.
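The "convert once, stash in thread-local storage" design can be sketched in pure Python with threading.local. A C version would use the PyThread_*_key functions mentioned above instead. All names here (set_errmode, current_errmask) and the bit layout are made up for illustration and do not correspond to NumPy's real error-mask encoding.

```python
import threading

# Hypothetical bit layout: 2 bits per error category.
_MODES = {"ignore": 0, "warn": 1, "raise": 2, "call": 3}
_SHIFTS = {"divide": 0, "over": 2, "under": 4, "invalid": 6}

_tls = threading.local()

def set_errmode(**modes):
    """Slow path, run rarely: parse the settings once into an integer mask."""
    mask = getattr(_tls, "errmask", 0)
    for category, mode in modes.items():
        shift = _SHIFTS[category]
        mask &= ~(0b11 << shift)        # clear this category's two bits
        mask |= _MODES[mode] << shift
    _tls.errmask = mask

def current_errmask():
    """Fast path, run on every operation: one thread-local attribute read.

    Defaults to 0 (all 'ignore') in this sketch for threads that never set it.
    """
    return getattr(_tls, "errmask", 0)

set_errmode(divide="raise", invalid="warn")
print(bin(current_errmask()))
```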

-n
-------------- next part --------------
ROUTINE ====================== PyArray_NewFromDescr
   168    505 samples (flat, cumulative) 17.4% of total
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     3      5   838: {
     1      1       4daf0: push   %r15
     .      .       4daf2: mov    %edx,%r11d
     .      .       4daf5: mov    %rsi,%r15
     .      .       4daf8: push   %r14
     .      .       4dafa: push   %r13
     .      .       4dafc: push   %r12
     .      .       4dafe: push   %rbp
     .      1       4daff: mov    %rcx,%rbp
     1      2       4db02: push   %rbx
     1      1       4db03: mov    %r8,%rbx
     .      .       4db06: sub    $0x248,%rsp
     .      .   845: if (descr->subarray) {
     .      .       4db0d: mov    0x28(%rsi),%r13
     .      .   838: {
     .      .       4db11: mov    %rdi,0x28(%rsp)
     .      .       4db16: mov    %r9,0x30(%rsp)
     .      .   845: if (descr->subarray) {
     .      .       4db1b: test   %r13,%r13
     .      .       4db1e: je     4dcb0 <PyArray_NewFromDescr+0x1c0>
     .      .   849: memcpy(newdims, dims, nd*sizeof(npy_intp));
     .      .       4db24: movslq %edx,%r12
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4db27: lea    0x40(%rsp),%rdi
     .      .       4db2c: mov    $0x200,%ecx
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .   849: memcpy(newdims, dims, nd*sizeof(npy_intp));
     .      .       4db31: shl    $0x3,%r12
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4db35: mov    %rbp,%rsi
     .      .       4db38: mov    %r11d,0x10(%rsp)
     .      .       4db3d: mov    %r12,%rdx
     .      .       4db40: callq  1a1a0 <__memcpy_chk@plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .   850: if (strides) {
     .      .       4db45: test   %rbx,%rbx
     .      .   848: npy_intp *newstrides = NULL;
     .      .       4db48: movq   $0x0,0x20(%rsp)
     .      .   850: if (strides) {
     .      .       4db51: mov    0x10(%rsp),%r11d
     .      .       4db56: je     4db7d <PyArray_NewFromDescr+0x8d>
     .      .   851: newstrides = newdims + NPY_MAXDIMS;
     .      .       4db58: lea    0x140(%rsp),%rbp
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4db60: mov    $0x100,%ecx
     .      .       4db65: mov    %r12,%rdx
     .      .       4db68: mov    %rbx,%rsi
     .      .       4db6b: mov    %rbp,%rdi
     .      .       4db6e: callq  1a1a0 <__memcpy_chk@plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .   851: newstrides = newdims + NPY_MAXDIMS;
     .      .       4db73: mov    0x10(%rsp),%r11d
     .      .       4db78: mov    %rbp,0x20(%rsp)
     .      .   228: tuple = PyTuple_Check(old->subarray->shape);
     .      .       4db7d: mov    0x8(%r13),%rdi
     .      .   227: mydim = newdims + oldnd;
     .      .       4db81: lea    0x40(%rsp),%r14
     .      .   224: *des = old->subarray->base;
     .      .       4db86: mov    0x0(%r13),%rbp
     .      .   227: mydim = newdims + oldnd;
     .      .       4db8a: add    %r12,%r14
     .      .   228: tuple = PyTuple_Check(old->subarray->shape);
     .      .       4db8d: mov    0x8(%rdi),%rax
     .      .   229: if (tuple) {
     .      .       4db91: testb  $0x4,0xab(%rax)
     .      .       4db98: jne    4dc60 <PyArray_NewFromDescr+0x170>
     .      .   237: newnd = oldnd + numnew;
     .      .       4db9e: add    $0x1,%r11d
     .      .   238: if (newnd > NPY_MAXDIMS) {
     .      .       4dba2: cmp    $0x20,%r11d
     .      .   237: newnd = oldnd + numnew;
     .      .       4dba6: mov    %r11d,0x3c(%rsp)
     .      .   238: if (newnd > NPY_MAXDIMS) {
     .      .       4dbab: jg     4dbf6 <PyArray_NewFromDescr+0x106>
     .      .   248: mydim[0] = (npy_intp) PyInt_AsLong(old->subarray->shape);
     .      .       4dbad: callq  19970 <PyInt_AsLong@plt>
     .      .   233: numnew = 1;
     .      .       4dbb2: mov    $0x1,%ebx
     .      .   248: mydim[0] = (npy_intp) PyInt_AsLong(old->subarray->shape);
     .      .       4dbb7: mov    %rax,(%r14)
     .      .   251: if (newstrides) {
     .      .       4dbba: cmpq   $0x0,0x20(%rsp)
     .      .       4dbc0: je     4dbf6 <PyArray_NewFromDescr+0x106>
     .      .   255: mystrides = newstrides + oldnd;
     .      .       4dbc2: add    0x20(%rsp),%r12
     .      .   258: for (i = numnew - 1; i >= 0; i--) {
     .      .       4dbc7: sub    $0x1,%ebx
     .      .   257: tempsize = (*des)->elsize;
     .      .       4dbca: movslq 0x20(%rbp),%rdx
     .      .   258: for (i = numnew - 1; i >= 0; i--) {
     .      .       4dbce: js     4dbf6 <PyArray_NewFromDescr+0x106>
     .      .   260: tempsize *= mydim[i] ? mydim[i] : 1;
     .      .       4dbd0: mov    $0x1,%esi
     .      .       4dbd5: nopl   (%rax)
     .      .   259: mystrides[i] = tempsize;
     .      .       4dbd8: movslq %ebx,%rax
     .      .       4dbdb: mov    %rdx,(%r12,%rax,8)
     .      .   260: tempsize *= mydim[i] ? mydim[i] : 1;
     .      .       4dbdf: mov    (%r14,%rax,8),%rax
     .      .       4dbe3: test   %rax,%rax
     .      .       4dbe6: cmove  %rsi,%rax
     .      .   258: for (i = numnew - 1; i >= 0; i--) {
     .      .       4dbea: sub    $0x1,%ebx
     .      .   260: tempsize *= mydim[i] ? mydim[i] : 1;
     .      .       4dbed: imul   %rax,%rdx
     .      .   258: for (i = numnew - 1; i >= 0; i--) {
     .      .       4dbf1: cmp    $0xffffffff,%ebx
     .      .       4dbf4: jne    4dbd8 <PyArray_NewFromDescr+0xe8>
     .      .   265: Py_INCREF(*des);
     .      .       4dbf6: addq   $0x1,0x0(%rbp)
     .      .   266: Py_DECREF(old);
     .      .       4dbfb: subq   $0x1,(%r15)
     .      .       4dbff: jne    4dc0b <PyArray_NewFromDescr+0x11b>
     .      .       4dc01: mov    0x8(%r15),%rax
     .      .       4dc05: mov    %r15,%rdi
     .      .       4dc08: callq  *0x30(%rax)
     .      .   856: ret = PyArray_NewFromDescr(subtype, descr, nd, newdims,
     .      .       4dc0b: mov    0x280(%rsp),%edx
     .      .       4dc12: mov    0x288(%rsp),%rax
     .      .       4dc1a: lea    0x40(%rsp),%rcx
     .      .       4dc1f: mov    0x30(%rsp),%r9
     .      .       4dc24: mov    0x20(%rsp),%r8
     .      .       4dc29: mov    %rbp,%rsi
     .      .       4dc2c: mov    0x28(%rsp),%rdi
     .      .       4dc31: mov    %edx,(%rsp)
     .      .       4dc34: mov    0x3c(%rsp),%edx
     .      .       4dc38: mov    %rax,0x8(%rsp)
     .      .       4dc3d: callq  4daf0 <PyArray_NewFromDescr>
     .      .       4dc42: mov    %rax,%rbx
     5     10  1064: }
     .      .       4dc45: add    $0x248,%rsp
     .      .       4dc4c: mov    %rbx,%rax
     .      .       4dc4f: pop    %rbx
     .      1       4dc50: pop    %rbp
     1      2       4dc51: pop    %r12
     1      1       4dc53: pop    %r13
     .      2       4dc55: pop    %r14
     2      3       4dc57: pop    %r15
     1      1       4dc59: retq   
     .      .       4dc5a: nopw   0x0(%rax,%rax,1)
     .      .   230: numnew = PyTuple_GET_SIZE(old->subarray->shape);
     .      .       4dc60: mov    0x10(%rdi),%rax
     .      .   237: newnd = oldnd + numnew;
     .      .       4dc64: add    %eax,%r11d
     .      .   230: numnew = PyTuple_GET_SIZE(old->subarray->shape);
     .      .       4dc67: mov    %eax,%ebx
     .      .   238: if (newnd > NPY_MAXDIMS) {
     .      .       4dc69: cmp    $0x20,%r11d
     .      .   237: newnd = oldnd + numnew;
     .      .       4dc6d: mov    %r11d,0x3c(%rsp)
     .      .   238: if (newnd > NPY_MAXDIMS) {
     .      .       4dc72: jg     4dbf6 <PyArray_NewFromDescr+0x106>
     .      .   242: for (i = 0; i < numnew; i++) {
     .      .       4dc74: test   %eax,%eax
     .      .       4dc76: jle    4dbba <PyArray_NewFromDescr+0xca>
     .      .       4dc7c: xor    %r13d,%r13d
     .      .       4dc7f: jmp    4dc90 <PyArray_NewFromDescr+0x1a0>
     .      .       4dc81: nopl   0x0(%rax)
     .      .       4dc88: mov    0x28(%r15),%rax
     .      .       4dc8c: mov    0x8(%rax),%rdi
     .      .   243: mydim[i] = (npy_intp) PyInt_AsLong(
     .      .       4dc90: movslq %r13d,%rax
     .      .       4dc93: mov    0x18(%rdi,%rax,8),%rdi
     .      .       4dc98: callq  19970 <PyInt_AsLong@plt>
     .      .       4dc9d: mov    %rax,(%r14,%r13,8)
     .      .       4dca1: add    $0x1,%r13
     .      .   242: for (i = 0; i < numnew; i++) {
     .      .       4dca5: cmp    %r13d,%ebx
     .      .       4dca8: jg     4dc88 <PyArray_NewFromDescr+0x198>
     .      .       4dcaa: jmpq   4dbba <PyArray_NewFromDescr+0xca>
     .      .       4dcaf:    nop
     .      .   862: if ((unsigned int)nd > (unsigned int)NPY_MAXDIMS) {
     .      .       4dcb0: cmp    $0x20,%edx
     .      .       4dcb3: ja     4def0 <PyArray_NewFromDescr+0x400>
     .      1   872: sd = (size_t) descr->elsize;
     .      1       4dcb9: movslq 0x20(%rsi),%r12
     2     43   873: if (sd == 0) {
     1      1       4dcbd: test   %r12,%r12
     .      1       4dcc0: je     4dea0 <PyArray_NewFromDescr+0x3b0>
     1      1       4dcc6: movabs $0x7fffffffffffffff,%rax
     .      .       4dcd0: xor    %edx,%edx
     .     40       4dcd2: div    %r12
    41     42   892: for (i = 0; i < nd; i++) {
    40     40       4dcd5: test   %r11d,%r11d
     .      .       4dcd8: je     4e16c <PyArray_NewFromDescr+0x67c>
     .      .       4dcde: xor    %r9d,%r9d
     .      1       4dce1: mov    $0x1,%r14d
     1      1       4dce7: nopw   0x0(%rax,%rax,1)
     .      .   893: npy_intp dim = dims[i];
     .      .       4dcf0: mov    0x0(%rbp,%r9,8),%rsi
     .      .   895: if (dim == 0) {
     .      .       4dcf5: cmp    $0x0,%rsi
     .      .       4dcf9: je     4dd18 <PyArray_NewFromDescr+0x228>
     .      .   903: if (dim < 0) {
     .      .       4dcfb: jl     4dfc0 <PyArray_NewFromDescr+0x4d0>
     .      .   917: if (dim > largest) {
     .      .       4dd01: cmp    %rax,%rsi
     .      .       4dd04: jg     4dfd8 <PyArray_NewFromDescr+0x4e8>
     .     47   924: largest /= dim;
     .      .       4dd0a: mov    %rax,%rdx
     .      .       4dd0d: sar    $0x3f,%rdx
     .     47       4dd11: idiv   %rsi
    48     49   923: size *= dim;
    47     48       4dd14: imul   %rsi,%r14
     1      1       4dd18: add    $0x1,%r9
     .      .   892: for (i = 0; i < nd; i++) {
     .      .       4dd1c: cmp    %r9d,%r11d
     .      .       4dd1f: jg     4dcf0 <PyArray_NewFromDescr+0x200>
     1     51   927: fa = (PyArrayObject_fields *) subtype->tp_alloc(subtype, 0);
     .      .       4dd21: mov    0x28(%rsp),%rdi
     .      1       4dd26: xor    %esi,%esi
     1      1       4dd28: mov    %r11d,0x10(%rsp)
     .     49       4dd2d: callq  *0x130(%rdi)
     2      3   928: if (fa == NULL) {
     2      3       4dd33: test   %rax,%rax
     1      1   927: fa = (PyArrayObject_fields *) subtype->tp_alloc(subtype, 0);
     1      1       4dd36: mov    %rax,%r13
     4      8   928: if (fa == NULL) {
     .      4       4dd39: mov    0x10(%rsp),%r11d
     4      4       4dd3e: je     4dec5 <PyArray_NewFromDescr+0x3d5>
     .      .   935: if (data == NULL) {
     .      .       4dd44: cmpq   $0x0,0x30(%rsp)
     .      .   932: fa->nd = nd;
     .      .       4dd4a: mov    %r11d,0x18(%rax)
     .      .   933: fa->dimensions = NULL;
     .      .       4dd4e: movq   $0x0,0x20(%rax)
     .      .   934: fa->data = NULL;
     .      .       4dd56: movq   $0x0,0x10(%rax)
     .      .   935: if (data == NULL) {
     .      .       4dd5e: je     4e068 <PyArray_NewFromDescr+0x578>
     .      3   946: fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY);
     .      .       4dd64: mov    0x280(%rsp),%eax
     .      .       4dd6b: and    $0xef,%ah
     .      3       4dd6e: mov    %eax,0x40(%r13)
     3      4   952: if (nd > 0) {
     3      4       4dd72: test   %r11d,%r11d
     1      5   948: fa->descr = descr;
     1      5       4dd75: mov    %r15,0x38(%r13)
     4      5   949: fa->base = (PyObject *)NULL;
     4      5       4dd79: movq   $0x0,0x30(%r13)
     1      3   950: fa->weakreflist = (PyObject *)NULL;
     1      3       4dd81: movq   $0x0,0x48(%r13)
     2      2   952: if (nd > 0) {
     2      2       4dd89: jne    4df40 <PyArray_NewFromDescr+0x450>
     .      .   975: fa->flags |= NPY_ARRAY_F_CONTIGUOUS;
     .      .       4dd8f: orl    $0x2,0x40(%r13)
     .      1   974: fa->dimensions = fa->strides = NULL;
     .      1       4dd94: movq   $0x0,0x28(%r13)
     2      3   978: if (data == NULL) {
     1      2       4dd9c: cmpq   $0x0,0x30(%rsp)
     1      1       4dda2: je     4e083 <PyArray_NewFromDescr+0x593>
     .      1  1008: fa->flags &= ~NPY_ARRAY_OWNDATA;
     .      1       4dda8: andl   $0xfffffffb,0x40(%r13)
     1      1  1010: fa->data = data;
     1      1       4ddad: mov    0x30(%rsp),%rax
     .      .  1016: if (strides != NULL) {
     .      .       4ddb2: test   %rbx,%rbx
     .      .  1010: fa->data = data;
     .      .       4ddb5: mov    %rax,0x10(%r13)
     .      .  1016: if (strides != NULL) {
     .      .       4ddb9: je     4ddc8 <PyArray_NewFromDescr+0x2d8>
     .      .  1017: PyArray_UpdateFlags((PyArrayObject *)fa, NPY_ARRAY_UPDATE_ALL);
     .      .       4ddbb: mov    $0x103,%esi
     .      .       4ddc0: mov    %r13,%rdi
     .      .       4ddc3: callq  7b3a0 <PyArray_UpdateFlags>
     6     12  1025: if ((subtype != &PyArray_Type)) {
     .      .       4ddc8: lea    0x2d7c11(%rip),%rax        # 3259e0 <PyArray_Type>
     .      6       4ddcf: cmp    %rax,0x28(%rsp)
     6      6       4ddd4: mov    %r13,%rbx
     .      .       4ddd7: je     4dc45 <PyArray_NewFromDescr+0x155>
     .      .  1028: func = PyObject_GetAttrString((PyObject *)fa, "__array_finalize__");
     .      .       4dddd: lea    0x9d5d4(%rip),%rsi        # eb3b8 <CSWTCH.53+0x7f8>
     .      .       4dde4: mov    %r13,%rdi
     .      .       4dde7: callq  19ac0 <PyObject_GetAttrString@plt>
     .      .  1029: if (func && func != Py_None) {
     .      .       4ddec: test   %rax,%rax
     .      .  1028: func = PyObject_GetAttrString((PyObject *)fa, "__array_finalize__");
     .      .       4ddef: mov    %rax,%rbp
     .      .  1029: if (func && func != Py_None) {
     .      .       4ddf2: je     4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4ddf8: mov    0x2d7121(%rip),%r12        # 324f20 <_DYNAMIC+0x360>
     .      .       4ddff: cmp    %r12,%rax
     .      .       4de02: je     4e04f <PyArray_NewFromDescr+0x55f>
     .      .  1030: if (NpyCapsule_Check(func)) {
     .      .       4de08: mov    0x2d7109(%rip),%rdx        # 324f18 <_DYNAMIC+0x358>
     .      .       4de0f: cmp    %rdx,0x8(%rax)
     .      .       4de13: je     4e11b <PyArray_NewFromDescr+0x62b>
     .      .  1040: args = PyTuple_New(1);
     .      .       4de19: mov    $0x1,%edi
     .      .       4de1e: callq  1a200 <PyTuple_New@plt>
     .      .  1042: obj=Py_None;
     .      .       4de23: cmpq   $0x0,0x288(%rsp)
     .      .  1040: args = PyTuple_New(1);
     .      .       4de2c: mov    %rax,%rbx
     .      .  1046: res = PyObject_Call(func, args, NULL);
     .      .       4de2f: mov    %rbp,%rdi
     .      .  1042: obj=Py_None;
     .      .       4de32: cmovne 0x288(%rsp),%r12
     .      .  1046: res = PyObject_Call(func, args, NULL);
     .      .       4de3b: mov    %rbx,%rsi
     .      .       4de3e: xor    %edx,%edx
     .      .  1044: Py_INCREF(obj);
     .      .       4de40: addq   $0x1,(%r12)
     .      .  1045: PyTuple_SET_ITEM(args, 0, obj);
     .      .       4de45: mov    %r12,0x18(%rbx)
     .      .  1042: obj=Py_None;
     .      .       4de49: mov    %r12,0x288(%rsp)
     .      .  1046: res = PyObject_Call(func, args, NULL);
     .      .       4de51: callq  1a830 <PyObject_Call@plt>
     .      .  1047: Py_DECREF(args);
     .      .       4de56: subq   $0x1,(%rbx)
     .      .  1046: res = PyObject_Call(func, args, NULL);
     .      .       4de5a: mov    %rax,%r12
     .      .  1047: Py_DECREF(args);
     .      .       4de5d: je     4e0ee <PyArray_NewFromDescr+0x5fe>
     .      .  1048: Py_DECREF(func);
     .      .       4de63: subq   $0x1,0x0(%rbp)
     .      .       4de68: je     4e0df <PyArray_NewFromDescr+0x5ef>
     .      .  1049: if (res == NULL) {
     .      .       4de6e: test   %r12,%r12
     .      .       4de71: je     4e033 <PyArray_NewFromDescr+0x543>
     .      .  1053: Py_DECREF(res);
     .      .       4de77: mov    (%r12),%rax
     .      .       4de7b: mov    %r13,%rbx
     .      .       4de7e: sub    $0x1,%rax
     .      .       4de82: test   %rax,%rax
     .      .       4de85: mov    %rax,(%r12)
     .      .       4de89: jne    4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4de8f: mov    0x8(%r12),%rax
     .      .       4de94: mov    %r12,%rdi
     .      .       4de97: callq  *0x30(%rax)
     .      .       4de9a: jmpq   4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4de9f:    nop
     .      .   874: if (!PyDataType_ISSTRING(descr)) {
     .      .       4dea0: mov    0x1c(%rsi),%eax
     .      .       4dea3: sub    $0x12,%eax
     .      .       4dea6: cmp    $0x1,%eax
     .      .       4dea9: jbe    4dfe1 <PyArray_NewFromDescr+0x4f1>
     .      .   875: PyErr_SetString(PyExc_TypeError, "Empty data-type");
     .      .       4deaf: mov    0x2d6fc2(%rip),%rax        # 324e78 <_DYNAMIC+0x2b8>
     .      .       4deb6: lea    0x9d4d9(%rip),%rsi        # eb396 <CSWTCH.53+0x7d6>
     .      .   904: PyErr_SetString(PyExc_ValueError,
     .      .       4debd: mov    (%rax),%rdi
     .      .       4dec0: callq  19d10 <PyErr_SetString@plt>
     .      .   929: Py_DECREF(descr);
     .      .       4dec5: subq   $0x1,(%r15)
     .      .       4dec9: je     4ded8 <PyArray_NewFromDescr+0x3e8>
     .      .       4decb: xor    %ebx,%ebx
     .      .       4decd: jmpq   4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4ded2: nopw   0x0(%rax,%rax,1)
     .      .       4ded8: mov    0x8(%r15),%rax
     .      .       4dedc: mov    %r15,%rdi
     .      .       4dedf: xor    %ebx,%ebx
     .      .       4dee1: callq  *0x30(%rax)
     .      .       4dee4: jmpq   4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4dee9: nopl   0x0(%rax)
     .      .   863: PyErr_Format(PyExc_ValueError,
     .      .       4def0: mov    0x2d6f69(%rip),%rax        # 324e60 <_DYNAMIC+0x2a0>
     .      .       4def7: lea    0x9d72a(%rip),%rsi        # eb628 <CSWTCH.53+0xa68>
     .      .       4defe: mov    $0x20,%edx
     .      .       4df03: mov    (%rax),%rdi
     .      .       4df06: xor    %eax,%eax
     .      .       4df08: callq  1a8d0 <PyErr_Format@plt>
     .      .       4df0d: jmp    4dec5 <PyArray_NewFromDescr+0x3d5>
     .      .   939: if (nd > 1) {
     .      .       4df0f: cmp    $0x1,%r11d
     .      .       4df13: jle    4e177 <PyArray_NewFromDescr+0x687>
     .      .   940: fa->flags &= ~NPY_ARRAY_C_CONTIGUOUS;
     .      .       4df19: movl   $0x502,0x40(%rax)
     .      .   948: fa->descr = descr;
     .      .       4df20: mov    %r15,0x38(%rax)
     .      .   949: fa->base = (PyObject *)NULL;
     .      .       4df24: movq   $0x0,0x30(%rax)
     .      .   950: fa->weakreflist = (PyObject *)NULL;
     .      .       4df2c: movq   $0x0,0x48(%rax)
     .      .   942: flags = NPY_ARRAY_F_CONTIGUOUS;
     .      .       4df34: movl   $0x2,0x280(%rsp)
     .      .       4df3f:    nop
     9     60   953: fa->dimensions = PyDimMem_NEW(3*nd);
     .      4       4df40: lea    (%r11,%r11,2),%edi
     4      9       4df44: mov    %r11d,0x10(%rsp)
     5      5       4df49: movslq %edi,%rdi
     .      .       4df4c: shl    $0x3,%rdi
     .     42       4df50: callq  1aa50 <PyMem_Malloc@plt>
     .      .   954: if (fa->dimensions == NULL) {
     .      .       4df55: test   %rax,%rax
     .      2   953: fa->dimensions = PyDimMem_NEW(3*nd);
     .      2       4df58: mov    %rax,0x20(%r13)
     2      2   954: if (fa->dimensions == NULL) {
     2      2       4df5c: mov    0x10(%rsp),%r11d
     .      .       4df61: je     4e02e <PyArray_NewFromDescr+0x53e>
     .      .   958: fa->strides = fa->dimensions + nd;
     .      .       4df67: movslq %r11d,%r8
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4df6a: mov    %rax,%rdi
     .      .       4df6d: mov    %rbp,%rsi
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     1      2   958: fa->strides = fa->dimensions + nd;
     .      .       4df70: shl    $0x3,%r8
     .      1       4df74: lea    (%rax,%r8,1),%rdx
     1      1       4df78: mov    %rdx,0x28(%r13)
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .     15    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4df7c: mov    %r8,%rdx
     .      .       4df7f: mov    %r8,0x18(%rsp)
     .      .       4df84: mov    %r11d,0x10(%rsp)
     .     15       4df89: callq  1a240 <memcpy@plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     1      1   960: if (strides == NULL) { /* fill it in */
     1      1       4df8e: test   %rbx,%rbx
     .      1   961: sd = _array_fill_strides(fa->strides, dims, nd, sd,
     .      1       4df91: mov    0x28(%r13),%rdi
     3      5   960: if (strides == NULL) { /* fill it in */
     1      1       4df95: mov    0x18(%rsp),%r8
     .      2       4df9a: mov    0x10(%rsp),%r11d
     2      2       4df9f: je     4e14a <PyArray_NewFromDescr+0x65a>
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
     .      .       4dfa5: mov    %r8,%rdx
     .      .       4dfa8: mov    %rbx,%rsi
     .      .       4dfab: callq  1a240 <memcpy@plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .   970: sd *= size;
     .      .       4dfb0: imul   %r14,%r12
     .      .       4dfb4: jmpq   4dd9c <PyArray_NewFromDescr+0x2ac>
     .      .       4dfb9: nopl   0x0(%rax)
     .      .   904: PyErr_SetString(PyExc_ValueError,
     .      .       4dfc0: lea    0x9d691(%rip),%rsi        # eb658 <CSWTCH.53+0xa98>
     .      .       4dfc7: mov    0x2d6e92(%rip),%rax        # 324e60 <_DYNAMIC+0x2a0>
     .      .       4dfce: jmpq   4debd <PyArray_NewFromDescr+0x3cd>
     .      .       4dfd3: nopl   0x0(%rax,%rax,1)
     .      .   918: PyErr_SetString(PyExc_ValueError,
     .      .       4dfd8: lea    0x9d3c7(%rip),%rsi        # eb3a6 <CSWTCH.53+0x7e6>
     .      .       4dfdf: jmp    4dfc7 <PyArray_NewFromDescr+0x4d7>
     .      .   879: PyArray_DESCR_REPLACE(descr);
     .      .       4dfe1: mov    %rsi,%rdi
     .      .       4dfe4: mov    %edx,0x10(%rsp)
     .      .       4dfe8: callq  5e680 <PyArray_DescrNew>
     .      .       4dfed: subq   $0x1,(%r15)
     .      .       4dff1: mov    0x10(%rsp),%r11d
     .      .       4dff6: je     4e0fd <PyArray_NewFromDescr+0x60d>
     .      .       4dffc: test   %rax,%rax
     .      .       4dfff: je     4decb <PyArray_NewFromDescr+0x3db>
     .      .   883: if (descr->type_num == NPY_STRING) {
     .      .       4e005: cmpl   $0x12,0x1c(%rax)
     .      .       4e009: je     4e0c0 <PyArray_NewFromDescr+0x5d0>
     .      .   887: sd = descr->elsize = sizeof(npy_ucs4);
     .      .       4e00f: movl   $0x4,0x20(%rax)
     .      .       4e016: mov    %rax,%r15
     .      .       4e019: mov    $0x4,%r12d
     .      .       4e01f: movabs $0x1fffffffffffffff,%rax
     .      .       4e029: jmpq   4dcd5 <PyArray_NewFromDescr+0x1e5>
     .      .   955: PyErr_NoMemory();
     .      .       4e02e: callq  19bb0 <PyErr_NoMemory@plt>
     .      3  1062: Py_DECREF(fa);
     .      .       4e033: subq   $0x1,0x0(%r13)
     .      .       4e038: jne    4decb <PyArray_NewFromDescr+0x3db>
     .      .       4e03e: mov    0x8(%r13),%rax
     .      .       4e042: mov    %r13,%rdi
     .      .       4e045: xor    %ebx,%ebx
     .      .       4e047: callq  *0x30(%rax)
     .      .       4e04a: jmpq   4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4e04f: subq   $0x1,(%rax)
     .      .       4e053: jne    4dc45 <PyArray_NewFromDescr+0x155>
     .      .       4e059: mov    0x8(%rax),%rax
     .      .       4e05d: mov    %rbp,%rdi
     .      .       4e060: callq  *0x30(%rax)
     .      3       4e063: jmpq   4dc45 <PyArray_NewFromDescr+0x155>
    11     20   937: if (flags) {
     3     11       4e068: mov    0x280(%rsp),%edi
     8      8       4e06f: test   %edi,%edi
     .      1       4e071: jne    4df0f <PyArray_NewFromDescr+0x41f>
     4      7   936: fa->flags = NPY_ARRAY_DEFAULT;
     1      4       4e077: movl   $0x501,0x40(%rax)
     3      3       4e07e: jmpq   4dd72 <PyArray_NewFromDescr+0x282>
     .      .   985: if (sd == 0) {
     .      .       4e083: test   %r12,%r12
     .      .       4e086: jne    4e08c <PyArray_NewFromDescr+0x59c>
     .      2   986: sd = descr->elsize;
     .      2       4e088: movslq 0x20(%r15),%r12
     2     53   988: data = PyDataMem_NEW(sd);
     2      2       4e08c: mov    %r12,%rdi
     .     51       4e08f: callq  af500 <PyDataMem_NEW>
     2      2   989: if (data == NULL) {
     2      2       4e094: test   %rax,%rax
     .      1   988: data = PyDataMem_NEW(sd);
     .      1       4e097: mov    %rax,0x30(%rsp)
     1      1   989: if (data == NULL) {
     1      1       4e09c: je     4e02e <PyArray_NewFromDescr+0x53e>
     .      .   993: fa->flags |= NPY_ARRAY_OWNDATA;
     .      .       4e09e: orl    $0x4,0x40(%r13)
     1      2   999: if (PyDataType_FLAGCHK(descr, NPY_NEEDS_INIT)) {
     .      1       4e0a3: testb  $0x8,0x1b(%r15)
     1      1       4e0a8: je     4ddad <PyArray_NewFromDescr+0x2bd>
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
     .      .    85: return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
     .      .       4e0ae: mov    %r12,%rdx
     .      .       4e0b1: xor    %esi,%esi
     .      .       4e0b3: mov    %rax,%rdi
     .      .       4e0b6: callq  19e50 <memset@plt>
     .      .       4e0bb: jmpq   4ddad <PyArray_NewFromDescr+0x2bd>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .   884: sd = descr->elsize = 1;
     .      .       4e0c0: movl   $0x1,0x20(%rax)
     .      .       4e0c7: mov    %rax,%r15
     .      .       4e0ca: mov    $0x1,%r12d
     .      .       4e0d0: movabs $0x7fffffffffffffff,%rax
     .      .       4e0da: jmpq   4dcd5 <PyArray_NewFromDescr+0x1e5>
     .      .       4e0df: mov    0x8(%rbp),%rax
     .      .       4e0e3: mov    %rbp,%rdi
     .      .       4e0e6: callq  *0x30(%rax)
     .      .       4e0e9: jmpq   4de6e <PyArray_NewFromDescr+0x37e>
     .      .       4e0ee: mov    0x8(%rbx),%rax
     .      .       4e0f2: mov    %rbx,%rdi
     .      .       4e0f5: callq  *0x30(%rax)
     .      .       4e0f8: jmpq   4de63 <PyArray_NewFromDescr+0x373>
     .      .       4e0fd: mov    0x8(%r15),%rdx
     .      .       4e101: mov    %r15,%rdi
     .      .       4e104: mov    %rax,0x18(%rsp)
     .      .       4e109: callq  *0x30(%rdx)
     .      .       4e10c: mov    0x10(%rsp),%r11d
     .      .       4e111: mov    0x18(%rsp),%rax
     .      .       4e116: jmpq   4dffc <PyArray_NewFromDescr+0x50c>
-------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h
     .      .   377: return PyCObject_AsVoidPtr(ptr);
     .      .       4e11b: mov    %rax,%rdi
     .      .       4e11e: callq  1a6a0 <PyCObject_AsVoidPtr@plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .  1034: Py_DECREF(func);
     .      .       4e123: subq   $0x1,0x0(%rbp)
-------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h
     .      .   377: return PyCObject_AsVoidPtr(ptr);
     .      .       4e128: mov    %rax,%rbx
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
     .      .  1034: Py_DECREF(func);
     .      .       4e12b: je     4e18e <PyArray_NewFromDescr+0x69e>
     .      1  1035: if (cfunc((PyArrayObject *)fa, obj) < 0) {
     .      .       4e12d: mov    0x288(%rsp),%rsi
     .      .       4e135: mov    %r13,%rdi
     .      .       4e138: callq  *%rbx
     .      .       4e13a: test   %eax,%eax
     .      .       4e13c: mov    %r13,%rbx
     .      .       4e13f: jns    4dc45 <PyArray_NewFromDescr+0x155>
     .      1       4e145: jmpq   4e033 <PyArray_NewFromDescr+0x543>
     4     25   961: sd = _array_fill_strides(fa->strides, dims, nd, sd,
     1      2       4e14a: mov    0x280(%rsp),%r8d
     1      1       4e152: lea    0x40(%r13),%r9
     .      .       4e156: mov    %r12,%rcx
     .      1       4e159: mov    %r11d,%edx
     1      1       4e15c: mov    %rbp,%rsi
     .     19       4e15f: callq  4da20 <_array_fill_strides>
     1      1       4e164: mov    %rax,%r12
     .      .       4e167: jmpq   4dd9c <PyArray_NewFromDescr+0x2ac>
     .      .   871: size = 1;
     .      .       4e16c: mov    $0x1,%r14d
     .      .       4e172: jmpq   4dd21 <PyArray_NewFromDescr+0x231>
     .      .   938: fa->flags |= NPY_ARRAY_F_CONTIGUOUS;
     .      .       4e177: movl   $0x503,0x40(%rax)
     .      .   942: flags = NPY_ARRAY_F_CONTIGUOUS;
     .      .       4e17e: movl   $0x2,0x280(%rsp)
     .      .       4e189: jmpq   4dd72 <PyArray_NewFromDescr+0x282>
     .      .       4e18e: mov    0x8(%rbp),%rax
     .      .       4e192: mov    %rbp,%rdi
     .      .       4e195: callq  *0x30(%rax)
     .      .       4e198: jmp    4e12d <PyArray_NewFromDescr+0x63d>
     .      .       4e19a: nopw   0x0(%rax,%rax,1)
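To map the annotated samples back to C, here is a simplified, standalone sketch of the steps that the ctors.c lines in this listing cover (936 to 999). It sets the default flags, fills contiguous strides, allocates the data buffer as a separate block, marks it OWNDATA, and zeroes it when the dtype needs initialization. This is not NumPy's code. The sketch_* names, the fixed-size header struct and the 32-dimension cap are hypothetical, malloc stands in for PyDataMem_NEW, and the Fortran-order flag handling and stride helper are simplified. The flag values are the immediates visible in the listing (0x501, 0x503, 0x4). The sd == 0 fallback at ctors.c:985-986 is omitted because this stride helper never returns 0 for a nonzero element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Flag bits matching the immediates in the listing: 0x501 is the default
 * (C_CONTIGUOUS | ALIGNED | WRITEABLE), 0x503 adds F_CONTIGUOUS, and 0x4
 * is OR-ed in for OWNDATA. The SKETCH_ names are hypothetical. */
enum {
    SKETCH_C_CONTIGUOUS = 0x0001,
    SKETCH_F_CONTIGUOUS = 0x0002,
    SKETCH_OWNDATA      = 0x0004,
    SKETCH_ALIGNED      = 0x0100,
    SKETCH_WRITEABLE    = 0x0400,
    SKETCH_DEFAULT      = SKETCH_C_CONTIGUOUS | SKETCH_ALIGNED | SKETCH_WRITEABLE
};

#define SKETCH_MAXDIMS 32 /* hypothetical cap for the inline arrays */

/* Hypothetical, simplified array header; not PyArrayObject. */
typedef struct {
    char *data;
    int nd;
    ptrdiff_t dims[SKETCH_MAXDIMS];
    ptrdiff_t strides[SKETCH_MAXDIMS];
    int flags;
} sketch_array;

/* Fill contiguous strides (C order: last axis varies fastest; Fortran
 * order: first axis varies fastest) and return the total size in bytes.
 * Zero-length axes count as 1 here so every stride stays nonzero. */
static size_t sketch_fill_strides(ptrdiff_t *strides, const ptrdiff_t *dims,
                                  int nd, size_t itemsize, int fortran)
{
    assert(itemsize > 0);
    if (fortran) {
        for (int i = 0; i < nd; i++) {
            strides[i] = (ptrdiff_t)itemsize;
            itemsize *= dims[i] ? (size_t)dims[i] : 1;
        }
    } else {
        for (int i = nd - 1; i >= 0; i--) {
            strides[i] = (ptrdiff_t)itemsize;
            itemsize *= dims[i] ? (size_t)dims[i] : 1;
        }
    }
    return itemsize;
}

/* Returns 0 on success, -1 on bad arguments or allocation failure.
 * Dimensions are assumed non-negative and small enough not to overflow. */
static int sketch_array_init(sketch_array *a, const ptrdiff_t *dims, int nd,
                             size_t elsize, int fortran, int needs_init)
{
    if (nd < 0 || nd > SKETCH_MAXDIMS || elsize == 0) {
        return -1;
    }
    a->nd = nd;
    if (nd > 0) {
        memcpy(a->dims, dims, (size_t)nd * sizeof *dims);
    }
    a->flags = SKETCH_DEFAULT;                 /* ctors.c:936 */
    if (fortran) {
        a->flags |= SKETCH_F_CONTIGUOUS;       /* ctors.c:938 */
        if (nd > 1) {
            a->flags &= ~SKETCH_C_CONTIGUOUS;  /* simplification in this sketch */
        }
    }
    size_t sd = sketch_fill_strides(a->strides, a->dims, nd, elsize, fortran); /* ctors.c:961 */
    a->data = malloc(sd);                      /* ctors.c:988, PyDataMem_NEW */
    if (a->data == NULL) {
        return -1;                             /* ctors.c:955, PyErr_NoMemory */
    }
    a->flags |= SKETCH_OWNDATA;                /* ctors.c:993 */
    if (needs_init) {
        memset(a->data, 0, sd);                /* ctors.c:999, NEEDS_INIT */
    }
    return 0;
}

/* Release the data buffer only if this header owns it. */
static void sketch_array_free(sketch_array *a)
{
    if (a->flags & SKETCH_OWNDATA) {
        free(a->data);
    }
    a->data = NULL;
    a->flags &= ~SKETCH_OWNDATA;
}
```

For dims {3, 4} with 8-byte elements in C order, the strides come out as {32, 8} and the flags as 0x505 (the default 0x501 plus OWNDATA). The header here is caller-owned and the data buffer is a separate malloc, mirroring the split discussed in the thread; NumPy additionally keeps shapes and strides in their own block, which the sketch folds into the header for brevity.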
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master-array-float64-add.pdf
Type: application/pdf
Size: 19235 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130716/47bcb533/attachment.pdf>

