[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array
Nathaniel Smith
njs at pobox.com
Tue Jul 16 11:55:58 EDT 2013
On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma <arinkverma at gmail.com> wrote:
> > Each ndarray does two mallocs, for the obj and buffer. These could be
> > combined into 1 - just allocate the total size and do some pointer
> > arithmetic, then set OWNDATA to false.
> So, the two mallocs were mentioned in the project introduction. I got
> that wrong.
>
On further thought/reading the code, it appears to be more complicated than
that, actually.
It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: 1
for the array object itself, and one for the shapes + strides. And, one
call to regular-old malloc: for the data buffer.
(Mysteriously, shapes + strides together have 2*ndim elements, but to hold
them we allocate a memory region sized to hold 3*ndim elements. I'm not
sure why.)
And contrary to what I said earlier, this is about as optimized as it can
be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc,
because the shapes+strides may need to be resized without affecting the
much larger data area. But it's tempting to allocate the array object and
the data buffer in a single memory region, like I suggested earlier. And
this would ALMOST work. But, it turns out there is code out there which
assumes (whether wisely or not) that you can swap around which data buffer
a given PyArrayObject refers to (hi Theano!). And supporting this means
that data buffers and PyArrayObjects need to be in separate memory regions.
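To make the layout concrete, here is a toy Python model of the three regions described above. It is only a sketch: ToyArray and its methods are made-up names, a bytearray stands in for the malloc'd data buffer, and INTP_SIZE = 8 assumes a 64-bit npy_intp. None of this is NumPy's actual code. It shows that an in-place reshape only replaces the small shape+strides region, that swapping the data buffer only works if the buffer is its own region, and what the 2*ndim versus 3*ndim sizes come to for a 1-d array.

```python
# Toy model of the allocations described above (hypothetical names, not NumPy code).
INTP_SIZE = 8  # assumption: 64-bit npy_intp


class ToyArray(object):
    """Stands in for the array object itself (allocation 1)."""

    def __init__(self, shape, itemsize):
        self.itemsize = itemsize
        self._set_shape(shape)
        # Allocation 3: the data buffer, a separate region.
        self.data = bytearray(self.nbytes())

    def _set_shape(self, shape):
        # Allocation 2: shape + strides live together in one small region.
        strides = []
        step = self.itemsize
        for dim in reversed(shape):
            strides.insert(0, step)   # C-order (row-major) strides
            step *= max(dim, 1)
        self.dims_and_strides = list(shape) + strides

    def ndim(self):
        return len(self.dims_and_strides) // 2

    def nbytes(self):
        n = self.itemsize
        for dim in self.dims_and_strides[:self.ndim()]:
            n *= dim
        return n

    def reshape_inplace(self, shape):
        # Only the small shape/strides region is replaced; data is untouched.
        old = self.nbytes()
        self._set_shape(shape)
        assert self.nbytes() == old, "reshape must keep the total size"

    def swap_data(self, new_buffer):
        # What some downstream code assumes it can do: point the same
        # object at a different buffer. This cannot work if the object
        # and its data share one memory region.
        old, self.data = self.data, new_buffer
        return old


def dims_region_bytes(ndim, factor):
    """Bytes needed for factor*ndim npy_intp entries."""
    return factor * ndim * INTP_SIZE


a = ToyArray((2, 3), itemsize=8)
buf_before = a.data
a.reshape_inplace((3, 2))
same_buffer = a.data is buf_before      # the data region was not touched
needed = dims_region_bytes(1, 2)        # shape + strides for a 1-d array
allocated = dims_region_bytes(1, 3)     # what the 3*ndim allocation asks for
```

With these assumptions a 1-d array needs 16 bytes for shape+strides but asks for 24.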
> > magnitude more time in inefficient loop selection and unnecessary writes
> > to the FP control word?
> Loop selection contributes around 2~3% of the time. I implemented a cache with
> PyThreadState_GetDict(), but it didn't help.
> Even generating a prepopulated dict/list in code_generator/generate_umath.py is
> not helping.
>
>
> Here is the distribution of time for addition operations. All memory-related
> and BuildValue operations cost more than 7%; the remaining looping ones are
> around 2-3%:
>
> - PyUFunc_AdditionTypeResolver(7.6%)
> - *SimpleBinaryOperationTypeResolver(6.2%)*
>
>
> - *execute_legacy_ufunc_loop(20.7%)*
> - trivial_three_operand_loop(8.6%); this will be around 3.4% when PR #3521
> <https://github.com/numpy/numpy/pull/3521> gets merged
> - *PyArray_NewFromDescr(7.3%)*
> - PyUFunc_DefaultLegacyInnerLoopSelector(2.5%)
>
>
> - PyUFunc_GetPyValues(12.0%)
> - *_extract_pyvals(9.2%)*
> - *PyArray_Return(14.3%)*
>
Hmm, you prodded me into running those numbers again to see :-)
At http://www.arinkverma.in/2013/06/finding-bottleneck-in-pythonnumpy.html you
say that you're using a Python compiled with --with-pydebug. Is this
true? If so then stop! You want numpy compiled with generic debugging
information ("-g" on gcc), and maybe it helps to have Python compiled with
"-g" as well. But --with-pydebug goes much further -- it actually changes
the Python interpreter in many ways to add lots of expensive self-checks.
On my machine simple operations like "[]" (allocate a list) or "1.0 + 1.0"
go about 4x slower when I use Ubuntu's python-dbg package (which is
compiled with --with-pydebug). You can't trust speed measurements you get
from a --with-pydebug build.
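A quick way to check which kind of interpreter you are actually measuring with is sketched below. It relies on sys.gettotalrefcount, which is only defined in debug builds, and on the Py_DEBUG build variable reported by sysconfig (which may be None on platforms that don't record it). The helper names is_pydebug_build and time_snippets are made up for this example; the two timed snippets are the ones mentioned above.

```python
import sys
import sysconfig
import timeit


def is_pydebug_build():
    # sys.gettotalrefcount is only defined in debug builds of CPython
    # (Py_DEBUG implies Py_REF_DEBUG, which provides it).
    return hasattr(sys, "gettotalrefcount")


def time_snippets(number=1000000):
    # Time the two cheap operations mentioned above; returns total seconds
    # for `number` repetitions of each snippet.
    return {
        "[]": timeit.timeit("[]", number=number),
        "1.0 + 1.0": timeit.timeit("1.0 + 1.0", number=number),
    }


if __name__ == "__main__":
    print("pydebug build: %s" % is_pydebug_build())
    print("Py_DEBUG config var: %r" % sysconfig.get_config_var("Py_DEBUG"))
    for snippet, seconds in sorted(time_snippets().items()):
        print("%-10s %.3f s" % (snippet, seconds))
```

Running the same script under python and python-dbg should make the difference obvious; if is_pydebug_build() reports True, the profile numbers are not worth comparing.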
Anyway, I'm using 64-bit python2.7 from Ubuntu's repo, self-compiled numpy
master, with this measurement code:
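import ctypes
# gperftools CPU profiler, loaded directly so we can start/stop it around the loop
profiler = ctypes.CDLL("libprofiler.so.0")

def loop(n):
    import numpy as np
    print "Numpy:", np.__version__
    x = np.asarray([1.0, 2.0])
    for i in xrange(n):
        x + x   # result is discarded; we only care about the per-call overhead

profiler.ProfilerStart("/tmp/master-array-float64-add.prof")
loop(10000000)
profiler.ProfilerStop()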
Graph attached.
Notice:
- because my benchmark has a 2-element array instead of a scalar array, the
special-case scalar return logic (PyArray_Return etc.) disappears. This
makes all percentages a bit higher in my graph, because the operation is
overall faster.
- PyArray_NewFromDescr does indeed take 11.6% of the time, but it's not
clear why. Half that time is directly inside PyArray_NewFromDescr, not in
any sub-calls to malloc-related functions. Also, you see a lot more time in
array_alloc than I do, which may be caused by --with-pydebug.
Taking a closer look with google-pprof --disasm=PyArray_NewFromDescr (also
attached), it looks like the major cost here is, bizarrely enough, the
calculation of the array size?! Out of 338 cumulative samples in this
function, I count 175 that are associated with various div/mul
instructions, while all the mallocs together take only 164 (= 5.6% of total
time).
This is pretty bizarre for a bunch of 1-dimensional 2-element arrays!?
- PyUFunc_AdditionTypeResolver takes 10.9% of the time, and
PyUFunc_DefaultLegacyInnerLoopSelector takes another 4.2% of the time, and
this is pretty absurd considering that we're talking about locating the
float64 + float64 loop, which should not require any complicated logic.
This should be like 0.1% or something. I'm not surprised that
PyThreadState_GetDict() doesn't help -- doing dict lookups was probably
more expensive than the thing you replaced! But some sort of simple table
lookup scheme that reduces loop lookup to chasing a few pointers should be
totally doable.
- We're spending 13.6% of the time in PyUFunc_getfperr. I'm pretty sure
that a lot of this is totally wasted time, because we implement both 'set'
and 'clear' operations as 'set+clear', making them twice as costly as
necessary.
(Eventually it would be even better if we could disable this logic entirely
for integer arrays, and for when the user has turned off fp error
reporting. But neither of these would help for this simple float+float
benchmark.)
- _extract_pyvals and PyUFunc_GetPyValues (not sure why they aren't linked
in my graph, but they seem to be the same code) together use >11% of time.
This is also completely silly -- all this time is spent on doing elaborate
stuff to look up entries in a python dict, extract them, and convert them
into, like, some C level bitmasks. And then doing that again and again on
every operation. Instead we should convert this stuff to C values once,
when they're set in the first place, and stash those C values directly into
a thread-local variable. See PyThread_*_key in pythread.h for a raw TLS
implementation that's always available (and which is what
PyThreadState_GetDict() is built on top of). The documentation is in the
Python source distribution in comments in Python/thread.c.
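The "convert once, stash in a thread-local" idea from the last item can be sketched in Python as follows. This is only an illustration of the structure, not NumPy's code: the bit layout (_MODES, _SHIFTS) and the function names are invented for the example, and threading.local stands in for the raw PyThread_*_key TLS that a C implementation would use. In Python a thread-local attribute read is itself a lookup, so this shows where the work moves to, not the speedup.

```python
import threading

# Hypothetical encoding, for illustration only: 3 bits per error kind.
_MODES = {"ignore": 0, "warn": 1, "raise": 2, "call": 3}
_SHIFTS = {"divide": 0, "over": 3, "under": 6, "invalid": 9}

_tls = threading.local()


def set_err_state(**settings):
    """Slow path: validate and convert once, when the user changes settings."""
    mask = getattr(_tls, "errmask", 0)
    for kind, mode in settings.items():
        shift = _SHIFTS[kind]
        mask &= ~(0x7 << shift)          # clear the old mode for this kind
        mask |= _MODES[mode] << shift    # store the new one
    _tls.errmask = mask


def get_errmask():
    """Fast path: one thread-local read per operation, no parsing."""
    return getattr(_tls, "errmask", 0)


def mode_for(kind):
    """Decode the mode for one error kind from the cached bitmask."""
    return (get_errmask() >> _SHIFTS[kind]) & 0x7
```

For example, after set_err_state(divide="warn", invalid="raise") every later operation on that thread only needs get_errmask(); other threads still see their own default of 0.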
-n
-------------- next part --------------
ROUTINE ====================== PyArray_NewFromDescr
168 505 samples (flat, cumulative) 17.4% of total
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
3 5 838: {
1 1 4daf0: push %r15
. . 4daf2: mov %edx,%r11d
. . 4daf5: mov %rsi,%r15
. . 4daf8: push %r14
. . 4dafa: push %r13
. . 4dafc: push %r12
. . 4dafe: push %rbp
. 1 4daff: mov %rcx,%rbp
1 2 4db02: push %rbx
1 1 4db03: mov %r8,%rbx
. . 4db06: sub $0x248,%rsp
. . 845: if (descr->subarray) {
. . 4db0d: mov 0x28(%rsi),%r13
. . 838: {
. . 4db11: mov %rdi,0x28(%rsp)
. . 4db16: mov %r9,0x30(%rsp)
. . 845: if (descr->subarray) {
. . 4db1b: test %r13,%r13
. . 4db1e: je 4dcb0 <PyArray_NewFromDescr+0x1c0>
. . 849: memcpy(newdims, dims, nd*sizeof(npy_intp));
. . 4db24: movslq %edx,%r12
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4db27: lea 0x40(%rsp),%rdi
. . 4db2c: mov $0x200,%ecx
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 849: memcpy(newdims, dims, nd*sizeof(npy_intp));
. . 4db31: shl $0x3,%r12
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4db35: mov %rbp,%rsi
. . 4db38: mov %r11d,0x10(%rsp)
. . 4db3d: mov %r12,%rdx
. . 4db40: callq 1a1a0 <__memcpy_chk at plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 850: if (strides) {
. . 4db45: test %rbx,%rbx
. . 848: npy_intp *newstrides = NULL;
. . 4db48: movq $0x0,0x20(%rsp)
. . 850: if (strides) {
. . 4db51: mov 0x10(%rsp),%r11d
. . 4db56: je 4db7d <PyArray_NewFromDescr+0x8d>
. . 851: newstrides = newdims + NPY_MAXDIMS;
. . 4db58: lea 0x140(%rsp),%rbp
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4db60: mov $0x100,%ecx
. . 4db65: mov %r12,%rdx
. . 4db68: mov %rbx,%rsi
. . 4db6b: mov %rbp,%rdi
. . 4db6e: callq 1a1a0 <__memcpy_chk at plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 851: newstrides = newdims + NPY_MAXDIMS;
. . 4db73: mov 0x10(%rsp),%r11d
. . 4db78: mov %rbp,0x20(%rsp)
. . 228: tuple = PyTuple_Check(old->subarray->shape);
. . 4db7d: mov 0x8(%r13),%rdi
. . 227: mydim = newdims + oldnd;
. . 4db81: lea 0x40(%rsp),%r14
. . 224: *des = old->subarray->base;
. . 4db86: mov 0x0(%r13),%rbp
. . 227: mydim = newdims + oldnd;
. . 4db8a: add %r12,%r14
. . 228: tuple = PyTuple_Check(old->subarray->shape);
. . 4db8d: mov 0x8(%rdi),%rax
. . 229: if (tuple) {
. . 4db91: testb $0x4,0xab(%rax)
. . 4db98: jne 4dc60 <PyArray_NewFromDescr+0x170>
. . 237: newnd = oldnd + numnew;
. . 4db9e: add $0x1,%r11d
. . 238: if (newnd > NPY_MAXDIMS) {
. . 4dba2: cmp $0x20,%r11d
. . 237: newnd = oldnd + numnew;
. . 4dba6: mov %r11d,0x3c(%rsp)
. . 238: if (newnd > NPY_MAXDIMS) {
. . 4dbab: jg 4dbf6 <PyArray_NewFromDescr+0x106>
. . 248: mydim[0] = (npy_intp) PyInt_AsLong(old->subarray->shape);
. . 4dbad: callq 19970 <PyInt_AsLong at plt>
. . 233: numnew = 1;
. . 4dbb2: mov $0x1,%ebx
. . 248: mydim[0] = (npy_intp) PyInt_AsLong(old->subarray->shape);
. . 4dbb7: mov %rax,(%r14)
. . 251: if (newstrides) {
. . 4dbba: cmpq $0x0,0x20(%rsp)
. . 4dbc0: je 4dbf6 <PyArray_NewFromDescr+0x106>
. . 255: mystrides = newstrides + oldnd;
. . 4dbc2: add 0x20(%rsp),%r12
. . 258: for (i = numnew - 1; i >= 0; i--) {
. . 4dbc7: sub $0x1,%ebx
. . 257: tempsize = (*des)->elsize;
. . 4dbca: movslq 0x20(%rbp),%rdx
. . 258: for (i = numnew - 1; i >= 0; i--) {
. . 4dbce: js 4dbf6 <PyArray_NewFromDescr+0x106>
. . 260: tempsize *= mydim[i] ? mydim[i] : 1;
. . 4dbd0: mov $0x1,%esi
. . 4dbd5: nopl (%rax)
. . 259: mystrides[i] = tempsize;
. . 4dbd8: movslq %ebx,%rax
. . 4dbdb: mov %rdx,(%r12,%rax,8)
. . 260: tempsize *= mydim[i] ? mydim[i] : 1;
. . 4dbdf: mov (%r14,%rax,8),%rax
. . 4dbe3: test %rax,%rax
. . 4dbe6: cmove %rsi,%rax
. . 258: for (i = numnew - 1; i >= 0; i--) {
. . 4dbea: sub $0x1,%ebx
. . 260: tempsize *= mydim[i] ? mydim[i] : 1;
. . 4dbed: imul %rax,%rdx
. . 258: for (i = numnew - 1; i >= 0; i--) {
. . 4dbf1: cmp $0xffffffff,%ebx
. . 4dbf4: jne 4dbd8 <PyArray_NewFromDescr+0xe8>
. . 265: Py_INCREF(*des);
. . 4dbf6: addq $0x1,0x0(%rbp)
. . 266: Py_DECREF(old);
. . 4dbfb: subq $0x1,(%r15)
. . 4dbff: jne 4dc0b <PyArray_NewFromDescr+0x11b>
. . 4dc01: mov 0x8(%r15),%rax
. . 4dc05: mov %r15,%rdi
. . 4dc08: callq *0x30(%rax)
. . 856: ret = PyArray_NewFromDescr(subtype, descr, nd, newdims,
. . 4dc0b: mov 0x280(%rsp),%edx
. . 4dc12: mov 0x288(%rsp),%rax
. . 4dc1a: lea 0x40(%rsp),%rcx
. . 4dc1f: mov 0x30(%rsp),%r9
. . 4dc24: mov 0x20(%rsp),%r8
. . 4dc29: mov %rbp,%rsi
. . 4dc2c: mov 0x28(%rsp),%rdi
. . 4dc31: mov %edx,(%rsp)
. . 4dc34: mov 0x3c(%rsp),%edx
. . 4dc38: mov %rax,0x8(%rsp)
. . 4dc3d: callq 4daf0 <PyArray_NewFromDescr>
. . 4dc42: mov %rax,%rbx
5 10 1064: }
. . 4dc45: add $0x248,%rsp
. . 4dc4c: mov %rbx,%rax
. . 4dc4f: pop %rbx
. 1 4dc50: pop %rbp
1 2 4dc51: pop %r12
1 1 4dc53: pop %r13
. 2 4dc55: pop %r14
2 3 4dc57: pop %r15
1 1 4dc59: retq
. . 4dc5a: nopw 0x0(%rax,%rax,1)
. . 230: numnew = PyTuple_GET_SIZE(old->subarray->shape);
. . 4dc60: mov 0x10(%rdi),%rax
. . 237: newnd = oldnd + numnew;
. . 4dc64: add %eax,%r11d
. . 230: numnew = PyTuple_GET_SIZE(old->subarray->shape);
. . 4dc67: mov %eax,%ebx
. . 238: if (newnd > NPY_MAXDIMS) {
. . 4dc69: cmp $0x20,%r11d
. . 237: newnd = oldnd + numnew;
. . 4dc6d: mov %r11d,0x3c(%rsp)
. . 238: if (newnd > NPY_MAXDIMS) {
. . 4dc72: jg 4dbf6 <PyArray_NewFromDescr+0x106>
. . 242: for (i = 0; i < numnew; i++) {
. . 4dc74: test %eax,%eax
. . 4dc76: jle 4dbba <PyArray_NewFromDescr+0xca>
. . 4dc7c: xor %r13d,%r13d
. . 4dc7f: jmp 4dc90 <PyArray_NewFromDescr+0x1a0>
. . 4dc81: nopl 0x0(%rax)
. . 4dc88: mov 0x28(%r15),%rax
. . 4dc8c: mov 0x8(%rax),%rdi
. . 243: mydim[i] = (npy_intp) PyInt_AsLong(
. . 4dc90: movslq %r13d,%rax
. . 4dc93: mov 0x18(%rdi,%rax,8),%rdi
. . 4dc98: callq 19970 <PyInt_AsLong at plt>
. . 4dc9d: mov %rax,(%r14,%r13,8)
. . 4dca1: add $0x1,%r13
. . 242: for (i = 0; i < numnew; i++) {
. . 4dca5: cmp %r13d,%ebx
. . 4dca8: jg 4dc88 <PyArray_NewFromDescr+0x198>
. . 4dcaa: jmpq 4dbba <PyArray_NewFromDescr+0xca>
. . 4dcaf: nop
. . 862: if ((unsigned int)nd > (unsigned int)NPY_MAXDIMS) {
. . 4dcb0: cmp $0x20,%edx
. . 4dcb3: ja 4def0 <PyArray_NewFromDescr+0x400>
. 1 872: sd = (size_t) descr->elsize;
. 1 4dcb9: movslq 0x20(%rsi),%r12
2 43 873: if (sd == 0) {
1 1 4dcbd: test %r12,%r12
. 1 4dcc0: je 4dea0 <PyArray_NewFromDescr+0x3b0>
1 1 4dcc6: movabs $0x7fffffffffffffff,%rax
. . 4dcd0: xor %edx,%edx
. 40 4dcd2: div %r12
41 42 892: for (i = 0; i < nd; i++) {
40 40 4dcd5: test %r11d,%r11d
. . 4dcd8: je 4e16c <PyArray_NewFromDescr+0x67c>
. . 4dcde: xor %r9d,%r9d
. 1 4dce1: mov $0x1,%r14d
1 1 4dce7: nopw 0x0(%rax,%rax,1)
. . 893: npy_intp dim = dims[i];
. . 4dcf0: mov 0x0(%rbp,%r9,8),%rsi
. . 895: if (dim == 0) {
. . 4dcf5: cmp $0x0,%rsi
. . 4dcf9: je 4dd18 <PyArray_NewFromDescr+0x228>
. . 903: if (dim < 0) {
. . 4dcfb: jl 4dfc0 <PyArray_NewFromDescr+0x4d0>
. . 917: if (dim > largest) {
. . 4dd01: cmp %rax,%rsi
. . 4dd04: jg 4dfd8 <PyArray_NewFromDescr+0x4e8>
. 47 924: largest /= dim;
. . 4dd0a: mov %rax,%rdx
. . 4dd0d: sar $0x3f,%rdx
. 47 4dd11: idiv %rsi
48 49 923: size *= dim;
47 48 4dd14: imul %rsi,%r14
1 1 4dd18: add $0x1,%r9
. . 892: for (i = 0; i < nd; i++) {
. . 4dd1c: cmp %r9d,%r11d
. . 4dd1f: jg 4dcf0 <PyArray_NewFromDescr+0x200>
1 51 927: fa = (PyArrayObject_fields *) subtype->tp_alloc(subtype, 0);
. . 4dd21: mov 0x28(%rsp),%rdi
. 1 4dd26: xor %esi,%esi
1 1 4dd28: mov %r11d,0x10(%rsp)
. 49 4dd2d: callq *0x130(%rdi)
2 3 928: if (fa == NULL) {
2 3 4dd33: test %rax,%rax
1 1 927: fa = (PyArrayObject_fields *) subtype->tp_alloc(subtype, 0);
1 1 4dd36: mov %rax,%r13
4 8 928: if (fa == NULL) {
. 4 4dd39: mov 0x10(%rsp),%r11d
4 4 4dd3e: je 4dec5 <PyArray_NewFromDescr+0x3d5>
. . 935: if (data == NULL) {
. . 4dd44: cmpq $0x0,0x30(%rsp)
. . 932: fa->nd = nd;
. . 4dd4a: mov %r11d,0x18(%rax)
. . 933: fa->dimensions = NULL;
. . 4dd4e: movq $0x0,0x20(%rax)
. . 934: fa->data = NULL;
. . 4dd56: movq $0x0,0x10(%rax)
. . 935: if (data == NULL) {
. . 4dd5e: je 4e068 <PyArray_NewFromDescr+0x578>
. 3 946: fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY);
. . 4dd64: mov 0x280(%rsp),%eax
. . 4dd6b: and $0xef,%ah
. 3 4dd6e: mov %eax,0x40(%r13)
3 4 952: if (nd > 0) {
3 4 4dd72: test %r11d,%r11d
1 5 948: fa->descr = descr;
1 5 4dd75: mov %r15,0x38(%r13)
4 5 949: fa->base = (PyObject *)NULL;
4 5 4dd79: movq $0x0,0x30(%r13)
1 3 950: fa->weakreflist = (PyObject *)NULL;
1 3 4dd81: movq $0x0,0x48(%r13)
2 2 952: if (nd > 0) {
2 2 4dd89: jne 4df40 <PyArray_NewFromDescr+0x450>
. . 975: fa->flags |= NPY_ARRAY_F_CONTIGUOUS;
. . 4dd8f: orl $0x2,0x40(%r13)
. 1 974: fa->dimensions = fa->strides = NULL;
. 1 4dd94: movq $0x0,0x28(%r13)
2 3 978: if (data == NULL) {
1 2 4dd9c: cmpq $0x0,0x30(%rsp)
1 1 4dda2: je 4e083 <PyArray_NewFromDescr+0x593>
. 1 1008: fa->flags &= ~NPY_ARRAY_OWNDATA;
. 1 4dda8: andl $0xfffffffb,0x40(%r13)
1 1 1010: fa->data = data;
1 1 4ddad: mov 0x30(%rsp),%rax
. . 1016: if (strides != NULL) {
. . 4ddb2: test %rbx,%rbx
. . 1010: fa->data = data;
. . 4ddb5: mov %rax,0x10(%r13)
. . 1016: if (strides != NULL) {
. . 4ddb9: je 4ddc8 <PyArray_NewFromDescr+0x2d8>
. . 1017: PyArray_UpdateFlags((PyArrayObject *)fa, NPY_ARRAY_UPDATE_ALL);
. . 4ddbb: mov $0x103,%esi
. . 4ddc0: mov %r13,%rdi
. . 4ddc3: callq 7b3a0 <PyArray_UpdateFlags>
6 12 1025: if ((subtype != &PyArray_Type)) {
. . 4ddc8: lea 0x2d7c11(%rip),%rax # 3259e0 <PyArray_Type>
. 6 4ddcf: cmp %rax,0x28(%rsp)
6 6 4ddd4: mov %r13,%rbx
. . 4ddd7: je 4dc45 <PyArray_NewFromDescr+0x155>
. . 1028: func = PyObject_GetAttrString((PyObject *)fa, "__array_finalize__");
. . 4dddd: lea 0x9d5d4(%rip),%rsi # eb3b8 <CSWTCH.53+0x7f8>
. . 4dde4: mov %r13,%rdi
. . 4dde7: callq 19ac0 <PyObject_GetAttrString at plt>
. . 1029: if (func && func != Py_None) {
. . 4ddec: test %rax,%rax
. . 1028: func = PyObject_GetAttrString((PyObject *)fa, "__array_finalize__");
. . 4ddef: mov %rax,%rbp
. . 1029: if (func && func != Py_None) {
. . 4ddf2: je 4dc45 <PyArray_NewFromDescr+0x155>
. . 4ddf8: mov 0x2d7121(%rip),%r12 # 324f20 <_DYNAMIC+0x360>
. . 4ddff: cmp %r12,%rax
. . 4de02: je 4e04f <PyArray_NewFromDescr+0x55f>
. . 1030: if (NpyCapsule_Check(func)) {
. . 4de08: mov 0x2d7109(%rip),%rdx # 324f18 <_DYNAMIC+0x358>
. . 4de0f: cmp %rdx,0x8(%rax)
. . 4de13: je 4e11b <PyArray_NewFromDescr+0x62b>
. . 1040: args = PyTuple_New(1);
. . 4de19: mov $0x1,%edi
. . 4de1e: callq 1a200 <PyTuple_New at plt>
. . 1042: obj=Py_None;
. . 4de23: cmpq $0x0,0x288(%rsp)
. . 1040: args = PyTuple_New(1);
. . 4de2c: mov %rax,%rbx
. . 1046: res = PyObject_Call(func, args, NULL);
. . 4de2f: mov %rbp,%rdi
. . 1042: obj=Py_None;
. . 4de32: cmovne 0x288(%rsp),%r12
. . 1046: res = PyObject_Call(func, args, NULL);
. . 4de3b: mov %rbx,%rsi
. . 4de3e: xor %edx,%edx
. . 1044: Py_INCREF(obj);
. . 4de40: addq $0x1,(%r12)
. . 1045: PyTuple_SET_ITEM(args, 0, obj);
. . 4de45: mov %r12,0x18(%rbx)
. . 1042: obj=Py_None;
. . 4de49: mov %r12,0x288(%rsp)
. . 1046: res = PyObject_Call(func, args, NULL);
. . 4de51: callq 1a830 <PyObject_Call at plt>
. . 1047: Py_DECREF(args);
. . 4de56: subq $0x1,(%rbx)
. . 1046: res = PyObject_Call(func, args, NULL);
. . 4de5a: mov %rax,%r12
. . 1047: Py_DECREF(args);
. . 4de5d: je 4e0ee <PyArray_NewFromDescr+0x5fe>
. . 1048: Py_DECREF(func);
. . 4de63: subq $0x1,0x0(%rbp)
. . 4de68: je 4e0df <PyArray_NewFromDescr+0x5ef>
. . 1049: if (res == NULL) {
. . 4de6e: test %r12,%r12
. . 4de71: je 4e033 <PyArray_NewFromDescr+0x543>
. . 1053: Py_DECREF(res);
. . 4de77: mov (%r12),%rax
. . 4de7b: mov %r13,%rbx
. . 4de7e: sub $0x1,%rax
. . 4de82: test %rax,%rax
. . 4de85: mov %rax,(%r12)
. . 4de89: jne 4dc45 <PyArray_NewFromDescr+0x155>
. . 4de8f: mov 0x8(%r12),%rax
. . 4de94: mov %r12,%rdi
. . 4de97: callq *0x30(%rax)
. . 4de9a: jmpq 4dc45 <PyArray_NewFromDescr+0x155>
. . 4de9f: nop
. . 874: if (!PyDataType_ISSTRING(descr)) {
. . 4dea0: mov 0x1c(%rsi),%eax
. . 4dea3: sub $0x12,%eax
. . 4dea6: cmp $0x1,%eax
. . 4dea9: jbe 4dfe1 <PyArray_NewFromDescr+0x4f1>
. . 875: PyErr_SetString(PyExc_TypeError, "Empty data-type");
. . 4deaf: mov 0x2d6fc2(%rip),%rax # 324e78 <_DYNAMIC+0x2b8>
. . 4deb6: lea 0x9d4d9(%rip),%rsi # eb396 <CSWTCH.53+0x7d6>
. . 904: PyErr_SetString(PyExc_ValueError,
. . 4debd: mov (%rax),%rdi
. . 4dec0: callq 19d10 <PyErr_SetString at plt>
. . 929: Py_DECREF(descr);
. . 4dec5: subq $0x1,(%r15)
. . 4dec9: je 4ded8 <PyArray_NewFromDescr+0x3e8>
. . 4decb: xor %ebx,%ebx
. . 4decd: jmpq 4dc45 <PyArray_NewFromDescr+0x155>
. . 4ded2: nopw 0x0(%rax,%rax,1)
. . 4ded8: mov 0x8(%r15),%rax
. . 4dedc: mov %r15,%rdi
. . 4dedf: xor %ebx,%ebx
. . 4dee1: callq *0x30(%rax)
. . 4dee4: jmpq 4dc45 <PyArray_NewFromDescr+0x155>
. . 4dee9: nopl 0x0(%rax)
. . 863: PyErr_Format(PyExc_ValueError,
. . 4def0: mov 0x2d6f69(%rip),%rax # 324e60 <_DYNAMIC+0x2a0>
. . 4def7: lea 0x9d72a(%rip),%rsi # eb628 <CSWTCH.53+0xa68>
. . 4defe: mov $0x20,%edx
. . 4df03: mov (%rax),%rdi
. . 4df06: xor %eax,%eax
. . 4df08: callq 1a8d0 <PyErr_Format at plt>
. . 4df0d: jmp 4dec5 <PyArray_NewFromDescr+0x3d5>
. . 939: if (nd > 1) {
. . 4df0f: cmp $0x1,%r11d
. . 4df13: jle 4e177 <PyArray_NewFromDescr+0x687>
. . 940: fa->flags &= ~NPY_ARRAY_C_CONTIGUOUS;
. . 4df19: movl $0x502,0x40(%rax)
. . 948: fa->descr = descr;
. . 4df20: mov %r15,0x38(%rax)
. . 949: fa->base = (PyObject *)NULL;
. . 4df24: movq $0x0,0x30(%rax)
. . 950: fa->weakreflist = (PyObject *)NULL;
. . 4df2c: movq $0x0,0x48(%rax)
. . 942: flags = NPY_ARRAY_F_CONTIGUOUS;
. . 4df34: movl $0x2,0x280(%rsp)
. . 4df3f: nop
9 60 953: fa->dimensions = PyDimMem_NEW(3*nd);
. 4 4df40: lea (%r11,%r11,2),%edi
4 9 4df44: mov %r11d,0x10(%rsp)
5 5 4df49: movslq %edi,%rdi
. . 4df4c: shl $0x3,%rdi
. 42 4df50: callq 1aa50 <PyMem_Malloc at plt>
. . 954: if (fa->dimensions == NULL) {
. . 4df55: test %rax,%rax
. 2 953: fa->dimensions = PyDimMem_NEW(3*nd);
. 2 4df58: mov %rax,0x20(%r13)
2 2 954: if (fa->dimensions == NULL) {
2 2 4df5c: mov 0x10(%rsp),%r11d
. . 4df61: je 4e02e <PyArray_NewFromDescr+0x53e>
. . 958: fa->strides = fa->dimensions + nd;
. . 4df67: movslq %r11d,%r8
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4df6a: mov %rax,%rdi
. . 4df6d: mov %rbp,%rsi
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
1 2 958: fa->strides = fa->dimensions + nd;
. . 4df70: shl $0x3,%r8
. 1 4df74: lea (%rax,%r8,1),%rdx
1 1 4df78: mov %rdx,0x28(%r13)
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. 15 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4df7c: mov %r8,%rdx
. . 4df7f: mov %r8,0x18(%rsp)
. . 4df84: mov %r11d,0x10(%rsp)
. 15 4df89: callq 1a240 <memcpy at plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
1 1 960: if (strides == NULL) { /* fill it in */
1 1 4df8e: test %rbx,%rbx
. 1 961: sd = _array_fill_strides(fa->strides, dims, nd, sd,
. 1 4df91: mov 0x28(%r13),%rdi
3 5 960: if (strides == NULL) { /* fill it in */
1 1 4df95: mov 0x18(%rsp),%r8
. 2 4df9a: mov 0x10(%rsp),%r11d
2 2 4df9f: je 4e14a <PyArray_NewFromDescr+0x65a>
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
. . 4dfa5: mov %r8,%rdx
. . 4dfa8: mov %rbx,%rsi
. . 4dfab: callq 1a240 <memcpy at plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 970: sd *= size;
. . 4dfb0: imul %r14,%r12
. . 4dfb4: jmpq 4dd9c <PyArray_NewFromDescr+0x2ac>
. . 4dfb9: nopl 0x0(%rax)
. . 904: PyErr_SetString(PyExc_ValueError,
. . 4dfc0: lea 0x9d691(%rip),%rsi # eb658 <CSWTCH.53+0xa98>
. . 4dfc7: mov 0x2d6e92(%rip),%rax # 324e60 <_DYNAMIC+0x2a0>
. . 4dfce: jmpq 4debd <PyArray_NewFromDescr+0x3cd>
. . 4dfd3: nopl 0x0(%rax,%rax,1)
. . 918: PyErr_SetString(PyExc_ValueError,
. . 4dfd8: lea 0x9d3c7(%rip),%rsi # eb3a6 <CSWTCH.53+0x7e6>
. . 4dfdf: jmp 4dfc7 <PyArray_NewFromDescr+0x4d7>
. . 879: PyArray_DESCR_REPLACE(descr);
. . 4dfe1: mov %rsi,%rdi
. . 4dfe4: mov %edx,0x10(%rsp)
. . 4dfe8: callq 5e680 <PyArray_DescrNew>
. . 4dfed: subq $0x1,(%r15)
. . 4dff1: mov 0x10(%rsp),%r11d
. . 4dff6: je 4e0fd <PyArray_NewFromDescr+0x60d>
. . 4dffc: test %rax,%rax
. . 4dfff: je 4decb <PyArray_NewFromDescr+0x3db>
. . 883: if (descr->type_num == NPY_STRING) {
. . 4e005: cmpl $0x12,0x1c(%rax)
. . 4e009: je 4e0c0 <PyArray_NewFromDescr+0x5d0>
. . 887: sd = descr->elsize = sizeof(npy_ucs4);
. . 4e00f: movl $0x4,0x20(%rax)
. . 4e016: mov %rax,%r15
. . 4e019: mov $0x4,%r12d
. . 4e01f: movabs $0x1fffffffffffffff,%rax
. . 4e029: jmpq 4dcd5 <PyArray_NewFromDescr+0x1e5>
. . 955: PyErr_NoMemory();
. . 4e02e: callq 19bb0 <PyErr_NoMemory at plt>
. 3 1062: Py_DECREF(fa);
. . 4e033: subq $0x1,0x0(%r13)
. . 4e038: jne 4decb <PyArray_NewFromDescr+0x3db>
. . 4e03e: mov 0x8(%r13),%rax
. . 4e042: mov %r13,%rdi
. . 4e045: xor %ebx,%ebx
. . 4e047: callq *0x30(%rax)
. . 4e04a: jmpq 4dc45 <PyArray_NewFromDescr+0x155>
. . 4e04f: subq $0x1,(%rax)
. . 4e053: jne 4dc45 <PyArray_NewFromDescr+0x155>
. . 4e059: mov 0x8(%rax),%rax
. . 4e05d: mov %rbp,%rdi
. . 4e060: callq *0x30(%rax)
. 3 4e063: jmpq 4dc45 <PyArray_NewFromDescr+0x155>
11 20 937: if (flags) {
3 11 4e068: mov 0x280(%rsp),%edi
8 8 4e06f: test %edi,%edi
. 1 4e071: jne 4df0f <PyArray_NewFromDescr+0x41f>
4 7 936: fa->flags = NPY_ARRAY_DEFAULT;
1 4 4e077: movl $0x501,0x40(%rax)
3 3 4e07e: jmpq 4dd72 <PyArray_NewFromDescr+0x282>
. . 985: if (sd == 0) {
. . 4e083: test %r12,%r12
. . 4e086: jne 4e08c <PyArray_NewFromDescr+0x59c>
. 2 986: sd = descr->elsize;
. 2 4e088: movslq 0x20(%r15),%r12
2 53 988: data = PyDataMem_NEW(sd);
2 2 4e08c: mov %r12,%rdi
. 51 4e08f: callq af500 <PyDataMem_NEW>
2 2 989: if (data == NULL) {
2 2 4e094: test %rax,%rax
. 1 988: data = PyDataMem_NEW(sd);
. 1 4e097: mov %rax,0x30(%rsp)
1 1 989: if (data == NULL) {
1 1 4e09c: je 4e02e <PyArray_NewFromDescr+0x53e>
. . 993: fa->flags |= NPY_ARRAY_OWNDATA;
. . 4e09e: orl $0x4,0x40(%r13)
1 2 999: if (PyDataType_FLAGCHK(descr, NPY_NEEDS_INIT)) {
. 1 4e0a3: testb $0x8,0x1b(%r15)
1 1 4e0a8: je 4ddad <PyArray_NewFromDescr+0x2bd>
-------------------- /usr/include/x86_64-linux-gnu/bits/string3.h
. . 85: return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
. . 4e0ae: mov %r12,%rdx
. . 4e0b1: xor %esi,%esi
. . 4e0b3: mov %rax,%rdi
. . 4e0b6: callq 19e50 <memset at plt>
. . 4e0bb: jmpq 4ddad <PyArray_NewFromDescr+0x2bd>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 884: sd = descr->elsize = 1;
. . 4e0c0: movl $0x1,0x20(%rax)
. . 4e0c7: mov %rax,%r15
. . 4e0ca: mov $0x1,%r12d
. . 4e0d0: movabs $0x7fffffffffffffff,%rax
. . 4e0da: jmpq 4dcd5 <PyArray_NewFromDescr+0x1e5>
. . 4e0df: mov 0x8(%rbp),%rax
. . 4e0e3: mov %rbp,%rdi
. . 4e0e6: callq *0x30(%rax)
. . 4e0e9: jmpq 4de6e <PyArray_NewFromDescr+0x37e>
. . 4e0ee: mov 0x8(%rbx),%rax
. . 4e0f2: mov %rbx,%rdi
. . 4e0f5: callq *0x30(%rax)
. . 4e0f8: jmpq 4de63 <PyArray_NewFromDescr+0x373>
. . 4e0fd: mov 0x8(%r15),%rdx
. . 4e101: mov %r15,%rdi
. . 4e104: mov %rax,0x18(%rsp)
. . 4e109: callq *0x30(%rdx)
. . 4e10c: mov 0x10(%rsp),%r11d
. . 4e111: mov 0x18(%rsp),%rax
. . 4e116: jmpq 4dffc <PyArray_NewFromDescr+0x50c>
-------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h
. . 377: return PyCObject_AsVoidPtr(ptr);
. . 4e11b: mov %rax,%rdi
. . 4e11e: callq 1a6a0 <PyCObject_AsVoidPtr at plt>
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 1034: Py_DECREF(func);
. . 4e123: subq $0x1,0x0(%rbp)
-------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h
. . 377: return PyCObject_AsVoidPtr(ptr);
. . 4e128: mov %rax,%rbx
-------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c
. . 1034: Py_DECREF(func);
. . 4e12b: je 4e18e <PyArray_NewFromDescr+0x69e>
. 1 1035: if (cfunc((PyArrayObject *)fa, obj) < 0) {
. . 4e12d: mov 0x288(%rsp),%rsi
. . 4e135: mov %r13,%rdi
. . 4e138: callq *%rbx
. . 4e13a: test %eax,%eax
. . 4e13c: mov %r13,%rbx
. . 4e13f: jns 4dc45 <PyArray_NewFromDescr+0x155>
. 1 4e145: jmpq 4e033 <PyArray_NewFromDescr+0x543>
4 25 961: sd = _array_fill_strides(fa->strides, dims, nd, sd,
1 2 4e14a: mov 0x280(%rsp),%r8d
1 1 4e152: lea 0x40(%r13),%r9
. . 4e156: mov %r12,%rcx
. 1 4e159: mov %r11d,%edx
1 1 4e15c: mov %rbp,%rsi
. 19 4e15f: callq 4da20 <_array_fill_strides>
1 1 4e164: mov %rax,%r12
. . 4e167: jmpq 4dd9c <PyArray_NewFromDescr+0x2ac>
. . 871: size = 1;
. . 4e16c: mov $0x1,%r14d
. . 4e172: jmpq 4dd21 <PyArray_NewFromDescr+0x231>
. . 938: fa->flags |= NPY_ARRAY_F_CONTIGUOUS;
. . 4e177: movl $0x503,0x40(%rax)
. . 942: flags = NPY_ARRAY_F_CONTIGUOUS;
. . 4e17e: movl $0x2,0x280(%rsp)
. . 4e189: jmpq 4dd72 <PyArray_NewFromDescr+0x282>
. . 4e18e: mov 0x8(%rbp),%rax
. . 4e192: mov %rbp,%rdi
. . 4e195: callq *0x30(%rax)
. . 4e198: jmp 4e12d <PyArray_NewFromDescr+0x63d>
. . 4e19a: nopw 0x0(%rax,%rax,1)
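For anyone reading the listing above without the C source at hand: the hot div/idiv/imul instructions belong to the overflow-checked size computation at source lines 871-924. A rough Python transliteration is below. It is a sketch reconstructed from the source lines visible in the listing, not the real function: checked_size is a made-up name, the ValueError messages are placeholders, and the special handling of zero-size string dtypes is collapsed into a plain error.

```python
MAX_INTP = 2**63 - 1  # 0x7fffffffffffffff, as in the movabs at 4dcc6


def checked_size(dims, elsize):
    """Return the element count, raising if elsize * product(dims) would overflow."""
    if elsize <= 0:
        # Simplification: the real code has a separate path for string dtypes.
        raise TypeError("Empty data-type")
    largest = MAX_INTP // elsize      # the `div` at 4dcd2
    size = 1
    for dim in dims:
        if dim == 0:
            continue                  # zero-length dimensions skip the check
        if dim < 0:
            raise ValueError("negative dimension")   # placeholder message
        if dim > largest:
            raise ValueError("array too big")        # placeholder message
        size *= dim                   # the `imul` at 4dd14
        largest //= dim               # the `idiv` at 4dd11
    return size
```

So even for a 1-dimensional 2-element float64 array, checked_size((2,), 8) performs one division up front and one more inside the loop, which is where the samples on the div and idiv lines come from.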
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master-array-float64-add.pdf
Type: application/pdf
Size: 19235 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130716/47bcb533/attachment.pdf>