[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

Arink Verma arinkverma at gmail.com
Tue Jul 16 09:34:57 EDT 2013


>Each ndarray does two mallocs, for the obj and buffer. These could be
>combined into 1 - just allocate the total size and do some pointer
>arithmetic, then set OWNDATA to false.
So those are the two mallocs mentioned in the project introduction. I had
misunderstood that.
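
To make the quoted suggestion concrete, here is a minimal sketch of the
"one allocation plus pointer arithmetic" layout. It is written with Python's
ctypes only so that it can be run as-is; it is not NumPy's code. The Header
struct, its fields and alloc_combined() are hypothetical names made up for
this illustration.

```python
import ctypes


class Header(ctypes.Structure):
    # Hypothetical stand-in for the fixed-size "object" part of an array.
    _fields_ = [
        ("nbytes", ctypes.c_size_t),  # payload size in bytes
        ("owndata", ctypes.c_int),    # 0: payload has no allocation of its own
    ]


def alloc_combined(n_doubles):
    """Allocate header and payload as one block; return (block, header, data)."""
    header_size = ctypes.sizeof(Header)
    # Round the payload offset up to the alignment of a double
    # (relative to the start of the block).
    align = ctypes.alignment(ctypes.c_double)
    offset = (header_size + align - 1) // align * align
    payload_size = n_doubles * ctypes.sizeof(ctypes.c_double)

    # Single allocation covering both parts.
    block = ctypes.create_string_buffer(offset + payload_size)

    # Both views share the block's memory; the payload starts at block + offset.
    header = Header.from_buffer(block)
    data = (ctypes.c_double * n_doubles).from_buffer(block, offset)

    header.nbytes = payload_size
    header.owndata = 0  # the payload must not be freed separately
    return block, header, data
```

The owndata field is cleared for the same reason the quote gives for
OWNDATA: the buffer lives inside the object's block, so it must never be
freed on its own.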

>magnitude more time in inefficient loop selection and unnecessary writes
>to the FP control word?
Loop selection contributes only around 2-3% of the time. I implemented a
cache with PyThreadState_GetDict(), but it didn't help.
Even generating a prepopulated dict/list in
code_generator/generate_umath.py does not help.
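
For readers unfamiliar with the idea, a per-thread cache of this kind can be
sketched at the Python level with threading.local, which plays roughly the
role of the dict returned by PyThreadState_GetDict(). This is only an
illustration of the pattern, not the C code I tried; slow_select_loop(), the
type-signature keys and the loop names are hypothetical.

```python
import threading

_local = threading.local()  # per-thread storage, one cache per thread
slow_calls = {"n": 0}       # counts how often the slow path runs


def slow_select_loop(type_sig):
    """Hypothetical stand-in for the uncached inner-loop selection."""
    slow_calls["n"] += 1
    table = {("d", "d"): "DOUBLE_add", ("l", "l"): "LONG_add"}
    return table[type_sig]


def select_loop_cached(type_sig):
    """Return the loop for type_sig, consulting this thread's cache first."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    try:
        return cache[type_sig]
    except KeyError:
        # First request for this signature in this thread: select and remember.
        loop = cache[type_sig] = slow_select_loop(type_sig)
        return loop
```

Since selection accounts for only 2-3% of the time, a cache like this can
save at most that much, and the lookup itself is not free.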


Here is the distribution of time for addition operations. All memory-related
and BuildValue operations cost more than 7%; the remaining loop-related ones
are around 2-3%:

   - PyUFunc_AdditionTypeResolver(7.6%)
   - *SimpleBinaryOperationTypeResolver(6.2%)*


   - *execute_legacy_ufunc_loop(20.7%)*
   - trivial_three_operand_loop(8.6%); this will be around 3.4% when PR #3521
      <https://github.com/numpy/numpy/pull/3521> gets merged
      - *PyArray_NewFromDescr(7.3%)*
      - PyUFunc_DefaultLegacyInnerLoopSelector(2.5%)


   - PyUFunc_GetPyValues(12.0%)
   - *_extract_pyvals(9.2%)*
   - *PyArray_Return(14.3%)*


-- 
Arink Verma
www.arinkverma.in
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-array_cast.svg
Type: image/svg+xml
Size: 92040 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130716/793b8cc8/attachment.svg>

