[Numpy-discussion] Making cdecimal.Decimal a native numpy type
Dr.Leo
fhaxbox66 at googlemail.com
Sun Jul 22 08:54:36 EDT 2012
Hi,
I am a seasoned numpy/pandas user mainly interested in financial
applications. These and other applications would greatly benefit from a
decimal data type with flexible rounding rules, precision etc.
There is already cdecimal, a C rewrite of the traditional decimal module
from the Python stdlib,
- http://www.bytereef.org/mpdecimal/index.html -
which is part of the stdlib as of Python 3.3.
However, it appears that cdecimal cannot be meaningfully used with numpy
(see the benchmark below). Squaring an n=10000 ndarray is roughly 1400
times faster with float64 (14.7 us) than with a dtype=object ndarray
holding cdecimal.Decimal values (21.2 ms), and even a simple operation
such as var() fails with a TypeError.
I am not familiar enough with ufuncs etc. to judge whether some of these
problems can be avoided with a few lines of Python code. However, my
impression is that ultimately we would all benefit from cdecimal.Decimal
becoming a native numpy type. Put bluntly, cdecimal is a great tool. But
it is not yet where we most need it.
The author of cdecimal, Stefan Krah, probably has much of the skill set
needed to take such a project forward. He
happens to have also written the new memoryview implementation of Python
3.3. And from recent correspondence I understand he might be willing to
get involved in an effort to marry numpy and cdecimal.
The main question is whether such a project would fit into what core
developers see as the future of numpy.
Regards
Leo
And here is the benchmark:
In [1]: from numpy import *
In [2]: from cdecimal import Decimal
In [3]: r=random.rand(10000)
In [4]: d=ndarray(10000, dtype=Decimal)
In [5]: d.dtype
Out[5]: dtype('object')
In [6]: r.dtype
Out[6]: dtype('float64')
In [7]: for i in range(10000): d[i] = Decimal(r[i])
In [8]: %timeit r**2
100000 loops, best of 3: 14.7 us per loop
In [9]: %timeit d**2
10 loops, best of 3: 21.2 ms per loop
In [10]: r.var()
Out[10]: 0.082478142261349557
In [11]: d.var()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\<ipython-input-11-bf09d28e33ab> in <module>()
----> 1 d.var()
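
For anyone who wants to try the same comparison without the separate cdecimal package, here is a minimal sketch. It assumes the stdlib decimal module (which is C-accelerated in CPython from 3.3 on) as a stand-in for cdecimal, and a smaller array than in the session above. The helper name decimal_var is hypothetical, not part of numpy or decimal; it only illustrates a plain-Python workaround for the variance and says nothing about how a native numpy decimal type would be implemented.

```python
from decimal import Decimal

import numpy as np

# Assumption: stdlib decimal stands in for the cdecimal package.
rng = np.random.default_rng(0)
r = rng.random(1000)  # float64 array

# Build the object array explicitly; each element is a Decimal
# holding the exact value of the corresponding float.
d = np.array([Decimal(float(x)) for x in r], dtype=object)

# Element-wise squaring works on object arrays, but every element
# goes through Python-level Decimal arithmetic, hence the slowdown.
squared = d ** 2


def decimal_var(values):
    """Population variance of an iterable of Decimals (hypothetical helper)."""
    n = len(values)
    mean = sum(values, Decimal(0)) / n
    return sum(((v - mean) ** 2 for v in values), Decimal(0)) / n


var_decimal = decimal_var(d)
var_float = r.var()
print(d.dtype, type(squared[0]).__name__)
print(var_decimal, var_float)
```

The variance is computed in the current decimal context (28 significant digits by default), so it should agree with r.var() to within float64 rounding. The loop still runs in the interpreter, so this addresses the failure but not the speed gap.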