[pypy-dev] certificate for accepting numpypy new funcs?

Fri Jan 20 01:15:17 CET 2012

On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15 at ukr.net> wrote:
> On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>>
>> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net>  wrote:
>>>
>>> Hi all,
>>> could you clarify the criteria for accepting new numpypy funcs (not only
>>> for me, but for any other possible volunteers)?
>>> The doc I've been directed to says only "You have to test exhaustively your
>>> module", while I would like to know more explicit rules,
>>> for example "at least 3 tests per func" (though I guess funcs of
>>> different complexity and variability should be expected to need
>>> different numbers of tests).
>>> Also, are there any strict rules for the test cases to be submitted, or can I,
>>> for example, merely write
>>>
>>> if __name__ == '__main__':
>>>    assert array_equal(1, 1)
>>>    assert array_equal([1, 2], [1, 2])
>>>    assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>>    assert array_equal([1, 2], N.array([1, 2]))
>>>    assert array_equal([1, 2], [1, 2, 3]) is False
>>>    print('passed')
>>
>> We have pretty exhaustive automated test suites. Look, for example,
>> in the pypy/module/micronumpy/test directory for the test file style.
>> They're run with py.test and we require at the very least full code
>> coverage (every line has to be executed; there are tools to check this,
>> like coverage). Also passing "unusual" input, like sys.maxint etc., is
>> usually recommended. With your example, you would check if it works
>> for, say, views and multidimensional arrays. Also "is False" is not
>> considered good style.
>>
>>> Or is there a certain rule for where to store the test files?
>>>
>>> If I or someone else submits a func with some tests like the ones in the
>>> example above, will you put the func and tests in the proper files yourselves?
>>> I'm not too lazy to do it myself, but I'm not yet familiar enough with the
>>> numpypy dev process, including mercurial branches and the numpypy file
>>> structure, and can spend only limited time diving into it in the near
>>> future.
>>
>> We generally require people to add their own tests along with their
>> code (in the appropriate places), because you also must not break
>> anything. The usefulness of a patch that has to be sliced and diced
>> and put into place is very limited, and for straightforward,
>> mostly-copied code like array_equal it's plain useless, since it's almost
>> as much work to just do it ourselves.
>
> Well, for this func (array_equal) my docstrings really were copied from
> CPython numpy (why not, to save some time, when the license allows
> it?), but
> * why not take this on while other programmers are busy with other
> tasks?
> * the internals of my func and the CPython numpy func differ completely.
> First, in PyPy the CPython code doesn't work at all (because of the problem
> with ndarray.flat). Second, I have implemented a workaround: I just replaced
> some code lines with
>    size = a1.size
>    f1, f2 = a1.flat, a2.flat
>    # TODO: replace xrange by range in Python 3
>    for i in xrange(size):
>        # stop at the first mismatching element
>        if next(f1) != next(f2):
>            return False
>    return True
>
> Here are some results in CPython for the following benchmark:
>
> import numpy as N
> from time import time
> n = 100000
> m = 100
> a = N.zeros(n)
> b = N.ones(n)
> t = time()
> for i in range(m):
>    N.array_equal(a, b)
> print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
> % (time()-t))
>
>
> t = time()
> for i in range(m):
>    array_equal(a, b)
> print('Alternative array_equal time elapsed (on different arrays): %0.5f' %
> (time()-t))
>
> b = N.zeros(n)
>
> t = time()
> for i in range(m):
>    N.array_equal(a, b)
> print('classic numpy array_equal time elapsed (on same arrays): %0.5f' %
> (time()-t))
>
> t = time()
> for i in range(m):
>    array_equal(a, b)
> print('Alternative array_equal time elapsed (on same arrays): %0.5f' %
> (time()-t))
>
> CPython numpy results:
> classic numpy array_equal time elapsed (on different arrays): 0.07728
> Alternative array_equal time elapsed (on different arrays): 0.00056
> classic numpy array_equal time elapsed (on same arrays): 0.11163
> Alternative array_equal time elapsed (on same arrays): 9.09458
>
> PyPy results (the "classic" version can't be tested because it depends on
> some funcs that are not available yet):
> Alternative array_equal time elapsed (on different arrays): 0.00133
> Alternative array_equal time elapsed (on same arrays): 0.95038
>
>
> So, as you see, even in CPython numpy my version is 138 times faster for
> different arrays (though about 80 times slower for equal arrays). However,
> in the real world this func usually receives different arrays, and only
> occasionally equal ones.
> For equal arrays, the running time of my implementation depends essentially
> on their size, but either way I still think my implementation is better
> than the CPython one: it's faster and doesn't require allocating memory for
> the boolean array that goes to logical_and.
>
> I updated my array_equal implementation with the changes mentioned above,
> added the tests on multidimensional arrays you asked for, and put it at
> http://pastebin.com/tg2aHE6x (I'll now update the bugs.pypy.org entry with
> the link).
>
>
> -----------------------
> Regards, D.
> http://openopt.org/Dmitrey
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
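
For reference, here is a self-contained sketch of the short-circuiting idea in the quoted workaround: walk both flat iterators and stop at the first mismatch. It is not the pastebin code. The name `array_equal_shortcircuit` and the up-front shape check are assumptions added so the example runs on its own under NumPy on Python 3.

```python
import numpy as np

def array_equal_shortcircuit(a1, a2):
    """Hypothetical helper: True if a1 and a2 have equal shape and elements.

    Compares element by element via the flat iterators and returns False
    at the first mismatch, so no full-size boolean array is allocated.
    """
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    # .flat walks any array (including strided views) in C order;
    # both iterators yield a1.size elements, so zip stops on both together
    for x, y in zip(a1.flat, a2.flat):
        if x != y:
            return False
    return True
```

Because every element goes through the interpreter, the equal-array case is slow on CPython (9.09 s vs. 0.11 s in the quoted run), while PyPy's JIT narrows the gap (0.95 s in the quoted PyPy run).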

It's worth pointing out that the implementations of array_equal and
array_equiv in NumPy are a bit embarrassing, because they always perform
all N comparisons instead of short-circuiting as soon as a False value
is found. This is completely silly IMHO:

In [34]: x = np.random.randn(100000)

In [35]: y = np.random.randn(100000)

In [36]: timeit np.array_equal(x, y)
1000 loops, best of 3: 349 us per loop
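
One way to get both short-circuiting and vectorized speed is to compare in fixed-size blocks with NumPy's `==` and stop at the first block containing a mismatch. This chunked approach is not proposed in the thread. It is a minimal sketch, and the name `array_equal_chunked` and the default block size of 4096 are arbitrary assumptions. Only one block-sized boolean temporary exists at a time, although `ravel()` still copies non-contiguous inputs such as strided views.

```python
import numpy as np

def array_equal_chunked(a1, a2, chunk=4096):
    """Hypothetical helper: blockwise array_equal with early exit.

    Each block is compared with vectorized ==, so equal arrays stay fast,
    while a mismatch stops the scan after at most one extra block.
    """
    if chunk < 1:
        raise ValueError("chunk must be a positive integer")
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    # ravel() returns a view when the data is contiguous, else a C-order copy
    f1, f2 = a1.ravel(), a2.ravel()
    for start in range(0, f1.size, chunk):
        stop = start + chunk
        # boolean temporary is at most `chunk` elements long
        if not (f1[start:stop] == f2[start:stop]).all():
            return False
    return True
```

In the quoted benchmark, the "different" arrays (zeros vs. ones) differ at the very first element, which is the best case for any short-circuiting version. Random inputs like `x` and `y` above also almost surely differ immediately.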

- W
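
Maciej's advice earlier in the thread (py.test, full coverage, unusual inputs, views and multidimensional arrays, no `is False`) translates roughly into test functions like the ones below. This is a sketch in plain py.test style, not the layout of the files in `pypy/module/micronumpy/test`. It runs against `numpy.array_equal` as a stand-in, so rebind `array_equal` to the implementation under test. It uses `sys.maxsize` because `sys.maxint` exists only on Python 2.

```python
import sys
import numpy as np

# Stand-in (assumption): rebind to the implementation under test.
array_equal = np.array_equal

def test_scalars_and_lists():
    assert array_equal(1, 1)
    assert array_equal([1, 2], [1, 2])
    assert array_equal([1, 2], np.array([1, 2]))
    # prefer "assert not ..." over "... is False"
    assert not array_equal([1, 2], [1, 2, 3])

def test_multidimensional():
    a = np.arange(6).reshape(2, 3)
    assert array_equal(a, a.copy())
    # same data, different shape
    assert not array_equal(a, a.reshape(3, 2))
    b = a.copy()
    b[1, 2] = -1
    assert not array_equal(a, b)

def test_views():
    a = np.arange(10)
    assert array_equal(a[::2], [0, 2, 4, 6, 8])
    m = np.arange(6).reshape(2, 3)
    assert array_equal(m.T, [[0, 3], [1, 4], [2, 5]])

def test_unusual_values():
    big = sys.maxsize
    assert array_equal([big, -big - 1], np.array([big, -big - 1]))
    assert not array_equal([big], [big - 1])
    # NaN never compares equal to itself by default
    assert not array_equal([np.nan], [np.nan])
```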