[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

Sun Feb 14 03:21:34 EST 2016

re: no reason why...
This has nothing to do with Python2/Python3 (I personally stopped using
Python2 at least 3 years ago.)  Let me put it this way instead: if
Python3's "range" (or Python2's "xrange") was not a builtin type but a type
provided by numpy, I don't think it would be controversial at all to
provide an `__array__` special method to efficiently convert it to an
ndarray.  It would be the same if `np.array` used a
`functools.singledispatch` dispatcher rather than an `__array__` special
method (which is obviously not possible for chronological reasons).
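
To make the dispatch idea concrete, here is a minimal sketch. It assumes a hypothetical wrapper named `as_array` (not a NumPy function) and only illustrates what a `range` special case could look like; it is not how `np.array` is implemented.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def as_array(obj, dtype=None):
    """Hypothetical converter: by default, defer to np.array."""
    return np.array(obj, dtype=dtype)


@as_array.register(range)
def _(obj, dtype=None):
    # A range exposes start/stop/step, so np.arange can build the
    # array directly instead of iterating element by element.
    # Assumption: the values fit in the requested (or default) dtype.
    return np.arange(obj.start, obj.stop, obj.step, dtype=dtype)
```

With this sketch, `as_array(range(1000000), np.int64)` takes the `np.arange` path, while `as_array([1, 2, 3])` falls through to `np.array`.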

re: iterable vs. iterator: check for the presence of the __next__ special
method (or, using the ABCs from collections.abc, isinstance(x, Iterator)
vs. isinstance(x, Iterable) and not isinstance(x, Iterator))
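
A small sketch of that check, using the ABCs from `collections.abc`. The helper names `is_iterator` and `is_reiterable` are made up for illustration. One caveat: objects that only support iteration through `__getitem__` are not recognized as `Iterable` by this test.

```python
from collections.abc import Iterable, Iterator


def is_iterator(x):
    # Iterators (including generators) implement __next__ and are
    # consumed as they are iterated over.
    return isinstance(x, Iterator)


def is_reiterable(x):
    # Iterable but not an iterator: lists, tuples, range objects, etc.
    # These can be iterated over more than once.
    return isinstance(x, Iterable) and not isinstance(x, Iterator)
```

Under this test a `range` object counts as a re-iterable data structure, whereas `iter(range(3))` and generator expressions count as iterators.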

Antony

2016-02-13 18:48 GMT-08:00 <josef.pktd at gmail.com>:

>
>
> On Sat, Feb 13, 2016 at 9:43 PM, <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee <antony.lee at berkeley.edu>
>> wrote:
>>
>>> Compare (on Python3 -- for Python2, read "xrange" instead of "range"):
>>>
>>> In [2]: %timeit np.array(range(1000000), np.int64)
>>> 10 loops, best of 3: 156 ms per loop
>>>
>>> In [3]: %timeit np.arange(1000000, dtype=np.int64)
>>> 1000 loops, best of 3: 853 µs per loop
>>>
>>>
>>> Note that while iterating over a range is not very fast, it is still
>>> much better than the array creation:
>>>
>>> In [4]: from collections import deque
>>>
>>> In [5]: %timeit deque(range(1000000), 1)
>>> 10 loops, best of 3: 25.5 ms per loop
>>>
>>>
>>> On one hand, special cases are awful. On the other hand, the range
>>> builtin is probably important enough to deserve a special case to make this
>>> construction faster. Or not? I initially opened this as
>>> https://github.com/numpy/numpy/issues/7233 but it was suggested there
>>> that this should be discussed on the ML first.
>>>
>>> (The real issue which prompted this suggestion: I was building sparse
>>> matrices using scipy.sparse.csc_matrix with some indices specified using
>>> range, and that construction step turned out to take a significant portion
>>> of the time because of the calls to np.array).
>>>
>>
>>
>> IMO: I don't see a reason why this should be supported. There is
>> np.arange for this use case, after all, and np.fromiter.
>> range and the other guys are iterators, and in several cases we can use
>> larange = list(range(...)) as a shortcut to get a Python list for Python 2/3
>> compatibility.
>>
>> I think this might be partially a learning effect in the python 2 to 3
>> transition. After using almost only python 3 for maybe a year, I don't
>> think it's difficult to remember the differences when writing code that is
>> py 2.7 and py 3.x compatible.
>>
>>
>> It's just **another** thing to watch out for if milliseconds matter in
>> your application.
>>
>
>
> side question: Is there a simple way to distinguish an iterator or
> generator from an iterable data structure?
>
> Josef
>
>
>
>>
>> Josef
>>
>>
>>>
>>> Antony
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>