[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

Jeff Reback jeffreback at gmail.com
Mon Feb 15 11:24:51 EST 2016


Just an FYI:

pandas implemented a RangeIndex in the upcoming 0.18.0, mainly for memory
savings; see here
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#range-index>.
It is similar to how Python's range/xrange work.
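
To illustrate the memory point with plain Python (a minimal sketch using the
built-in range, not pandas' RangeIndex; the variable names are made up for
this example, and the sizes reported depend on the interpreter):

```python
import sys

# A lazy range stores only start, stop and step, whatever its length.
small = range(10)
large = range(10**6)

# A list stores one pointer per element, so its size grows with its length.
materialized = list(large)

range_bytes = sys.getsizeof(large)
list_bytes = sys.getsizeof(materialized)  # container only, not the int objects

print("range:", range_bytes, "bytes; list:", list_bytes, "bytes")

# Indexing and membership tests are computed arithmetically, without
# building the values first.
tenth = large[10]
has_last = (10**6 - 1) in large
```

The same idea applies to an index that is known to be an arithmetic
progression: it only needs to remember its endpoints and step.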

There are also substantial perf benefits, mainly with set operations; see
here
<https://github.com/pydata/pandas/blob/master/pandas/indexes/range.py#L274>,
though I didn't officially benchmark these.
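
As a rough idea of why set operations can be cheap on range-like objects,
here is a sketch that intersects two Python ranges arithmetically instead of
comparing elements. It is not the pandas code linked above; the function name
range_intersection is hypothetical, and it assumes both steps are positive.

```python
from math import gcd

def range_intersection(a, b):
    """Intersect two ranges with positive steps without materializing them."""
    if a.step <= 0 or b.step <= 0:
        raise ValueError("this sketch only handles positive steps")
    if len(a) == 0 or len(b) == 0:
        return range(0)

    g = gcd(a.step, b.step)
    # The two progressions share values only if their offsets agree modulo g.
    if (b.start - a.start) % g != 0:
        return range(0)

    lcm = a.step // g * b.step
    # Exactly one of the first b.step // g values of `a` also lies on b's grid.
    x = next(a.start + k * a.step
             for k in range(b.step // g)
             if (a.start + k * a.step - b.start) % b.step == 0)

    # Move up to the first common value that is >= both starts.
    lo = max(a.start, b.start)
    x += max(0, -((x - lo) // lcm)) * lcm

    return range(x, min(a.stop, b.stop), lcm)

common = range_intersection(range(0, 100, 6), range(2, 80, 4))
print(list(common))
```

The result is itself a range, so the cost depends on the step sizes rather
than on the number of elements.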

Jeff


On Mon, Feb 15, 2016 at 11:13 AM, Antony Lee <antony.lee at berkeley.edu>
wrote:

> Indeed:
>
> In [1]: class C:
>     def __getitem__(self, i):
>         if i < 10: return i
>         else: raise IndexError
>     def __len__(self):
>         return 10
>    ...:
>
> In [2]: np.array(C())
> Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>
>
> (Omitting __len__ results in the creation of an object array, consistent
> with the fact that the sequence protocol requires __len__.)
> Meanwhile, I found a new way to segfault numpy :-)
>
> In [3]: class C:
>     def __getitem__(self, i):
>         if i < 10: return i
>         else: raise IndexError
>     def __len__(self):
>         return 42
>    ...:
>
> In [4]: np.array(C())
> Fatal Python error: Segmentation fault
>
>
> 2016-02-15 0:10 GMT-08:00 Nathaniel Smith <njs at pobox.com>:
>
>> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony.lee at berkeley.edu>
>> wrote:
>> > I wonder whether numpy is using the "old" iteration protocol (repeatedly
>> > calling x[i] for increasing i until IndexError is raised?)  A quick
>> > timing shows that it is indeed slower.
>>
>> Yeah, I'm pretty sure that np.array doesn't know anything about
>> "iterable", just about "sequence" (calling x[i] for 0 <= i <
>> x.__len__()).
>>
>> (See Sequence vs Iterable:
>> https://docs.python.org/3/library/collections.abc.html)
>>
>> Personally I'd like it if we could eventually make it so np.array
>> specifically looks for lists and only lists, because the way it has so
>> many different fallbacks right now creates all kinds of confusion about
>> which objects are containers and which are elements. Compare:
>>
>> In [5]: np.array([(1, 2), (3, 4)]).shape
>> Out[5]: (2, 2)
>>
>> In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
>> Out[6]: (2,)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
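
For reference, the two access patterns discussed in the quoted messages can
be compared in plain Python, without numpy. This is only a sketch: the class
and function names are hypothetical, and it says nothing about how np.array
is implemented internally.

```python
class Digits:
    """Sequence-like object defining only __getitem__ and __len__."""
    def __getitem__(self, i):
        if 0 <= i < 10:
            return i
        raise IndexError(i)

    def __len__(self):
        return 10

class Overstated(Digits):
    """Same items, but __len__ claims more elements than exist."""
    def __len__(self):
        return 42

def via_getitem(obj):
    """Old-style iteration: obj[0], obj[1], ... until IndexError."""
    out, i = [], 0
    while True:
        try:
            out.append(obj[i])
        except IndexError:
            return out
        i += 1

def via_len(obj):
    """Sequence-style access: trust len(obj) and index 0 .. len(obj) - 1."""
    return [obj[i] for i in range(len(obj))]

# Both patterns agree when __len__ is accurate; iter() falls back to the
# old-style protocol for objects without __iter__.
agree = via_getitem(Digits()) == via_len(Digits()) == list(iter(Digits()))

# With an overstated __len__, trusting the length runs past the real data.
try:
    via_len(Overstated())
    overrun = False
except IndexError:
    overrun = True
```

In pure Python the overrun surfaces as an IndexError; a consumer that trusts
__len__ has to check for that failure on every item access.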