[Numpy-discussion] array constructor from generators?

Tim Hochberg tim.hochberg at cox.net
Thu Apr 6 08:48:09 EDT 2006


Travis Oliphant wrote:

>
>> Can you steal its memory and then give it some dummy memory that it 
>> can free without problems, so that the list can be deallocated 
>> without trouble? Does anyone know if you can just give the list a 
>> NULL pointer for its memory and then immediately decref it? 
>> free(NULL) should always be safe, I think. (??)
>>
> I don't think you can steal a list's memory since each list element is 
> actually a pointer to some other Python object.  
> However, a Python array's memory could be stolen (as Tim mentions later).
>
>> This is a good point. Numpy does fine with nested lists, but what 
>> should it do with nested generators? I originally thought that 
>> basically 'array(generator)' should make the exact same thing as 
>> 'array([f for f in generator])'. However, for nested generators, this 
>> would be an object array of generators.
>>
>> I'm not sure which is better -- having more special cases for 
>> generators that make generators, or having a simple rubric like above 
>> for how generators are treated.
>
> I like the idea that generators of generators act the same as lists 
> of lists (i.e. recursively defined).   Basically to implement this, we 
> need to repeat
>
> Array_FromSequence
> discover_depth
> discover_dimensions
> discover_itemsize
>
> Or, just maybe we can figure out a way to enhance those functions so 
> that creating an array from generators works the same as creating an 
> array from sequences. Right now, the sequence interface is used.  
> Perhaps we could figure out a way to use a more abstract interface 
> which would include both generators and sequences.  If that causes too 
> much alteration then I don't think it's worth it and we just repeat 
> those functions for generators.
>
> Now, I think there are two cases here that are being discussed as one
>
> 1)  Creating arrays from iterators     ---   array(iter(xrange(10)))
> 2)  Creating arrays from generators  ---  array(x for x in xrange(10))
>
> Both of these cases really ought to be handled and really should be 
> integrated into the Array_FromSequence code.  That code is inherited 
> from Numeric and was written before iterators and generators arose on 
> the scene.  There ought to be a way to unify all of these notions 
> (Actually if you handle iterators, then sequences will come along for 
> the ride since sequences can behave as iterators).
> I'd rather see one place in the code that handles these cases.   But, 
> working code is usually better than dreamy plans :-)
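
To make the "recursively defined" reading concrete, here is a rough sketch in plain Python, not the C code under discussion. The name materialize is hypothetical; it only shows nested iterators or generators being expanded the same way nested lists would be, after which the existing sequence path could take over.

```python
from collections.abc import Iterator


def materialize(obj):
    """Recursively expand iterators/generators (and lists/tuples) into nested lists.

    Hypothetical helper for illustration only; it is not part of NumPy.
    """
    if isinstance(obj, (Iterator, list, tuple)):
        # Consume each inner item before advancing the outer iterator.
        return [materialize(item) for item in obj]
    # Anything else (numbers, strings, ...) is treated as a scalar leaf.
    return obj


nested = ((i * j for j in range(3)) for i in range(2))
print(materialize(nested))  # [[0, 0, 0], [0, 1, 2]]
```

Because iterators and sequences go through the same branch, this also illustrates the point that handling iterators brings sequences along for the ride. The cost is a full intermediate copy in Python lists.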


I agree with all of this. However, there's one specific case that I 
think we should optimize the heck out of. In fact, I'd be tempted as a 
first cut to only implement this case and raise exceptions in the other 
cases until we get around to implementing them. This one case is:
    * dtype known
    * 1-dimensional
I care about this case because it's common and we can do it efficiently. 
In the other cases I could write a Python function that does almost as 
good a job as we're likely to do in C, both in terms of speed and 
memory usage. So the known-dtype, 1D case adds important functionality, 
while the other cases "merely" add convenience (and consistency). Those are 
good, but personally the added functionality is higher on my priority list.
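
As a rough illustration of why the known-dtype, 1D case is the cheap one, here is a sketch that uses the standard library's array module as a stand-in for a NumPy array. The helper name array_from_iterable is hypothetical and this is not the proposed C implementation. The point is only that with the element type fixed up front, items can be appended to a typed buffer as they arrive, with no intermediate list and no type-discovery pass.

```python
from array import array


def array_from_iterable(iterable, typecode):
    """Build a flat, typed 1-D array directly from any iterable.

    Hypothetical helper; the stdlib array type stands in for a NumPy array.
    """
    out = array(typecode)
    # array.extend accepts any iterable and converts each item to the
    # fixed element type as it is consumed.
    out.extend(iterable)
    return out


squares = array_from_iterable((x * x for x in range(5)), 'd')
print(squares)  # array('d', [0.0, 1.0, 4.0, 9.0, 16.0])
```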

-tim





