[Numpy-discussion] Outer join ?

A B python6009 at gmail.com
Thu Feb 12 11:19:05 EST 2009


On 2/11/09, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Feb 11, 2009 at 23:24, A B <python6009 at gmail.com> wrote:
>> Hi,
>>
>> I have the following data structure:
>>
>> col1 | col2 | col3
>>
>> 20080101|key1|4
>> 20080201|key1|6
>> 20080301|key1|5
>> 20080301|key2|3.4
>> 20080601|key2|5.6
>>
>> For each key in the second column, I would like to create an array
>> where for all unique values in the first column, there will be either
>> a value or zero if there is no data available. Like so:
>>
>> # 20080101, 20080201, 20080301, 20080601
>>
>> key1 - 4, 6, 5,    0
>> key2 - 0, 0, 3.4, 5.6
>>
>> Ideally, the results would end up in a 2d array.
>>
>> What's the most efficient way to accomplish this? Currently, I am
>> getting the unique col1 and col2 values into separate variables,
>> then looping through each unique value in col2:
>>
>> a = loadtxt(...)
>>
>> dates = unique(a[:]['col1'])
>> keys = unique(a[:]['col2'])
>>
>> for key in keys:
>>    b = a[where(a[:]['col2'] == key)]
>>    ???
>
> Take a look at setmember1d().
>
> --
> Robert Kern
>
Thanks. That's exactly what I need, but I'm not sure about the next
step after I do

setmember1d(dates, b['col1'])

and have the boolean mask. How can I grow b to have 0 values for
the missing dates?
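
One possible next step is sketched below. Rather than growing b, it
preallocates a zero-filled 2d array with one row per key and one column
per unique date, then uses the boolean mask to place each key's values.
Several things here are assumptions and not from the thread: the sample
data is typed in by hand as a structured array instead of coming from
loadtxt, np.isin stands in for setmember1d (assuming a recent NumPy),
the name `result` is made up for illustration, and each key is assumed
to have at most one row per date.

```python
import numpy as np

# Sample records from the question, entered by hand as a structured array
# (assumption: loadtxt would produce fields named col1, col2, col3).
a = np.array(
    [
        (20080101, "key1", 4.0),
        (20080201, "key1", 6.0),
        (20080301, "key1", 5.0),
        (20080301, "key2", 3.4),
        (20080601, "key2", 5.6),
    ],
    dtype=[("col1", "i8"), ("col2", "U8"), ("col3", "f8")],
)

dates = np.unique(a["col1"])  # sorted unique dates, one per output column
keys = np.unique(a["col2"])   # sorted unique keys, one per output row

# Start from zeros so dates with no data for a key stay 0.
result = np.zeros((len(keys), len(dates)))

for i, key in enumerate(keys):
    b = a[a["col2"] == key]
    # True where a date has data for this key; same role as setmember1d.
    mask = np.isin(dates, b["col1"])
    # dates is sorted, so order b's values by date before assigning them
    # into the masked positions (assumes one row per key/date pair).
    result[i, mask] = b["col3"][np.argsort(b["col1"])]

print(dates)
print(result)
```

With this layout, row i of result corresponds to keys[i] and column j to
dates[j], so b itself never needs to be resized.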


