sampling items from a nested list
Michael Spencer
mahs at telcopartners.com
Thu Feb 17 01:53:00 EST 2005
Steven Bethard wrote:
> Michael Spencer wrote:
>
>> Steven Bethard wrote:
>>
>>> So, I have a list of lists, where the items in each sublist are of
>>> basically the same form. It looks something like:
>>>
>> ...
>>
>>>
>>> Can anyone see a simpler way of doing this?
>>>
>>> Steve
>>
>>
>> You just make these up to keep us amused, don't you? ;-)
>
>
> Heh heh. I wish. It's actually about resampling data read in the
> Yamcha data format:
>
> http://chasen.org/~taku/software/yamcha/
>
> So each sublist is a "sentence" and each tuple is the feature vector for
> a "word". The point is to even out the number of positive and negative
> examples because support vector machines typically work better with
> balanced data sets.
>
>> If you don't need to preserve the ordering, would the following work?:
>>
> [snip]
>
>>
>> >>> def resample2(data):
>> ...     bag = {}
>> ...     random.shuffle(data)
>> ...     return [[(item, label)
>> ...              for item, label in group
>> ...              if bag.setdefault(label,[]).append(item)
>> ...              or len(bag[label]) < 3]
>> ...             for group in data if not random.shuffle(group)]
>
>
> It would be preferable to preserve ordering, but it's not absolutely
> crucial. Thanks for the suggestion!
>
> STeVe
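resample2 above packs two side-effect tricks into one comprehension. `list.append` returns None, so the `or` always falls through to the length test, which keeps a pair only while fewer than three items with that label have been seen. `random.shuffle` also returns None, so `if not random.shuffle(group)` is always true and simply shuffles each sentence in place. Because `bag` is shared across sentences, the cap of two applies per label over the whole data set, not per sentence. The loop below is a sketch of an equivalent, more explicit version; the name `resample_explicit` and the `max_per_label` and `rng` parameters are illustrative additions, and unlike resample2 it shuffles copies rather than the caller's lists.

```python
import random

def resample_explicit(data, max_per_label=2, rng=random):
    """Keep at most max_per_label (item, label) pairs per label, chosen at random.

    Hypothetical loop-based rewrite of resample2: sentences and the words
    within them are shuffled, so the original ordering is NOT preserved.
    """
    counts = {}                                   # label -> pairs seen so far
    sentences = [list(group) for group in data]   # copies; caller's data untouched
    rng.shuffle(sentences)
    result = []
    for group in sentences:
        rng.shuffle(group)
        kept = []
        for item, label in group:
            counts[label] = counts.get(label, 0) + 1
            # resample2's "len(bag[label]) < 3" keeps the first two per label
            if counts[label] <= max_per_label:
                kept.append((item, label))
        result.append(kept)                       # empty sentences are kept, as in resample2
    return result
```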
Maybe combine resample2 with a DSU (decorate-sort-undecorate) pattern to restore the
original ordering? Not sure whether the result would be better than what you started with.
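One possible reading of that DSU suggestion, offered only as a sketch: decorate every (item, label) pair with its (sentence, word) position, shuffle the decorated pairs and keep at most two per label as resample2 does, then sort the survivors by position and undecorate them back into sentences. The function name `resample_ordered` and the `max_per_label` and `rng` parameters are hypothetical.

```python
import random

def resample_ordered(data, max_per_label=2, rng=random):
    """Hypothetical order-preserving variant of resample2 using DSU."""
    # Decorate: remember where each (item, label) pair came from.
    decorated = [((i, j), pair)
                 for i, group in enumerate(data)
                 for j, pair in enumerate(group)]
    # Random selection, as in resample2: shuffle, then cap each label.
    rng.shuffle(decorated)
    counts = {}
    kept = []
    for pos, (item, label) in decorated:
        counts[label] = counts.get(label, 0) + 1
        if counts[label] <= max_per_label:
            kept.append((pos, (item, label)))
    # Sort: restore the original (sentence, word) order.
    kept.sort(key=lambda entry: entry[0])
    # Undecorate: drop positions and regroup into sentences.
    result = [[] for _ in data]
    for (i, _j), pair in kept:
        result[i].append(pair)
    return result
```

Passing `random.Random(seed)` as `rng` makes a run reproducible; sentences left with no words stay in the output as empty lists, mirroring resample2.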
Michael