sampling items from a nested list
Steven Bethard
steven.bethard at gmail.com
Thu Feb 17 01:32:36 EST 2005
Michael Spencer wrote:
> Steven Bethard wrote:
>
>> So, I have a list of lists, where the items in each sublist are of
>> basically the same form. It looks something like:
>>
> ...
>>
>> Can anyone see a simpler way of doing this?
>>
>> Steve
>
> You just make these up to keep us amused, don't you? ;-)
Heh heh. I wish. It's actually about resampling data read in the
Yamcha data format:
http://chasen.org/~taku/software/yamcha/
So each sublist is a "sentence" and each tuple is the feature vector for
a "word". The point is to even out the number of positive and negative
examples because support vector machines typically work better with
balanced data sets.
> If you don't need to preserve the ordering, would the following work?:
>
[snip]
>
> >>> def resample2(data):
> ...     bag = {}
> ...     random.shuffle(data)
> ...     return [[(item, label)
> ...                 for item, label in group
> ...                 if bag.setdefault(label,[]).append(item)
> ...                     or len(bag[label]) < 3]
> ...             for group in data if not random.shuffle(group)]
It would be preferable to preserve ordering, but it's not absolutely
crucial. Thanks for the suggestion!
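
For reference, here is a minimal sketch of how an order-preserving variant
might look. It assumes each sublist holds (item, label) pairs, as in the
quoted code, and that "balanced" means cutting every label down to the count
of the rarest one. The name resample_keep_order and the rng parameter are
made up for illustration; this is not the original resample function.

```python
import random
from collections import defaultdict

def resample_keep_order(data, rng=random):
    """Downsample so each label occurs equally often, keeping item order.

    Assumes data is a list of lists of (item, label) pairs.
    """
    # Record the (sentence index, word index) of every pair, grouped by label.
    positions = defaultdict(list)
    for i, group in enumerate(data):
        for j, (item, label) in enumerate(group):
            positions[label].append((i, j))
    if not positions:
        return [[] for _ in data]
    # Every label is cut down to the size of the rarest one.
    target = min(len(p) for p in positions.values())
    keep = set()
    for label_positions in positions.values():
        keep.update(rng.sample(label_positions, target))
    # Rebuild the nested list, skipping positions that were not drawn.
    return [[pair for j, pair in enumerate(group) if (i, j) in keep]
            for i, group in enumerate(data)]
```

Because only positions are sampled, neither the sentences nor the words
within them get shuffled. Passing rng=random.Random(0) (or any other seeded
random.Random instance) should make the selection repeatable.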
STeVe