sampling items from a nested list

Steven Bethard steven.bethard at gmail.com
Thu Feb 17 01:32:36 EST 2005


Michael Spencer wrote:
> Steven Bethard wrote:
> 
>> So, I have a list of lists, where the items in each sublist are of 
>> basically the same form.  It looks something like:
>>
> ...
>>
>> Can anyone see a simpler way of doing this?
>>
>> Steve
> 
> You just make these up to keep us amused, don't you? ;-)

Heh heh.  I wish.  It's actually about resampling data read in the 
Yamcha data format:

http://chasen.org/~taku/software/yamcha/

So each sublist is a "sentence" and each tuple is the feature vector for 
a "word".  The point is to even out the number of positive and negative 
examples because support vector machines typically work better with 
balanced data sets.
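For anyone following along, here is a minimal sketch of how such a file might be read into the nested list described above. It assumes the usual Yamcha/CoNLL-style column layout: one word per line, whitespace-separated columns with the label in the last column, and a blank line between sentences. `read_yamcha` is a hypothetical helper name, not part of Yamcha itself.

```python
def read_yamcha(lines):
    """Parse Yamcha-style lines into a list of sentences.

    ASSUMPTION: one word per line, whitespace-separated columns,
    label in the last column, blank lines between sentences.
    Each sentence is a list of (features, label) tuples.
    """
    sentences = []
    current = []
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line ends the current sentence (if any).
            if current:
                sentences.append(current)
                current = []
            continue
        # All columns but the last form the feature vector; the last is the label.
        current.append((tuple(fields[:-1]), fields[-1]))
    # The file may not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


sample = "He PRP B-NP\nreckons VBZ B-VP\n\nThe DT B-NP\n"
print(read_yamcha(sample.splitlines()))
# [[(('He', 'PRP'), 'B-NP'), (('reckons', 'VBZ'), 'B-VP')], [(('The', 'DT'), 'B-NP')]]
```

Passing any iterable of lines works, so an open file object can be handed in directly.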

> If you don't need to preserve the ordering, would the following work?:
> 
[snip]
>
>  >>> def resample2(data):
>  ...     bag = {}
>  ...     random.shuffle(data)
>  ...     return [[(item, label)
>  ...                 for item, label in group
>  ...                     if bag.setdefault(label,[]).append(item)
>  ...                         or len(bag[label]) < 3]
>  ...                            for group in data if not random.shuffle(group)]

It would be preferable to preserve ordering, but it's not absolutely 
crucial.  Thanks for the suggestion!
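If ordering does matter, one option is a sketch along these lines (the name `resample_ordered` is hypothetical). Note that it uses a different rule from the quoted code: instead of a fixed cap of two items per label, it downsamples every label to the count of the rarest label. It picks which positions survive at random, then walks the original nested list in order, so surviving words keep their original order and each sentence stays in place (possibly empty).

```python
import random
from collections import defaultdict


def resample_ordered(data, rng=random):
    """Downsample every label to the size of the rarest label,
    keeping surviving items in their original order.

    `data` is a list of sentences, each a list of (item, label) pairs.
    `rng` is anything with a `sample` method (the random module or a
    random.Random instance, e.g. for reproducible runs).
    """
    # Record the (sentence index, word index) of every item, grouped by label.
    positions = defaultdict(list)
    for i, group in enumerate(data):
        for j, (_, label) in enumerate(group):
            positions[label].append((i, j))
    if not positions:
        return [[] for _ in data]
    # Every label keeps as many items as the rarest label has.
    n = min(len(p) for p in positions.values())
    keep = set()
    for p in positions.values():
        keep.update(rng.sample(p, n))
    # Rebuild the nested list in its original order, dropping unselected items.
    return [[pair for j, pair in enumerate(group) if (i, j) in keep]
            for i, group in enumerate(data)]


data = [[('a', 'pos'), ('b', 'neg'), ('c', 'neg')],
        [('d', 'neg'), ('e', 'pos'), ('f', 'neg')]]
# Two 'pos' and two 'neg' items survive; which 'neg' items depends on the seed.
print(resample_ordered(data, random.Random(0)))
```

Collecting positions first and sampling afterwards avoids shuffling the sentences themselves, which is what costs the ordering in the one-liner.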

STeVe


