sampling items from a nested list

Michael Spencer mahs at telcopartners.com
Thu Feb 17 01:47:20 EST 2005


Michael Spencer wrote:

>  >>> def resample2(data):
>  ...     bag = {}
>  ...     random.shuffle(data)
>  ...     return [[(item, label)
>  ...                 for item, label in group
>  ...                     if bag.setdefault(label,[]).append(item)
>  ...                         or len(bag[label]) < 3]
>  ...                            for group in data if not 

...which failed to calculate the minimum count of labels. Try this instead
(while I was at it, I removed the insane LC):

  >>> import random
  >>> def resample3(data):
  ...     bag = {}
  ...     sample = []
  ...     labels  = [label for group in data for item, label in group]
  ...     min_count = min(labels.count(label) for label in set(labels))
  ...     random.shuffle(data)
  ...     for subgroup in data:
  ...         random.shuffle(subgroup)
  ...         subgroupsample = []
  ...         for item, label in subgroup:
  ...             bag.setdefault(label,[]).append(item)
  ...             if len(bag[label]) <= min_count:
  ...                 subgroupsample.append((item,label))
  ...         sample.append(subgroupsample)
  ...     return sample
  ...
  >>>
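
For anyone wanting to try it, here is a minimal sketch of a call. It assumes the
input is a list of groups, each group a list of (item, label) pairs, which is
what the loops above unpack. The sample data and the Counter check are made up
for illustration and are not from the original thread.

```python
import random
from collections import Counter

def resample3(data):
    # Same logic as the session above, written as a plain function.
    bag = {}
    sample = []
    labels = [label for group in data for item, label in group]
    # Size of the rarest label: the per-label quota for the whole sample.
    min_count = min(labels.count(label) for label in set(labels))
    random.shuffle(data)              # note: shuffles the caller's list in place
    for subgroup in data:
        random.shuffle(subgroup)      # also in place
        subgroupsample = []
        for item, label in subgroup:
            bag.setdefault(label, []).append(item)
            # Keep an item only while its label is still under the quota.
            if len(bag[label]) <= min_count:
                subgroupsample.append((item, label))
        sample.append(subgroupsample)
    return sample

# Hypothetical input: label "x" occurs 4 times, "y" and "z" 3 times each.
data = [
    [("a1", "x"), ("a2", "x"), ("a3", "y")],
    [("b1", "x"), ("b2", "z"), ("b3", "z")],
    [("c1", "y"), ("c2", "x"), ("c3", "z"), ("c4", "y")],
]

sample = resample3(data)
counts = Counter(label for group in sample for item, label in group)
# Whatever the shuffle order, every label ends up with min_count (here 3) items.
print(counts)
```

Since both the outer list and each subgroup are shuffled in place, pass in a
copy (e.g. [list(g) for g in data]) if the original order matters.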

Cheers

Michael



