grouping subsequences with BIO tags

Fri Apr 22 19:01:42 EDT 2005

Steven Bethard wrote:
> Bengt Richter wrote:
> 
>> On Thu, 21 Apr 2005 15:37:03 -0600, Steven Bethard 
>> <steven.bethard at gmail.com> wrote:
>>
>>> I have a list of strings that looks something like:
>>>    ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']

[snip]
>>
>> With error checks on predecessor relationship,
>> I think I'd do the whole thing in a generator,

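A minimal sketch of the generator approach Bengt describes, including the predecessor checks. It assumes the goal is a sequence of (label, start, end) tuples with an exclusive end index; the exact output format was snipped above, so treat that as an assumption. The name iter_runs is hypothetical and is not the code posted in the thread.

```python
def iter_runs(tags):
    """Yield (label, start, end) runs from BIO tags; end is exclusive.

    Hypothetical sketch. Raises ValueError if an 'I_' tag does not
    continue an open run with the same label (the predecessor check).
    """
    label = start = None  # label and start index of the run in progress
    for i, tag in enumerate(tags):
        if tag == 'O' or tag.startswith('B_'):
            # Both 'O' and 'B_' close whatever run is currently open.
            if label is not None:
                yield (label, start, i)
                label = None
            if tag != 'O':
                label, start = tag[2:], i  # a 'B_' tag opens a new run
        elif tag.startswith('I_'):
            # Predecessor check: 'I_X' must follow 'B_X' or 'I_X'.
            if tag[2:] != label:
                raise ValueError('%r at index %d does not continue a run' % (tag, i))
        else:
            raise ValueError('unexpected tag %r at index %d' % (tag, i))
    if label is not None:
        yield (label, start, len(tags))  # flush the final open run
```

For the example list, list(iter_runs(L)) gives [('X', 1, 2), ('Y', 2, 4), ('X', 5, 7), ('X', 7, 8)].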
I'm curious why you (Bengt or Steve) think the generator is an advantage here. 
As Steve stated, the data already exists in lists of strings.

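The direct list-building solution I posted is simpler and quite a bit faster: in the timings below, get_runsMS is about twice as fast as get_runsBR and about four times as fast as get_runsSB.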
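For comparison, here is a minimal sketch of the direct list-building style, assuming the same (label, start, end) output with an exclusive end. It is a hypothetical illustration, not the solution posted earlier in the thread, and it does no validation: it assumes well-formed input where every 'I_' tag directly follows a 'B_' or 'I_' tag with the same label.

```python
def get_runs_list(tags):
    """Group BIO tags into (label, start, end) runs by building a list directly.

    Hypothetical sketch; assumes well-formed input (no predecessor checks).
    """
    runs = []  # each run is a mutable [label, start, end] list
    for i, tag in enumerate(tags):
        if tag.startswith('B_'):
            runs.append([tag[2:], i, i + 1])  # open a new one-tag run
        elif tag.startswith('I_'):
            runs[-1][2] = i + 1  # extend the most recent run
        # 'O' tags (and anything else) are skipped
    return [tuple(run) for run in runs]
```

Because it appends to and mutates a plain list instead of suspending and resuming a generator frame, this style avoids some per-item overhead when the input is already a list.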

L = ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']

def timethem(lst, funcs=(get_runsSB, get_runsMS, get_runsBR)):
    # shell.timefunc is the timing helper the poster used; it is not in the standard library
    for func in funcs:
        print(shell.timefunc(func, lst))

  >>> timethem(L)
  get_runsSB(...)  7877 iterations, 63.48usec per call
  get_runsMS(...)  31081 iterations, 16.09usec per call
  get_runsBR(...)  16114 iterations, 31.03usec per call
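Since shell.timefunc is not part of the standard library, a rough equivalent can be sketched with the stdlib timeit module. The fixed iteration count and the returned dictionary are assumptions for illustration; this version does not pick the iteration count automatically, so its numbers are not directly comparable to the ones above.

```python
import timeit

def timethem(lst, funcs, number=10000):
    """Time each function on lst with timeit and report microseconds per call.

    Hypothetical stand-in for the shell.timefunc-based version above.
    Returns {function name: usec per call} and prints a line per function.
    """
    results = {}
    for func in funcs:
        # The lambda is run immediately by timeit, so binding func late is safe here.
        seconds = timeit.timeit(lambda: func(lst), number=number)
        usec = seconds / number * 1e6
        results[func.__name__] = usec
        print('%s(...)  %d iterations, %.2fusec per call' % (func.__name__, number, usec))
    return results
```

Usage would look like timethem(L, (get_runsSB, get_runsMS, get_runsBR)) once those functions are defined.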

Michael