grouping subsequences with BIO tags

Fri Apr 22 22:10:57 EDT 2005

On Fri, 22 Apr 2005 16:01:42 -0700, Michael Spencer <mahs at telcopartners.com> wrote:

>Steven Bethard wrote:
>> Bengt Richter wrote:
>> 
>>> On Thu, 21 Apr 2005 15:37:03 -0600, Steven Bethard 
>>> <steven.bethard at gmail.com> wrote:
>>>
>>>> I have a list of strings that looks something like:
>>>>    ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
>
>[snip]
>>>
>>> With error checks on predecessor relationship,
>>> I think I'd do the whole thing in a generator,
>
>I'm curious why you (Bengt or Steve) think the generator is an advantage here. 
>As Steve stated, the data already exists in lists of strings.

I hadn't seen your post[1], which I think is a nice, crisp, and clever solution ;-)

I just wrote what I thought was a straightforward solution, anticipating that
the input list might be some huge bioinfo thing, in which case you might want
to iterate through the sublists one at a time rather than build the whole list
of lists, as your stack does.
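
To make the memory point concrete, here is a minimal sketch of the generator
idea. It is not the code from either post: the name iter_runs is made up for
illustration, and it assumes the goal is to group each 'B_<label>' tag with the
'I_<label>' tags that directly follow it, skipping 'O'. The predecessor check
is one guess at what "error checks on predecessor relationship" could mean.

```python
def iter_runs(tags):
    """Yield one run at a time: a 'B_<label>' tag plus its trailing 'I_<label>' tags.

    Hypothetical sketch; the grouping rule is an assumption, not taken from the thread.
    """
    run = []
    for tag in tags:
        if tag == 'O':
            # 'O' closes any open run and is itself dropped
            if run:
                yield run
                run = []
        elif tag.startswith('B_'):
            # 'B_' closes any open run and starts a new one
            if run:
                yield run
            run = [tag]
        elif tag.startswith('I_'):
            # predecessor check: 'I_<label>' must continue a run with the same label
            if not run or run[0][2:] != tag[2:]:
                raise ValueError('%r does not continue a matching run' % (tag,))
            run.append(tag)
        else:
            raise ValueError('unrecognized tag %r' % (tag,))
    if run:
        yield run


L = ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
for run in iter_runs(L):
    print(run)
```

Only the current run is held in memory at any time, so the input could just as
well be a file or another iterator instead of a list.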

[1] I don't know why stuff arrives almost instantly sometimes, and sometimes quite
delayed and out of order, but it is a bit embarrassing to post a me-too without
relevant comment, or being able to decide whether to play open source leapfrog.
In this case, I don't see a lily pad on the other side of your code, other than
the memory aspect ;-)

>
>The direct list-building solution I posted is simpler, and quite a bit faster.
>
>L = ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
>
>def timethem(lst, funcs = (get_runsSB, get_runsMS, get_runsBR)):
>     for func in funcs:
>         print shell.timefunc(func, lst)
>
>  >>> timethem(L)
>  get_runsSB(...)  7877 iterations, 63.48usec per call
>  get_runsMS(...)  31081 iterations, 16.09usec per call
>  get_runsBR(...)  16114 iterations, 31.03usec per call
>
>
>Michael
>
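
For anyone who wants to repeat the comparison without shell.timefunc (which
appears to be a helper from Michael's own environment), a rough equivalent can
be put together with the standard timeit module. The functions below are
hypothetical stand-ins, not get_runsSB, get_runsMS or get_runsBR, and they
assume well-formed input with no error checks.

```python
import timeit


def build_runs(tags):
    """Build the whole list of runs at once (stand-in for a list-building version)."""
    runs = []
    for tag in tags:
        if tag.startswith('B_'):
            runs.append([tag])        # a B_ tag opens a new run
        elif tag.startswith('I_') and runs:
            runs[-1].append(tag)      # an I_ tag extends the most recent run
    return runs


def gen_runs(tags):
    """Yield runs one at a time (stand-in for a generator version)."""
    run = []
    for tag in tags:
        if tag.startswith('I_'):
            run.append(tag)
            continue
        if run:
            yield run
        # a B_ tag starts a new run; anything else ('O') leaves none open
        run = [tag] if tag.startswith('B_') else []
    if run:
        yield run


def time_them(lst, funcs, number=10000):
    """Return {function name: microseconds per call}, averaged over `number` calls."""
    results = {}
    for func in funcs:
        # list(...) drains the generator so both versions do the same work
        total = timeit.timeit(lambda: list(func(lst)), number=number)
        results[func.__name__] = total / number * 1e6
    return results


L = ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
for name, usec in time_them(L, (build_runs, gen_runs)).items():
    print('%s(...)  %.2fusec per call' % (name, usec))
```

The absolute numbers depend on the machine and the Python version, so only the
ratio between the two is worth reading. Wrapping the list-building version in
list() adds a small copy, which slightly understates its advantage.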

Regards,
Bengt Richter