grouping subsequences with BIO tags

Fri Apr 22 19:16:47 EDT 2005

Michael Spencer wrote:
> Steven Bethard wrote:
> 
>> Bengt Richter wrote:
>>
>>> On Thu, 21 Apr 2005 15:37:03 -0600, Steven Bethard 
>>> <steven.bethard at gmail.com> wrote:
>>>
>>>> I have a list of strings that looks something like:
>>>>    ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
> 
> [snip]
> 
>>> With error checks on predecessor relationship,
>>> I think I'd do the whole thing in a generator,
> 
> I'm curious why you (Bengt or Steve) think the generator is an advantage 
> here. As Steve stated, the data already exists in lists of strings.
> 
> The direct list-building solution I posted is simpler, and quite a bit 
> faster.

Aren't they basically just the same solution, with your stack.append 
replaced by a yield (and with a little additional error checking)?  As 
far as I'm concerned, either solution is great and writes the code that 
I couldn't. ;)
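
To make the comparison concrete, here is a minimal sketch of the two shapes.
It is not Michael's or Bengt's actual code. The names group_bio_list and
group_bio_gen are made up for illustration, and the sketch assumes that 'O'
tags are simply dropped and that an I_ tag must continue a chunk of the same
type. The list version appends each chunk to a result list; the generator
version yields it instead.

```python
def group_bio_list(tags):
    """Eager version: build and return the complete list of chunks."""
    chunks = []
    current = None                      # chunk being extended; None after 'O'
    for tag in tags:
        if tag == 'O':
            current = None              # assumption: 'O' tags are dropped
        elif tag.startswith('B_'):
            current = [tag]             # a B_ tag always opens a new chunk
            chunks.append(current)
        elif tag.startswith('I_'):
            # Error check: I_ must continue a chunk of the same type.
            if current is None or current[0][2:] != tag[2:]:
                raise ValueError(f'{tag!r} does not continue a chunk')
            current.append(tag)
        else:
            raise ValueError(f'unknown tag {tag!r}')
    return chunks


def group_bio_gen(tags):
    """Lazy version: same logic, but each finished chunk is yielded."""
    current = None
    for tag in tags:
        if tag == 'O' or tag.startswith('B_'):
            if current is not None:
                yield current           # previous chunk is complete
            current = [tag] if tag != 'O' else None
        elif tag.startswith('I_'):
            if current is None or current[0][2:] != tag[2:]:
                raise ValueError(f'{tag!r} does not continue a chunk')
            current.append(tag)
        else:
            raise ValueError(f'unknown tag {tag!r}')
    if current is not None:
        yield current                   # flush the final chunk


tags = ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
print(group_bio_list(tags))
print(list(group_bio_gen(tags)))
```

Wrapping the generator in list() gives the same result as the list-building
version, so the choice mostly comes down to whether the caller needs all
chunks at once.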

If you're still interested: in the real problem, the data doesn't exist 
as a list of strings; it exists as a list of objects for which there is 
a Python wrapper to a C API that retrieves the string.  I don't know 
exactly what happens in the wrapping, but it's possible that I can 
conserve some memory by using the generator function.  But I'd have to 
profile it to know for sure.
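
One way such a profile could be set up is sketched below with the standard
tracemalloc module. Everything about the data source is an assumption:
FakeToken and its get_tag() method are hypothetical stand-ins for the wrapped
C objects, and the sketch assumes each call allocates a fresh Python string,
which may not match what the real wrapper does. iter_chunks is a stripped-down
grouping generator with no error checking.

```python
import tracemalloc


class FakeToken:
    """Hypothetical stand-in for an object wrapped around the C API."""
    __slots__ = ('prefix',)

    def __init__(self, prefix):
        self.prefix = prefix

    def get_tag(self):
        # Assumption: every call builds a new str, as a wrapper might.
        return 'O' if self.prefix == 'O' else self.prefix + '_X'


def iter_chunks(tags):
    """Yield chunks lazily from any iterable of tags (no error checks)."""
    current = None
    for tag in tags:
        if tag.startswith('I_') and current is not None:
            current.append(tag)
            continue
        if current is not None:
            yield current
        current = [tag] if tag.startswith('B_') else None
    if current is not None:
        yield current


def measure(fn):
    """Return (result, peak traced bytes) for a single call of fn."""
    tracemalloc.start()
    result = fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


# Built before tracing starts, so the tokens themselves are not counted.
tokens = [FakeToken(p) for p in ('B', 'I', 'O') * 10000]


def eager():
    tags = [tok.get_tag() for tok in tokens]    # all strings alive at once
    return sum(1 for _ in iter_chunks(tags))


def lazy():
    tags = (tok.get_tag() for tok in tokens)    # one string fetched at a time
    return sum(1 for _ in iter_chunks(tags))


eager_count, eager_peak = measure(eager)
lazy_count, lazy_peak = measure(lazy)
print('eager:', eager_count, 'chunks, peak bytes', eager_peak)
print('lazy: ', lazy_count, 'chunks, peak bytes', lazy_peak)
```

The saving only shows up if the strings are fetched lazily and the chunks are
consumed one at a time; calling list() on either generator keeps everything
alive again.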

STeVe