grouping subsequences with BIO tags
Michael Spencer
mahs at telcopartners.com
Thu Apr 21 18:30:00 EDT 2005
Steven Bethard wrote:
> I have a list of strings that looks something like:
> ['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X']
I'd have done it the same way as you, but here's 'another' way:
>>> def grp(lst):
...     stack = []
...     for label in lst:
...         prefix = label[0]
...         if prefix == 'B':
...             group = [label]
...             stack.append(group)
...         elif prefix == 'I':
...             if group[0][2:] != label[2:]:
...                 raise ValueError('%s followed by %s' %
...                                  (group[0], label))
...             group.append(label)
...         elif prefix == 'O':
...             group = [label]
...     return stack
...
>>>
>>> grp(['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'I_X', 'B_X'])
[['B_X'], ['B_Y', 'I_Y'], ['B_X', 'I_X'], ['B_X']]
>>>
>>> grp(['O', 'B_X', 'B_Y', 'I_Y', 'O', 'B_X', 'O', 'I_X', 'B_X'])
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "\\CC1040907-A\MichaelDocuments\PyDev\Junk\BIO.py", line 32, in grp
    raise ValueError('%s followed by %s' %
ValueError: O followed by I_X
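
One edge case worth noting: if the very first label is an 'I_' tag, grp() reads `group` before anything has been assigned to it, so it fails with an UnboundLocalError rather than a ValueError. Below is a minimal sketch of the same loop that tracks the open chunk explicitly. The name group_bio and the error messages are my own choices for illustration, and it assumes every label is either 'O' or has the form 'B_<type>' / 'I_<type>':

```python
def group_bio(labels):
    """Group BIO labels into chunks (sketch; assumes 'O', 'B_X', 'I_X' style labels)."""
    groups = []
    current = None  # the chunk being built, or None at the start / after an 'O'
    for label in labels:
        # 'B_X' -> ('B', '_', 'X'); 'O' -> ('O', '', '')
        prefix, _, kind = label.partition('_')
        if prefix == 'B':
            # A 'B' always opens a new chunk.
            current = [label]
            groups.append(current)
        elif prefix == 'I':
            # An 'I' must continue an open chunk of the same type.
            if current is None:
                raise ValueError('%s does not continue a chunk' % label)
            if current[0].partition('_')[2] != kind:
                raise ValueError('%s followed by %s' % (current[0], label))
            current.append(label)
        elif prefix == 'O':
            # An 'O' closes whatever chunk was open.
            current = None
        else:
            raise ValueError('unexpected label %r' % label)
    return groups
```

For the first example list above this returns the same groups as grp(). An 'I_' tag at the start of the list, or directly after an 'O', raises ValueError.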
Michael