aligning a set of word substrings to sentence
Steven Bethard
steven.bethard at gmail.com
Thu Dec 1 19:09:44 EST 2005
Paul McGuire wrote:
> "Steven Bethard" <steven.bethard at gmail.com> wrote in message
> news:dpWdnfW-CeBj1xLeRVn-tw at comcast.com...
>
>>I've got a list of word substrings (the "tokens") which I need to align
>>to a string of text (the "sentence"). The sentence is basically the
>>concatenation of the token list, with spaces sometimes inserted between
>>tokens. I need to determine the start and end offsets of each token in
>>the sentence. For example::
>>
>>py> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
>>py> text = '''\
>>... She's gonna write
>>... a book?'''
>>py> list(offsets(tokens, text))
>>[(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
>
> ===================
> from pyparsing import oneOf
>
> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
> text = '''\
> She's gonna write
> a book?'''
>
> tokenlist = oneOf( " ".join(tokens) )
> offsets = [(start,end) for token,start,end in tokenlist.scanString(text) ]
>
> print(offsets)
> ===================
> [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
Now that's a pretty solution. Three cheers for pyparsing! :)
STeVe
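
For comparison, here is a minimal pure-Python sketch of an offsets()
generator like the one called in the original example. It is not code
from this thread: it assumes the tokens appear in the text in the same
order as in the list, separated only by optional whitespace, and the
ValueError handling is an illustrative choice.

```python
def offsets(tokens, text):
    """Yield (start, end) offsets of each token in text, in token order."""
    pos = 0  # the search resumes where the previous token ended
    for token in tokens:
        start = text.find(token, pos)
        if start < 0:
            raise ValueError("token %r not found after offset %d" % (token, pos))
        # Assumption: only whitespace may separate consecutive tokens.
        if text[pos:start].strip():
            raise ValueError("unexpected text %r before token %r"
                             % (text[pos:start], token))
        end = start + len(token)
        yield start, end
        pos = end


tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"
print(list(offsets(tokens, text)))
# [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
```

Because the sketch walks the token list in order, each token is matched
at its own position in the sequence, and any text not covered by the
tokens raises an error.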