aligning a set of word substrings to sentence

Steven Bethard steven.bethard at gmail.com
Thu Dec 1 16:56:22 EST 2005


Fredrik Lundh wrote:
> Steven Bethard wrote:
>> I feel like there should be a simpler solution (maybe with the re
>> module?) but I can't figure one out.  Any suggestions?
> 
> using the finditer pattern I just posted in another thread:
> 
> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
> text = '''\
> She's gonna write
> a book?'''
> 
> import re
> 
> tokens.sort() # lexical order
> tokens.reverse() # look for longest match first
> pattern = "|".join(map(re.escape, tokens))
> pattern = re.compile(pattern)
> 
> I get
> 
> print([m.span() for m in pattern.finditer(text)])
> [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
> 
> which seems to match your version pretty well.

That's what I was looking for.  Thanks!

STeVe
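For readers on Python 3, here is a minimal sketch of the same recipe wrapped in a function, plus an in-order alternative for comparison. The helper names `token_spans` and `align_in_order` are hypothetical, made up for illustration. `token_spans` follows the quoted approach: one escaped alternation scanned with `finditer`. The reverse lexical sort matters because Python's `re` takes the first alternative that matches at a position, not the longest one, so a token has to come before any shorter token that is its prefix. `align_in_order` is a different technique (plain `str.find` from a moving offset). It assumes the tokens occur in the text in the given order.

```python
import re


def token_spans(tokens, text):
    """Regex recipe from the thread: one alternation, matched with finditer."""
    # Python's re tries alternatives left to right and takes the first that
    # matches, so each token must precede any shorter token that is its prefix.
    # Reverse lexical order does that ('gonna' sorts after 'gon').
    alternatives = sorted(set(tokens), reverse=True)
    pattern = re.compile("|".join(map(re.escape, alternatives)))
    return [m.span() for m in pattern.finditer(text)]


def align_in_order(tokens, text):
    """Hypothetical alternative: locate each token after the previous one."""
    spans = []
    pos = 0
    for tok in tokens:
        start = text.find(tok, pos)
        if start == -1:
            raise ValueError(f"token {tok!r} not found at or after offset {pos}")
        end = start + len(tok)
        spans.append((start, end))
        pos = end  # the next token must start at or after this one's end
    return spans


tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"

print(token_spans(tokens, text))
# → [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
print(align_in_order(tokens, text))
# → the same list
```

On the example from the thread, both functions return the spans Fredrik posted. They can diverge on other input: `finditer` reports every match in the text, so its list need not line up one-to-one with `tokens`, while the in-order version yields exactly one span per token, in order, or raises `ValueError`.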


