[Tim]
> ...
> You could do a gross test to see whether this is important in your data
> by changing the last line of tokenizer.textparts() from
>
>     return text - redundant_html
> to
>     return text + redundant_html

Oops!  Make that

    return text | redundant_html

"+" gives a TypeError.
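
For anyone wondering why "|" works where "+" doesn't, here is a minimal sketch. It assumes text and redundant_html are both set objects, which is what the "-" in the original line suggests. The contents below are made up for illustration and are not what tokenizer.textparts() actually builds.

```python
# Hypothetical stand-ins for the two sets; the real ones are built
# inside tokenizer.textparts() and hold something other than strings.
text = {"part_a", "part_b", "part_c"}
redundant_html = {"part_c"}

# Original behavior: set difference drops the redundant parts.
difference = text - redundant_html

# Suggested experiment: set union keeps every part.
union = text | redundant_html

# Sets do not define "+", so the first suggestion fails.
try:
    text + redundant_html
    plus_error = None
except TypeError as exc:
    plus_error = exc

print(sorted(difference))
print(sorted(union))
print(type(plus_error).__name__)
```

Sets spell union as "|" (or the union() method), with "-" for difference and "&" for intersection. "+" is reserved for sequence concatenation and simply isn't defined for sets.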