Clean "Durty" strings

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Mon Apr 2 13:47:43 EDT 2007


In <1175530649.060784.147900 at d57g2000hsg.googlegroups.com>, irstas wrote:

> I'd like to see how this transformation can be done with
> BeautifulSoup. Well, the last two regexps can be replaced with this:
> 
> unicode(BeautifulStoneSoup(s,convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0])

Completely without regular expressions:

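from BeautifulSoup import BeautifulSoup

def main():  # assumes `source` holds the HTML string from earlier in the thread
    soup = BeautifulSoup(source, convertEntities=BeautifulSoup.HTML_ENTITIES)
    # Join all text nodes, then collapse every run of whitespace to one space.
    print ' '.join(''.join(soup(text=True)).split())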
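For readers without BeautifulSoup at hand, the same idea (collect the text nodes, decode the entities, collapse the whitespace) can be sketched with only the standard library. This is not the BeautifulSoup approach above but a stand-in using Python 3's `html.parser`. The names `TextCollector` and `clean` are made up for illustration, and the parser may treat malformed markup differently than BeautifulSoup does.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def clean(source):
    """Return the text of `source` with tags removed, entities decoded
    and runs of whitespace collapsed to single spaces."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()  # flush any text still buffered at the end of input
    return ' '.join(''.join(parser.chunks).split())


if __name__ == '__main__':
    print(clean('<p>Hello &amp;   <b>world</b>\n</p>'))
```

As in the BeautifulSoup version, no regular expressions are involved: `split()` without arguments breaks on any run of whitespace, and `' '.join()` puts the pieces back together with single spaces.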

Ciao,
	Marc 'BlackJack' Rintsch


