Clean "Durty" strings
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Mon Apr 2 13:47:43 EDT 2007
In <1175530649.060784.147900 at d57g2000hsg.googlegroups.com>, irstas wrote:
> I'd like to see how this transformation can be done with
> BeautifulSoup. Well, the last two regexps can be replaced with this:
>
> unicode(BeautifulStoneSoup(s,convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0])
Completely without regular expressions:
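    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x API
    def main():
        # `source` is assumed to hold the HTML string from earlier in the thread.
        soup = BeautifulSoup(source, convertEntities=BeautifulSoup.HTML_ENTITIES)
        # Join all text nodes, then collapse each whitespace run to one space.
        print ' '.join(''.join(soup(text=True)).split())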
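The same idea (collect every text node, decode entities, collapse whitespace) can be sketched with only the Python 3 standard library. This swaps `html.parser.HTMLParser` in for BeautifulSoup, so it is not the code above. The names `TextCollector` and `clean` are hypothetical, and results may differ from BeautifulSoup on edge cases such as comments or badly broken markup:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every text node, roughly like soup(text=True)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &eacute;
        # before handle_data() runs, standing in for convertEntities.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean(source):
    """Return the text of an HTML string with each whitespace run collapsed."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()  # flush any text still buffered after the last tag
    # Same idiom as the original: split on any whitespace, rejoin with one space.
    return ' '.join(''.join(parser.chunks).split())


if __name__ == '__main__':
    print(clean('<p>Caf&eacute;   <b>au</b>\n lait &amp; cr&egrave;me</p>'))
    # → Café au lait & crème
```

The `' '.join(s.split())` idiom does the regex-free work in both versions: `str.split()` with no argument splits on any run of whitespace and drops leading and trailing blanks.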
Ciao,
Marc 'BlackJack' Rintsch