Regular Expressions
deviantbunnylord at gmail.com
Mon Feb 12 05:20:11 EST 2007
HTML: htmllib and HTMLParser (both in the Python library),
BeautifulSoup (again GIYF)
XML: xml.* in the Python library. ElementTree (recommended) is
included in Python 2.5; use xml.etree.cElementTree.
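For reference, a minimal sketch of the ElementTree route mentioned above. It assumes a current Python 3, where the module is imported as xml.etree.ElementTree (the separate cElementTree name was removed in Python 3.9); the sample document and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, only here to have something to parse.
doc = "<list><item id='1'>spam</item><item id='2'>eggs</item></list>"

# Parse the string into a tree and walk every <item> element.
root = ET.fromstring(doc)
items = [(el.get("id"), el.text) for el in root.iter("item")]

for item_id, text in items:
    print(item_id, text)
```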
The sources of HTMLParser and xmllib use regular expressions to
parse out the data. htmllib imports sgmllib at the beginning of its
code, and sgmllib starts off with a bunch of regular expressions used to
parse data. So the only real difference I see there is that someone
saved me the work of writing them ;0). I haven't looked at the source
for Beautiful Soup, though I have a sneaking suspicion that most
processing of HTML/XML is based on regexes.
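To make the comparison concrete, here is a rough sketch that pulls link targets out of a small page two ways: with a single hand-written regex, and with the library parser. It assumes Python 3, where HTMLParser lives in html.parser. SAMPLE, NAIVE_HREF, LinkCollector and the two helper functions are hypothetical names invented for this example, and the regex is deliberately simplistic, not something taken from the library source.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one plain link, one link inside a comment,
# and one link with single quotes and an extra attribute first.
SAMPLE = """<p><a href="/one">first</a>
<!-- <a href="/hidden">commented out</a> -->
<a class='x' href='/two'>second</a></p>"""

# Hand-rolled approach: one regex run over the whole document.
NAIVE_HREF = re.compile(r'<a\s+href="([^"]*)"')


def links_by_regex(text):
    """Return every href the naive pattern matches, comments included."""
    return NAIVE_HREF.findall(text)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags reported by the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_by_parser(text):
    """Feed the text through the library parser and return the links."""
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


print(links_by_regex(SAMPLE))   # picks up the commented-out link, misses '/two'
print(links_by_parser(SAMPLE))  # skips the comment, handles both quote styles
```

So even if the library is regexes underneath, what it saves you is less the patterns themselves than the bookkeeping around them (comments, quoting styles, attribute order), at least for input like this sample.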