Regular Expressions

John Machin sjmachin at lexicon.net
Mon Feb 12 06:17:08 EST 2007


On Feb 12, 9:20 pm, "deviantbunnyl... at gmail.com"
<deviantbunnyl... at gmail.com> wrote:
> HTML: htmllib and HTMLParser (both in the Python library),
> BeautifulSoup (again GIYF)
>
> XML: xml.* in the Python library. ElementTree (recommended) is
> included in Python 2.5; use xml.etree.cElementTree.
>
> The source of HTMLParser and xmllib uses regular expressions for
> parsing out the data. htmllib calls sgmllib at the beginning of its
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference I see there is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of HTML/XML is based on regexes.

That's right. Those modules use regexes. You don't. You call functions
& classes in the modules.
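
To make "call functions & classes" concrete, here is a minimal sketch. It assumes Python 3, where the HTMLParser module discussed above lives at html.parser. The LinkCollector class, the extract_links helper and the sample markup are made up for illustration; only HTMLParser itself comes from the library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us lower-cased tag and attribute names,
        # with attribute values already unquoted.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed a string of HTML to the parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample: mixed case and mixed quoting, and no regex in sight.
sample = '<p>See <a href="http://example.com/a">A</a> and <A HREF=\'b.html\'>B</A>.</p>'
links = extract_links(sample)
print(links)
```

The regexes are still in there somewhere, but they are the library's problem, not the caller's.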

Someone has written those modules, tested them and documented them,
and they've had a fair old thrashing by quite a few people over the
years. That may be the only difference in your way of thinking, but
it's quite a large difference from you opening up the re docs and
getting stuck in single-handedly :-)
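
A rough illustration of that difference, assuming Python 3 (where xml.etree.ElementTree picks up the C accelerator on its own, so the cElementTree import is no longer needed). The document and the "naive" pattern are invented for the example; the pattern stands in for a typical first attempt, not for anything in the library source.

```python
import re
import xml.etree.ElementTree as ET

# Made-up document: one attribute is single-quoted and contains an
# entity, and the attribute order differs between the two items.
doc = """<items>
  <item name='a &amp; b' id="1"/>
  <item id="2" name="c"/>
</items>"""

# A plausible first-attempt regex: it assumes double quotes and that
# "name" is always the first attribute.
naive = re.findall(r'<item name="([^"]*)"', doc)

# The library route: parse the document, then ask for the attribute.
names = [item.get("name") for item in ET.fromstring(doc).iter("item")]

print(naive)
print(names)
```

The hand-rolled pattern silently finds nothing here, while the parser copes with the quoting, the attribute order and the entity. Each of those cases is something a do-it-yourself regex would have to discover and handle one bug report at a time.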



