getting text inside the HTML tag

kyosohma at gmail.com kyosohma at gmail.com
Sat Jul 14 14:01:27 EDT 2007


On Jul 14, 12:47 pm, Nikola Skoric <nick-n... at net4u.hr> wrote:
> I'm using sgmllib.SGMLParser to parse HTML. I have successfully parsed start
> tags by implementing start_something method. But, now I have to fetch the
> string inside the start tag and end tag too. I have been reading through
> SGMLParser documentation, but just can't figure that out... can somebody
> help? :-)

Oi! Try Beautiful Soup instead. It seems to be the de facto HTML
parser for Python:

http://www.crummy.com/software/BeautifulSoup/
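If you'd rather stay with an event-driven parser, note that the text between a start tag and its end tag is not handed to the start_* method. It arrives separately through the handle_data callback, which SGMLParser also provides. sgmllib is Python 2 only (it was removed in Python 3), so this rough sketch assumes Python 3's html.parser.HTMLParser, which uses the same callback style. The TagTextCollector class and the sample markup are hypothetical, just for illustration.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every <tag>...</tag>."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.texts = []     # one string per complete <tag>...</tag>
        self._buf = []      # text pieces seen inside the current target tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self._buf))

    def handle_data(self, data):
        # Character data between tags; keep it only while inside the
        # target tag (this includes text inside nested child tags).
        if self.depth > 0:
            self._buf.append(data)


parser = TagTextCollector("b")
parser.feed("<p>Some <b>bold</b> and <b>more <i>bold</i></b> text</p>")
parser.close()
print(parser.texts)  # ['bold', 'more bold']
```

The start/end handlers only track whether we are inside the tag. The actual text is accumulated in handle_data, which is the piece that was missing from the start_something approach.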

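You might find the minidom (xml.dom.minidom) or lxml modules to your liking
as well, though minidom is an XML parser and only copes with well-formed
XHTML, not typical tag-soup HTML.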
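For well-formed XHTML, something along these lines should work with minidom. It is only a sketch and assumes the markup parses as XML. Real-world HTML will usually make minidom.parseString raise an xml.parsers.expat.ExpatError. The text_of helper and the sample document are hypothetical.

```python
from xml.dom import minidom

# Assumes well-formed XHTML; minidom is a strict XML parser.
doc = minidom.parseString(
    "<html><body><p>Some <b>bold</b> and "
    "<b>more <i>bold</i></b> text</p></body></html>"
)


def text_of(node):
    """Hypothetical helper: concatenate all text below *node*."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Recurse so text inside nested tags (e.g. <i>) is included.
            parts.append(text_of(child))
    return "".join(parts)


texts = [text_of(b) for b in doc.getElementsByTagName("b")]
print(texts)  # ['bold', 'more bold']
```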

Mike



