getting text inside the HTML tag

Stefan Behnel stefan.behnel-n05pAM at web.de
Mon Jul 16 04:06:47 EDT 2007


Bruno Desthuilliers wrote:
> kyosohma at gmail.com wrote:
>> On Jul 14, 12:47 pm, Nikola Skoric <nick-n... at net4u.hr> wrote:
>>> I'm using sgmllib.SGMLParser to parse HTML. I have successfully parsed
>>> start tags by implementing a start_something method. But now I have to
>>> fetch the string between the start tag and the end tag too. I have been
>>> reading through the SGMLParser documentation, but just can't figure that
>>> out... can somebody help? :-)
>>>
>>> -- 
>>> "Now the storm has passed over me
>>> I'm left to drift on a dead calm sea
>>> And watch her forever through the cracks in the beams
>>> Nailed across the doorways of the bedrooms of my dreams"
>>
>> Oi! Try Beautiful Soup instead. That seems to be the de facto HTML
>> parser for Python:
> 
> Nope. It's the de facto parser for HTML-like tag soup !-)

Very true. As long as you're dealing with something that looks pretty much
like HTML, I actually don't think you can beat lxml.html (and it's still
getting better every day).
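
For the original question (the text between a start tag and its end tag),
here is a rough sketch of the handler-based approach. Note the assumptions:
it uses the standard library's html.parser.HTMLParser instead of sgmllib or
lxml.html, purely so that it runs without extra installs, and the names
TagText and text_inside as well as the sample markup are made up for
illustration. The idea is to set a flag in the start handler, collect
character data while the flag is set, and clear it in the end handler.

```python
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collect the text found between <tag> and </tag> (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the wanted tag
        self.chunks = []    # text pieces of the element currently open
        self.results = []   # one string per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_data(self, data):
        # Called for character data; keep it only inside the wanted tag.
        if self.depth:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks))
                self.chunks = []


def text_inside(html, tag):
    """Return a list with the text content of every top-level <tag> element."""
    parser = TagText(tag)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(text_inside("<p>Hello <b>world</b></p><p>again</p>", "p"))
```

Text inside nested child tags is included, since handle_data fires for it as
well while the outer tag is still open.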

Stefan


