getting text inside the HTML tag
Stefan Behnel
stefan.behnel-n05pAM at web.de
Mon Jul 16 04:06:47 EDT 2007
Bruno Desthuilliers wrote:
> kyosohma at gmail.com a écrit :
>> On Jul 14, 12:47 pm, Nikola Skoric <nick-n... at net4u.hr> wrote:
>>> I'm using sgmllib.SGMLParser to parse HTML. I have successfully parsed start
>>> tags by implementing start_something methods. But now I also have to fetch
>>> the string between the start tag and the end tag. I have been reading through
>>> the SGMLParser documentation, but just can't figure that out... can somebody
>>> help? :-)
>>
>> Oi! Try Beautiful Soup instead. That seems to be the de facto HTML
>> parser for Python:
>
> Nope. It's the de facto parser for HTML-like tag soup !-)
Very true. As long as you're dealing with something that looks pretty much
like HTML, I actually don't think you can beat lxml.html (and it's still
getting better every day).
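A minimal sketch of the lxml.html approach, assuming the third-party lxml package is installed (pip install lxml); the helper name texts_of is hypothetical. lxml.html.fromstring() parses the markup, and text_content() returns an element's text including the text of any nested children, which is the "string between the start tag and the end tag" asked about above.

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from typing import List

import lxml.html


def texts_of(html: str, tag: str) -> List[str]:
    """Hypothetical helper: return the text inside every <tag> element."""
    root = lxml.html.fromstring(html)
    # iter(tag) walks the root and all its descendants in document order;
    # text_content() joins the text of an element and everything nested in it.
    return [el.text_content() for el in root.iter(tag)]


page = "<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>"
print(texts_of(page, "p"))  # ['Hello world', 'Bye']
```

Because text_content() includes nested markup, the <b> inside the first paragraph does not cut its string short.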
Stefan
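For the original SGMLParser question: sgmllib was removed in Python 3, so this sketch swaps in the standard library's html.parser.HTMLParser, which uses the same callback style. The idea is to notice the wanted start tag, collect everything passed to handle_data() until the matching end tag, then join the pieces. The class name TagTextParser is hypothetical. SGMLParser also had a handle_data() callback, so the same flag-and-collect pattern should carry over to start_something/end_something methods there.

```python
from html.parser import HTMLParser


class TagTextParser(HTMLParser):
    """Hypothetical parser: collect the text between <tag> and its </tag>."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # > 0 while inside a target element (handles nesting)
        self.current = []       # text chunks of the element being read
        self.results = []       # finished strings, one per outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.current = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))

    def handle_data(self, data):
        # Called for text between tags; keep it only while inside the target tag.
        if self.depth > 0:
            self.current.append(data)


parser = TagTextParser("a")
parser.feed('<p>See <a href="/doc">the <b>docs</b></a> now</p>')
parser.close()
print(parser.results)  # ['the docs']
```

Text inside nested tags such as <b> is still collected, because handle_data() keeps appending while the depth counter is above zero.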