Finding Line numbers of HTML file
Larry Bates
larry.bates at websafe.com
Wed Dec 12 17:34:46 EST 2007
Ramdas wrote:
> I am doing some HTML scraping for a side project.
>
> I need a method using sgmllib or HTMLParser to parse an HTML file and
> get the line numbers of all the tags.
>
> I tried a few things, but I am just not able to work with either of
> the parsers.
>
> Can someone help?
>
HTML doesn't really have "lines"; it is just a stream of text that can be
formatted with line-end characters to make it somewhat easier for humans to
read. Not every parser will give you line numbers. What is the use case
for this (i.e., why do you think you need the line numbers)? Extraction is done
using tags, not line numbers. All that said, you should look at the Beautiful
Soup module before continuing.
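
If the line numbers do turn out to be needed, the standard library's HTMLParser
keeps track of its position in the input and exposes it through getpos(). Below
is a minimal sketch, assuming Python 3's html.parser (the module was called
HTMLParser in Python 2). The names TagLineCollector and tag_lines are
hypothetical, chosen only for illustration, and the sample markup is made up.

```python
from html.parser import HTMLParser


class TagLineCollector(HTMLParser):
    """Record (line, column, tag) for every start tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (lineno, offset): lineno is 1-based, offset is 0-based.
        lineno, offset = self.getpos()
        self.positions.append((lineno, offset, tag))


def tag_lines(html_text):
    """Return a list of (line, column, tag) tuples for the start tags in html_text."""
    parser = TagLineCollector()
    parser.feed(html_text)
    parser.close()
    return parser.positions


# Made-up sample input, only to show the call.
sample = "<html>\n<body>\n  <p>Hello <b>world</b></p>\n</body>\n</html>\n"
for lineno, offset, tag in tag_lines(sample):
    print(lineno, offset, tag)
```

The reported line depends entirely on how the file happens to be wrapped, so
reformatting the HTML changes the numbers even though the tags stay the same.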
-Larry