Finding Line numbers of HTML file

Larry Bates larry.bates at websafe.com
Wed Dec 12 17:34:46 EST 2007


Ramdas wrote:
> I am doing some HTML scraping for a side project.
> 
> I need a method using sgmllib or HTMLParser to parse an HTML file and
> get the line numbers of all the tags.
> 
> I tried a few things, but I am just not able to work with either of
> the parsers.
>
> Can someone help?
> 

HTML doesn't really have "lines"; it is just a stream of text that can be
formatted with line-end characters to make it somewhat easier for humans to
read. Parsers probably won't give you any line numbers. What is the use case
for this (i.e. why do you think you need the line numbers)? Extraction is done
using tags, not line numbers. All that said, you should look at the Beautiful
Soup module before continuing.
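
If you do end up needing positions, the guess above about parsers is too
pessimistic: the standard library's HTMLParser (the module is named
html.parser in Python 3, where sgmllib no longer exists) has a getpos()
method that returns the current (line number, column offset). A minimal
sketch, assuming Python 3; the class name TagLineRecorder and the sample
markup are made up for illustration:

```python
from html.parser import HTMLParser


class TagLineRecorder(HTMLParser):
    """Record the source position of every start and end tag.

    HTMLParser.getpos() returns (lineno, offset) for the markup currently
    being handled: lineno counts from 1, offset is a 0-based column.
    """

    def __init__(self):
        super().__init__()
        self.tags = []  # (kind, tag_name, line, column) tuples, in document order

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()
        self.tags.append(("start", tag, line, col))

    def handle_endtag(self, tag):
        line, col = self.getpos()
        self.tags.append(("end", tag, line, col))

    # Self-closing tags such as <br/> go through handle_startendtag, whose
    # default implementation calls handle_starttag and then handle_endtag,
    # so they are recorded twice at the same position.


# Hypothetical sample document used only to demonstrate the output.
sample = """<html>
<body>
  <p>Hello <b>world</b></p>
  <br/>
</body>
</html>
"""

recorder = TagLineRecorder()
recorder.feed(sample)
recorder.close()

for kind, tag, line, col in recorder.tags:
    print(f"line {line}, col {col}: {kind} <{tag}>")
```

To process a real file, read it and pass the text to feed(), e.g. inside
`with open("page.html", encoding="utf-8") as f:` call `recorder.feed(f.read())`,
where page.html is a placeholder name. The reported position is that of the
"<" that opens each tag.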

-Larry



More information about the Python-list mailing list