Finding Line numbers of HTML file

Ramdas ramdaz at gmail.com
Thu Dec 13 10:01:26 EST 2007


Hi Paul,

I am cross-posting this to the pyparsing forums as well, to get your
attention. A thousand apologies for the duplication!

I am a complete newbie to parsing and totally new to pyparsing.

I have adapted your code to store the line numbers as below.
Surprisingly, the line numbers printed when I scrape some URLs are not
accurate; they are way off.


page = urllib2.urlopen("www.....com").read()

def tallyTagLineNumber(strg, locn, tagTokens):
	line = lineno(locn,strg)
	tagLocs[tagTokens[0]].append(line)



def getlinenos(page):
	anyOpenTag.setParseAction(tallyTagLineNumber)
	# changing the entire string to lower case to get INPUT
	anyOpenTag.searchString(page.lower())
	tagnames = sorted(tagLocs.keys())
	taglinedict={}
	for t in tagnames:
		taglinedict[t]= unique(tagLocs[t])
	return taglinedict
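
To cross-check the numbers, the same tally can be sketched with the
standard library alone. This is only a rough stand-in, not pyparsing:
the regex OPEN_TAG and the names line_of and tag_line_numbers are
hypothetical, and the sample page is made up for illustration. As far as
I understand, lineno(locn, strg) also counts the newlines before locn
and adds one, so on the same input string the two approaches should
agree. The regex ignores closing tags and comments, but it will still
match tag-like text inside scripts.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pyparsing's anyOpenTag: matches an opening tag
# such as <input type="text"> and captures the tag name.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def line_of(offset, text):
    """Return the 1-based line number of a character offset in text."""
    return text.count("\n", 0, offset) + 1


def tag_line_numbers(page):
    """Map each lower-cased tag name to the sorted, unique lines it opens on."""
    tag_locs = defaultdict(set)
    for match in OPEN_TAG.finditer(page):
        tag_locs[match.group(1).lower()].add(line_of(match.start(), page))
    return {tag: sorted(lines) for tag, lines in tag_locs.items()}


# Made-up sample page; offsets are taken against exactly this string.
sample = "<html>\n<body>\n<INPUT type='text'>\n<p>hi</p> <input>\n</body>\n</html>\n"
result = tag_line_numbers(sample)
print(result)
```

If this agrees with a text editor's line numbers but the pyparsing
tally does not, the difference is probably in the string the locations
are measured against rather than in the counting itself.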


