Finding Line numbers of HTML file

Ramdas ramdaz at gmail.com
Thu Dec 13 10:04:22 EST 2007




Hi Paul,


I am cross-posting this to the pyparsing forums as well, to catch your
attention. A thousand apologies for that!

I am a complete newbie to parsing and totally new to pyparsing.

I have adapted your code to store the line numbers, as shown below.
Surprisingly, when I scrape some URLs, the line numbers printed are
not accurate; they are way off.

import urllib2
from pyparsing import anyOpenTag, lineno

page = urllib2.urlopen("www.....com").read()

def tallyTagLineNumber(strg, locn, tagTokens):
        line = lineno(locn,strg)
        tagLocs[tagTokens[0]].append(line)

def getlinenos(page):
        anyOpenTag.setParseAction(tallyTagLineNumber)
        # Lowercase the whole page so that input and INPUT in the HTML
        # are both tallied as the same input tag.
        anyOpenTag.searchString(page.lower())

        tagnames = sorted(tagLocs.keys())
        taglinedict={}
        for t in tagnames:
                taglinedict[t]= unique(tagLocs[t])
        return taglinedict
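
One way to cross-check the reported numbers without pyparsing is to
count the newline characters that precede a character offset. The
following is only a minimal sketch: the helper name line_of_offset and
the sample page are hypothetical and are not part of the code above.

```python
def line_of_offset(text, offset):
    # 1-based line number of a character offset, counting "\n" only.
    return text.count("\n", 0, offset) + 1


# Made-up sample page, used purely for illustration.
sample = "<html>\n<body>\n<input type='text'>\n</body>\n</html>\n"

offset = sample.find("<input")
print(line_of_offset(sample, offset))
```

Note that this counts only "\n". A viewer or editor that also treats a
bare "\r" as a line break (as str.splitlines() does) may show a
different line number for the same position, so it is worth checking
how the reference line numbers were obtained.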


What did I do wrong, and why does this problem occur?

Ramdas


