Text over multiple lines

Rigga Rigga at hasnomail.com
Sun Jun 20 15:02:11 EDT 2004


On Sun, 20 Jun 2004 17:22:53 +0000, Nelson Minar wrote:

> Rigga <Rigga at hasnomail.com> writes:
>> I am using HTMLParser to parse a web page. Part of the routine I need
>> to write (I am new to Python) involves looking for a particular tag and,
>> once I know where the tag starts and ends, assigning all the data
>> between the tags to a variable. This is easy if the tag starts and ends
>> on the same line, but how would I go about it if it's split over
>> two or more lines?
> 
> I often have variants of this problem too. The simplest way to make it
> work is to read all the HTML in at once with a single call to
> file.read(), and then use a regular expression. Note that you probably
> don't need re.MULTILINE, although you should take a look at what it
> means in the docs just to know.
> 
> This works fine as long as you expect your files to be relatively
> small (under a meg or so).
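A minimal sketch of the read-everything-then-regex approach, in Python 3. The helper name text_between_tags is hypothetical. One detail worth knowing: re.MULTILINE only changes what ^ and $ match, while re.DOTALL is the flag that lets . match a newline, which is what a tag split over several lines needs. A pattern like this suits small, predictable pages; it does not cope with nested tags of the same name.

```python
import re


def text_between_tags(html, tag):
    """Return the text between each <tag ...> and </tag>, even across lines.

    Hypothetical helper for illustration; it does not handle nested tags
    of the same name.
    """
    name = re.escape(tag)
    # re.DOTALL lets "." match newlines; re.IGNORECASE matches <SPAN> or <span>.
    pattern = re.compile(r"<%s\b[^>]*>(.*?)</%s\s*>" % (name, name),
                         re.DOTALL | re.IGNORECASE)
    return pattern.findall(html)


# In real use the string would come from a single read(), e.g.
#     with open("page.html") as f:
#         html = f.read()
html = """<TD><SPAN class=qv id=BusinessName
title="Business Name">Some Shady Business
Group Ltd.</SPAN></TD></TR>"""

print(text_between_tags(html, "SPAN"))
# ['Some Shady Business\nGroup Ltd.']
```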

I'm reading the entire file into a variable at the moment and passing it
through HTMLParser. I have run into another problem that I am having a
hard time working out. My data is in this format:

        <TD><SPAN class=qv id=EmployeeNo
        title="Employee Number">123456</SPAN></TD></TR>

Sometimes the data is spread over 3 lines, like:

        <TD><SPAN class=qv id=BusinessName
        title="Business Name">Some Shady Business
        Group Ltd.</SPAN></TD></TR></TBODY></TABLE></TD></TR>

The data I need is the text enclosed in quotes after title=, and the text
after the > and before the </SPAN. In the case above, the text between the
tags would be:

        Some Shady Business
        Group Ltd.
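Because the value itself is broken across a line, whatever method extracts it will also return the newline and the indentation of the following line. One common way to tidy that up (an illustrative helper named normalize_space, not part of HTMLParser) is to split on whitespace and rejoin with single spaces:

```python
def normalize_space(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    # str.split() with no argument splits on any whitespace and drops
    # leading/trailing whitespace, so the join yields one clean line.
    return " ".join(text.split())


raw = "Some Shady Business\n        Group Ltd."
print(normalize_space(raw))  # Some Shady Business Group Ltd.
```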

Running the file through HTMLParser, I discovered that the title= part
and the data part I need are contained in a list, so I have done this:

Snippet of my code:

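from HTMLParser import HTMLParser   # Python 2 module name

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; attrs is a list of (name, value) pairs
        print "Encountered the beginning of a %s tag" % tag

    def handle_data(self, data):
        # Called with the text between tags (tag attributes are not included)
        if "title=" in data:
            print "found title"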
However, I cannot work out how to search through the data (which is in a
list) to pull out the values I need.
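In HTMLParser, attribute values such as title never reach handle_data; they are passed to handle_starttag as attrs, a list of (name, value) pairs with lowercased names, so dict(attrs) gives easy lookup. The text between the tags arrives separately through handle_data, possibly in more than one call. A minimal sketch in Python 3, where the module is named html.parser (in Python 2 it was HTMLParser). The class name SpanExtractor, and the choice to collect only <span> elements that carry a title, are illustrative assumptions:

```python
from html.parser import HTMLParser


class SpanExtractor(HTMLParser):
    """Collect (title, text) pairs for <span title="...">text</span>.

    Hypothetical example class; it assumes spans are not nested.
    """

    def __init__(self):
        super().__init__()
        self.results = []      # list of (title, text) tuples
        self._title = None     # title of the span we are inside, if any
        self._chunks = []      # text pieces seen inside that span

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "span":
            title = dict(attrs).get("title")
            if title is not None:
                self._title = title
                self._chunks = []

    def handle_data(self, data):
        # Text may be delivered in several pieces, so accumulate it.
        if self._title is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._title is not None:
            # Join the pieces and collapse the line break and indentation.
            text = " ".join("".join(self._chunks).split())
            self.results.append((self._title, text))
            self._title = None


html = """<TD><SPAN class=qv id=EmployeeNo
title="Employee Number">123456</SPAN></TD></TR>
<TD><SPAN class=qv id=BusinessName
title="Business Name">Some Shady Business
Group Ltd.</SPAN></TD></TR></TBODY></TABLE></TD></TR>"""

parser = SpanExtractor()
parser.feed(html)
parser.close()
print(parser.results)
```

Converting attrs with dict() is just a convenience; looping over the pairs with "for name, value in attrs:" works equally well.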

Sorry if this is a dumb question, but hey, I am learning!

Many thanks

Rigga




More information about the Python-list mailing list