Text over multiple lines
Rigga
Rigga at hasnomail.com
Sun Jun 20 15:02:11 EDT 2004
On Sun, 20 Jun 2004 17:22:53 +0000, Nelson Minar wrote:
> Rigga <Rigga at hasnomail.com> writes:
>> I am using the HTMLParser to parse a web page, part of the routine I need
>> to write (I am new to Python) involves looking for a particular tag and
>> once I know the start and the end of the tag then to assign all the data
>> in between the tags to a variable, this is easy if the tag starts and ends
>> on the same line however how would I go about doing it if its split over
>> two or more lines?
>
> I often have variants of this problem too. The simplest way to make it
> work is to read all the HTML in at once with a single call to
> file.read(), and then use a regular expression. Note that you probably
> don't need re.MULTILINE, although you should take a look at what it
> means in the docs just to know.
>
> This works fine as long as you expect your files to be relatively
> small (under a meg or so).
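A rough sketch of the read-it-all-then-regex idea quoted above, for illustration only. The sample string stands in for the result of file.read(), and the pattern and variable names are assumptions, not code from either poster. The flag that lets "." run across line breaks is re.DOTALL; re.MULTILINE only changes how ^ and $ behave.

```python
import re

# Stand-in for the result of file.read(); the markup here is made up.
html = '<td><span\ntitle="Example">first line\nsecond line</span></td>'

# re.DOTALL lets "." match newlines, so the captured text may span lines.
# re.IGNORECASE accepts both <SPAN> and <span>.
pattern = re.compile(r"<span[^>]*>(.*?)</span>", re.DOTALL | re.IGNORECASE)

match = pattern.search(html)
text = match.group(1) if match else None
```

Note that [^>]* already matches newlines on its own, so a start tag split over two lines is handled even without the flag; the flag matters for the (.*?) part.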
I'm reading the entire file into a variable at the moment and passing it
through HTMLParser. I have run into another problem that I am having a
hard time working out. My data is in this format:
<TD><SPAN class=qv id=EmployeeNo
title="Employee Number">123456</SPAN></TD></TR>
Sometimes the data is spread over 3 lines like:
<TD><SPAN class=qv id=BusinessName
title="Business Name">Some Shady Business
Group Ltd.</SPAN></TD></TR></TBODY></TABLE></TD></TR>
The data I need to get is the text enclosed in quotes after the word
title= and the text after the > and before the </SPAN; in the case above that would
be: Some Shady Business
Group Ltd.
Running the file through HTMLParser, I discovered that the title= part
and the data part I need are contained in a list, so I have done this.
A snippet of my code:
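from HTMLParser import HTMLParser  # Python 2 module name

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag
        print "Encountered the beginning of a %s tag" % tag

    def handle_data(self, data):
        # data is the text found between tags
        if "title=" in data:
            print "found title"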
However, I cannot work out how to search through the data (which is in a
list) to pull out the data I need.
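One possible way to approach this, sketched under some assumptions: it uses Python 3's html.parser module rather than the Python 2 HTMLParser module in the snippet above, and the class name SpanCollector and its attributes are hypothetical. The idea is that the attribute list arrives in handle_starttag as (name, value) pairs, while the text between the tags arrives separately in handle_data, so the two have to be stitched together.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect (title, text) pairs from <span title="..."> elements."""

    def __init__(self):
        super().__init__()
        self.records = []    # finished (title, text) pairs
        self._title = None   # title of the span currently being read
        self._chunks = None  # text pieces so far; None when outside a span

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names are lowercased by the parser.
        if tag == "span":
            attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
            if "title" in attr_map:
                self._title = attr_map["title"]
                self._chunks = []

    def handle_data(self, data):
        # Only keep text while inside a span that had a title.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._chunks is not None:
            # Join the pieces and collapse line breaks into single spaces.
            text = " ".join("".join(self._chunks).split())
            self.records.append((self._title, text))
            self._title = None
            self._chunks = None


sample = """<TD><SPAN class=qv id=BusinessName
title="Business Name">Some Shady Business
Group Ltd.</SPAN></TD></TR></TBODY></TABLE></TD></TR>"""

parser = SpanCollector()
parser.feed(sample)
parser.close()
records = parser.records
```

With the three-line sample above, records should hold a single pair: the title "Business Name" and the text "Some Shady Business Group Ltd." joined onto one line. Nested spans are not handled in this sketch.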
Sorry if this is a dumb question but hey I am learning!
Many thanks
Rigga