[Tutor] HTML Parser woes
Alan Gauld
alan.gauld at btinternet.com
Tue Mar 4 17:26:01 CET 2014
My turn to ask a question.
This has me pulling my hair out. Hopefully it's something obvious...
I'm trying to pull some dates out of an HTML web page generated
from an Excel spreadsheet.
I've simplified things somewhat so the file (sample.htm) looks like:
<html>
<body link=blue vlink=purple>
<table border=0 cellpadding=0 cellspacing=0 width=752
style='border-collapse:
collapse;table-layout:fixed;width:564pt'>
<tr class=xl66 height=21 style='height:15.75pt'>
<td height=21 class=xl66 width=64
style='height:15.75pt;width:48pt'>ItemID</td>
<td class=xl66 width=115 style='width:86pt'>Name</td>
<td class=xl66 width=99 style='width:74pt'>DateLent</td>
<td class=xl66 width=121 style='width:91pt'>DateReturned</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 align=right style='height:15.0pt'>1</td>
<td>LawnMower</td>
<td>Small Hover mower</td>
<td>Fred</td>
<td>Joe</td>
<td class=xl65 align=right>4/1/2012</td>
<td class=xl65 align=right>4/26/2012</td>
</tr>
</table>
</body>
</html>
The code looks like:
import html.parser

class SampleParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.isDate = False

    def handle_starttag(self, name, attributes):
        if name == 'td':
            for key, value in attributes:
                if key == 'class':
                    print ('Class Value: ',repr(value))
                    if value.endswith('165'):
                        print ('We got a date')
                        self.isDate = True
                    break

    def handle_endtag(self,name):
        self.isDate = False

    def handle_data(self, data):
        if self.isDate:
            print('Date: ', data)

if __name__ == '__main__':
    print('start test')
    htm = open('sample.htm').read()
    parser = SampleParser()
    parser.feed(htm)
    print('end test')
And the output looks like:
start test
Class Value: 'xl66'
Class Value: 'xl66'
Class Value: 'xl66'
Class Value: 'xl66'
Class Value: 'xl65'
Class Value: 'xl65'
end test
As you can see I'm picking up the class attribute and
its value but the conditional test for x165 is failing.
I've tried
    if value == 'x165'
    if 'x165' in value
and every other test I can think of.
Why am I not seeing the "We got a date" message?
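When a comparison fails against a value that looks identical on screen, one way to narrow it down is to print the Unicode name of every character on both sides. The following is only a sketch and not part of the script above: the helper name describe is hypothetical, and it assumes the attribute value is exactly what repr() shows in the output ('xl65') and that the test literal is the one quoted in the attempts above ('x165').

```python
import unicodedata

def describe(text):
    """Return (character, Unicode name) pairs for each character in text."""
    return [(ch, unicodedata.name(ch)) for ch in text]

seen = 'xl65'     # assumed: the class value as printed by repr() in the output
tested = 'x165'   # assumed: the literal used in the attempted comparisons

# Walk both strings in step and flag any position where they disagree.
for (a, name_a), (b, name_b) in zip(describe(seen), describe(tested)):
    marker = '' if a == b else '   <-- differs'
    print(f'{a!r} {name_a:25} {b!r} {name_b:25}{marker}')
```

Under those assumptions the second position comes out as LATIN SMALL LETTER L on one side and DIGIT ONE on the other, which is easy to miss in many fonts.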
PS.
Please don't suggest other modules/packages etc,
I'm using html.parser for a reason.
Frustratedly,
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos