Beautiful parse joy - Oh what fun
Larry Bates
larry.bates at websafe.com
Tue May 16 17:38:16 EDT 2006
rh0dium wrote:
> Hi all,
>
> I am trying to parse into a dictionary a table and I am having all
> kinds of fun. Can someone please help me out.
>
> What I want is this:
>
> dic={'Division Code':'SALS','Employee':'LOO ABLE'}
>
> Here is what I have..
>
> html="""<table> <tr valign="top"><td width="24"><img
> src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
> /></td><td width="129"><b><font size="2" face="Arial">Division Code:
> </font></b></td><td width="693"><font size="2"
> face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
> src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
> width="129"><b><font size="2" face="Arial">Employee:
> </font></b></td> <td width="693"><font size="2"
> face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
> size="2" face="Arial">ABLE</font></td></tr></table> """
>
>
> from BeautifulSoup import BeautifulSoup
> soup = BeautifulSoup()
> soup.feed(html)
>
> dic={}
> for row in soup('table')[0]('tr'):
>     column = row('td')
>     print column[1].findNext('font').string.strip(), column[2].findNext('font').string.strip()
>     dic[column[1].findNext('font').string.strip()] = column[2].findNext('font').string.strip()
>
> for key in dic.keys():
>     print key, dic[key]
>
> The problem is I am missing the last name ABLE. How can I get "ALL"
> of the text. Clearly I have something wrong with my font string.. but
> what it is I am not sure of.
>
> Please and thanks!!
>
In the last row you have three <font> tags in the value cell.
The first one contains LOO, the second one contains only a
space, and the third one contains ABLE:
<td width="693"><font size="2" face="Arial">LOO</font><b>
<font size="2" face="Arial"> </font></b>
<font size="2" face="Arial">ABLE</font></td>
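Your code reads only the first <font> tag in that cell, because
findNext('font') returns a single tag. It stops at LOO and never
reaches the second (blank) tag or the third one.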
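
One way to pick up all of the text in a cell is to join every text
fragment inside the <td> instead of reading a single <font> tag. The
sketch below illustrates that idea with the standard library's
html.parser (Python 3) rather than BeautifulSoup, so it runs without
extra packages. The class name TableCells is made up for this example,
and the html string is your table with the whitespace tidied.

```python
from html.parser import HTMLParser

html = """<table> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
/></td><td width="129"><b><font size="2" face="Arial">Division Code:
</font></b></td><td width="693"><font size="2"
face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
width="129"><b><font size="2" face="Arial">Employee:
</font></b></td> <td width="693"><font size="2"
face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
size="2" face="Arial">ABLE</font></td></tr></table> """


class TableCells(HTMLParser):
    """Hypothetical helper: collect the full text of every <td>, per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per table row
        self._cell = None  # text fragments of the cell being read, if any

    def _close_cell(self):
        if self._cell is not None:
            # Join every fragment, then collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self.rows[-1].append(text)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._close_cell()  # tolerates the missing </td> in row two
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self._close_cell()

    def handle_data(self, data):
        # Text from *all* nested tags (<font>, <b>, ...) lands here.
        if self._cell is not None:
            self._cell.append(data)


parser = TableCells()
parser.feed(html)
parser.close()

# Column 0 is the spacer image, column 1 the label, column 2 the value.
dic = {row[1].rstrip(":"): row[2] for row in parser.rows if len(row) >= 3}
print(dic)  # {'Division Code': 'SALS', 'Employee': 'LOO ABLE'}
```

With BeautifulSoup itself the same idea should apply: join all of the
text nodes under column[2] (for example via findAll(text=True)) rather
than taking .string from the first <font> tag.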
-Larry Bates