Beautiful parse joy - Oh what fun

Larry Bates larry.bates at websafe.com
Tue May 16 17:38:16 EDT 2006


rh0dium wrote:
> Hi all,
> 
> I am trying to parse into a dictionary a table and I am having all
> kinds of fun.  Can someone please help me out.
> 
> What I want is this:
> 
> dic={'Division Code':'SALS','Employee':'LOO ABLE'}
> 
> Here is what I have..
> 
>     html="""<table> <tr valign="top"><td width="24"><img
> src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
> /></td><td width="129"><b><font size="2" face="Arial">Division Code:
> </font></b></td><td width="693"><font size="2"
> face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
> src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
> width="129"><b><font size="2" face="Arial">Employee:
> </font></b></td> <td width="693"><font size="2"
> face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
> size="2" face="Arial">ABLE</font></td></tr></table> """
> 
> 
>     from BeautifulSoup import BeautifulSoup
>     soup = BeautifulSoup()
>     soup.feed(html)
> 
>     dic={}
>     for row in soup('table')[0]('tr'):
>         column = row('td')
>         print column[1].findNext('font').string.strip(), column[2].findNext('font').string.strip()
>         dic[column[1].findNext('font').string.strip()] = column[2].findNext('font').string.strip()
> 
>     for key in dic.keys():
>         print key,  dic[key]
> 
>  The problem is I am missing the last name ABLE.  How can I get "ALL"
> of the text.  Clearly I have something wrong with my font string..  but
> what it is I am not sure of.
> 
> Please and thanks!!
> 
In the last row, the final cell contains three <font> tags.  The
first contains LOO, the second contains only a space, and the
third contains ABLE.

<td width="693"><font size="2" face="Arial">LOO</font><b>
  <font size="2" face="Arial"> </font></b>
  <font size="2" face="Arial">ABLE</font></td>

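Your code reads only the first <font> tag in each cell, because
findNext('font') returns a single tag.  It picks up LOO and never
reaches the second (empty) tag or the third one, which holds ABLE.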
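
One possible way around this, offered only as a sketch: instead of
asking for a single <font> tag, gather every piece of text inside
each cell and join the pieces.  The version below makes some
assumptions that are not in the original post: it uses the standard
library's html.parser rather than BeautifulSoup, it is written in
Python 3 syntax, the class name CellTextParser is made up for
illustration, and the HTML is a trimmed copy of the table from the
question.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Hypothetical helper: collect the full text of each <td>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._close_cell()   # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self._close_cell()

    def handle_data(self, data):
        # Text from every nested tag (<font>, <b>, ...) lands here.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None and self.rows:
            # split() + join collapses runs of whitespace and newlines
            self.rows[-1].append(" ".join("".join(self._cell).split()))
        self._cell = None


# Trimmed copy of the table in the question (attributes shortened).
html = """<table>
<tr><td><img src="/icons/ecblank.gif" alt="" /></td>
<td><b><font size="2">Division Code: </font></b></td>
<td><font size="2">SALS</font></td></tr>
<tr><td><img src="/icons/ecblank.gif" alt="" />
<td><b><font size="2">Employee: </font></b></td>
<td><font size="2">LOO</font><b><font size="2"> </font></b><font size="2">ABLE</font></td></tr>
</table>"""

parser = CellTextParser()
parser.feed(html)
parser.close()

dic = {}
for cells in parser.rows:
    if len(cells) >= 3:
        # Second cell is the label (drop the trailing colon), third the value.
        dic[cells[1].rstrip(":")] = cells[2]

print(dic)  # {'Division Code': 'SALS', 'Employee': 'LOO ABLE'}
```

Because the text of all three <font> tags is joined before the
whitespace is collapsed, the empty middle tag simply becomes the
space between LOO and ABLE.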

-Larry Bates


