BeautifulSoup HTML parsing - nested tags

Selvam s.selvamsiva at gmail.com
Wed Jan 5 04:28:31 EST 2011


Hi all,

I am trying to parse an HTML string with BeautifulSoup.

The string is:

    <table colWidths='530.0' style='Table_Main_Table'>
      <tr>
        <td>
          <blockTable colWidths='54.0,80.0,67.0' style='Table_Tax_Header'>
            <tr>
              <th>
              <p style='terp_tblheader_Details_Centre'>Tax</p></th>
              <th>
              <p style='terp_tblheader_Details_Right'>Base</p></th>
              <th>
              <p style='terp_tblheader_Details_Right'>Amount</p></th>
            </tr>
          </blockTable>
        </td>
      </tr>
    </table>


    rtables = soup.findAll(re.compile('table$'))

The resulting rtables is:

[<table colwidths="530.0" style="Table_Main_Table">
<tr>
<td>
<blocktable colwidths="54.0,80.0,67.0" style="Table_Tax_Header">
</blocktable></td></tr><tr>
<th>
<p style="terp_tblheader_Details_Centre">Tax</p></th>
<th>
<p style="terp_tblheader_Details_Right">Base</p></th>
<th>
<p style="terp_tblheader_Details_Right">Amount</p></th>
</tr>
</table>, <blocktable colwidths="54.0,80.0,67.0" style="Table_Tax_Header">
</blocktable>]



The <tr> that belongs inside the blocktable ends up directly inside the
table instead, while the blocktable itself is left empty.
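The output above is what you would get from a parser that applies an HTML table-nesting rule. As far as I know, BeautifulSoup 3 (which the findAll spelling suggests) does something like this: when a new <tr> opens, it closes open tags back to the nearest table, tbody, thead or tfoot. blockTable is not an HTML tag, so it is closed along the way. The sketch below is a simplified model of that rule, built on the standard library's html.parser; Node, ResettingBuilder and TR_RESET_PARENTS are hypothetical names, not BeautifulSoup's code, but the model reproduces the same tree.

```python
from html.parser import HTMLParser

# Simplified model (an assumption, NOT BeautifulSoup's actual code): when a
# <tr> opens, close every open tag back to the nearest table-level ancestor.
TR_RESET_PARENTS = {"table", "tbody", "thead", "tfoot"}

MARKUP = """
<table colWidths='530.0' style='Table_Main_Table'>
  <tr>
    <td>
      <blockTable colWidths='54.0,80.0,67.0' style='Table_Tax_Header'>
        <tr>
          <th><p style='terp_tblheader_Details_Centre'>Tax</p></th>
          <th><p style='terp_tblheader_Details_Right'>Base</p></th>
          <th><p style='terp_tblheader_Details_Right'>Amount</p></th>
        </tr>
      </blockTable>
    </td>
  </tr>
</table>
"""

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

class ResettingBuilder(HTMLParser):
    """Builds a tree of tag names, applying the <tr> reset rule above."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # 'blocktable' is not in TR_RESET_PARENTS, so the walk goes past
            # it (and past the outer <td> and <tr>) up to <table>.
            for i in range(len(self.stack) - 1, 0, -1):
                if self.stack[i].name in TR_RESET_PARENTS:
                    del self.stack[i + 1:]
                    break
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open tag with this name; ignore stray end tags
        # such as </blocktable>, which was already closed by the reset.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

def dump(node, depth=0):
    # Print the tag tree, one indented tag name per line.
    for child in node.children:
        print("  " * depth + child.name)
        dump(child, depth + 1)

builder = ResettingBuilder()
builder.feed(MARKUP)
builder.close()
dump(builder.root)
```

As in the question's output, the inner <tr> becomes a second child of table and blocktable has no children.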

Is there any way to get the <tr> in the right place (inside the blocktable)?
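The snippet happens to be well-formed XML. blockTable is not an HTML tag and looks like ReportLab RML, though that is a guess. One way around HTML nesting rules is therefore to parse it with an XML parser, which keeps every element where it was written. Below is a minimal sketch that swaps in the standard library's xml.etree.ElementTree for BeautifulSoup. XML keeps the case of tag names, so the 'table$' pattern needs re.IGNORECASE to match blockTable.

```python
import re
import xml.etree.ElementTree as ET

# The markup from the question; it is well-formed XML.
MARKUP = """
<table colWidths='530.0' style='Table_Main_Table'>
  <tr>
    <td>
      <blockTable colWidths='54.0,80.0,67.0' style='Table_Tax_Header'>
        <tr>
          <th><p style='terp_tblheader_Details_Centre'>Tax</p></th>
          <th><p style='terp_tblheader_Details_Right'>Base</p></th>
          <th><p style='terp_tblheader_Details_Right'>Amount</p></th>
        </tr>
      </blockTable>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(MARKUP.strip())

# Same idea as findAll(re.compile('table$')), but matched case-insensitively
# because the XML parser keeps 'blockTable' as written.
table_re = re.compile("table$", re.IGNORECASE)
rtables = [el for el in root.iter() if table_re.search(el.tag)]

for el in rtables:
    rows = el.findall("tr")                                   # direct <tr> children
    headers = [p.text for p in el.iterfind("./tr/th/p")]      # header labels in those rows
    print(el.tag, len(rows), headers)
# prints:
# table 1 []
# blockTable 1 ['Tax', 'Base', 'Amount']
```

Within BeautifulSoup itself, its XML-oriented mode should avoid the problem the same way: BeautifulStoneSoup in version 3, or BeautifulSoup(markup, "xml") in version 4, which needs lxml installed. Neither has been tested against this exact markup.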

-- 
Regards,
S.Selvam
SG E-ndicus Infotech Pvt Ltd.
http://e-ndicus.com/

 " I am because we are "


More information about the Python-list mailing list