BeautifulSoup HTML parsing - nested tags

Selvam s.selvamsiva at gmail.com
Wed Jan 5 05:15:12 EST 2011


On Wed, Jan 5, 2011 at 2:58 PM, Selvam <s.selvamsiva at gmail.com> wrote:

> Hi all,
>
> I am trying to parse an HTML string with BeautifulSoup.
>
> The string is,
>
>     <table colWidths='530.0' style='Table_Main_Table'>
>       <tr>
>         <td>
>           <blockTable colWidths='54.0,80.0,67.0' style='Table_Tax_Header'>
>             <tr>
>               <th>
>                 <p style='terp_tblheader_Details_Centre'>Tax</p></th>
>               <th>
>                 <p style='terp_tblheader_Details_Right'>Base</p></th>
>               <th>
>                 <p style='terp_tblheader_Details_Right'>Amount</p></th>
>             </tr>
>           </blockTable>
>         </td>
>       </tr>
>     </table>
>
>
> I then search for every tag whose name ends in "table":
>
> rtables = soup.findAll(re.compile('table$'))
>
> The resulting rtables is,
>
> [<table colwidths="530.0" style="Table_Main_Table">
> <tr>
> <td>
> <blocktable colwidths="54.0,80.0,67.0" style="Table_Tax_Header">
> </blocktable></td></tr><tr>
> <th>
> <p style="terp_tblheader_Details_Centre">Tax</p></th>
> <th>
> <p style="terp_tblheader_Details_Right">Base</p></th>
> <th>
> <p style="terp_tblheader_Details_Right">Amount</p></th>
> </tr>
> </table>, <blocktable colwidths="54.0,80.0,67.0" style="Table_Tax_Header">
> </blocktable>]
>
>
>
> The tr that belongs inside the blocktable appears directly inside the
> outer table instead, while the blocktable itself contains nothing.
>
> Is there any way I can get the tr in the right place (inside blocktable)?
>
> --
> Regards,
> S.Selvam
> SG E-ndicus Infotech Pvt Ltd.
> http://e-ndicus.com/
>
>  " I am because we are "
>

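Replying to myself: registering blocktable as a tag that a tr may nest
inside solved the issue. I added this line before parsing the string:

BeautifulSoup.BeautifulSoup.NESTABLE_TABLE_TAGS['tr'].append('blocktable')

Note the lowercase name, which is how the tag appears in the parsed
output. With this in place, the tr stays inside the blocktable.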
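
The reason this works, as far as I can tell, is that the parser decides
where a new tr belongs by walking up the currently open tags until it
meets one listed in NESTABLE_TABLE_TAGS['tr']. Since blocktable was not
listed, the walk continued up to the outer table and closed everything in
between. Below is a simplified, standalone model of that behaviour. It is
not BeautifulSoup's actual code: open_tag and parent_of_inner_tr are
hypothetical helpers, and the table only holds the entries needed here.

```python
# Toy model of nesting-reset parsing; NOT BeautifulSoup's real implementation.
# Each key maps a tag to the ancestors inside which it may start a new nesting.
NESTABLE_TABLE_TAGS = {
    "table": [],
    "tr": ["table", "tbody", "tfoot", "thead"],
    "td": ["tr"],
    "th": ["tr"],
}


def open_tag(stack, name, nestable):
    """Open `name`, first closing open tags back to the nearest trigger."""
    triggers = nestable.get(name)
    if triggers is not None:
        # Walk up from the innermost open tag looking for a trigger.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i] in triggers:
                del stack[i + 1:]  # close everything opened after the trigger
                break
    stack.append(name)
    return stack


def parent_of_inner_tr(nestable):
    """Return the tag that ends up as parent of the tr inside blocktable."""
    stack = []
    for name in ["table", "tr", "td", "blocktable"]:
        open_tag(stack, name, nestable)
    open_tag(stack, "tr", nestable)
    return stack[-2]


# Default rules: the inner tr is re-parented to the outer table.
print(parent_of_inner_tr(NESTABLE_TABLE_TAGS))  # table

# Same rules with blocktable registered as a trigger for tr.
patched = {tag: list(parents) for tag, parents in NESTABLE_TABLE_TAGS.items()}
patched["tr"].append("blocktable")
print(parent_of_inner_tr(patched))  # blocktable
```

In the model, unknown tags such as blocktable are simply pushed onto the
stack, which is why only the rule for tr needs to change.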
