How to pick content from HTML using BeautifulSoup

Sheetal Singh sheetalsingh at shopzilla.com
Tue Jul 10 00:02:28 EDT 2012


Hi,

I am new to Python. I need to fetch the names of the side filters from a page and save them to a CSV file [please find the attached screenshot].

Following is a snippet from my code:

    soup = BeautifulStoneSoup(html)
    # for e in soup.findAll('div'):
    #     for c in e.findAll('h3'):
    #         for d in c.findAll('li'):
    #             print '@@@@@@@', d.extract()

    # select_pod = soup.findAll('div', {"class": "win aboutUs"})
    # promeg = select_pod[0].findAll("p")[0]

    # for dv in soup.findAll('div', {"class": "attribution"}):
    #     ds = dv.findAll("<h3>")
    #     print ds

    select_pod = soup.findAll('div')
    print select_pod
    for j in select_pod:
        if j is not None:
            print j.findall('a')
    promeg = select_pod.findAll("<h3>")
    # print '--', promeg

    # hreflist = [each.get('value') for each in soup.findAll('<h3>')]

    for m in promeg:
        if m:
            print 'Data values', m
            fd1.writerow([x[2], m, i[0], "Data Found"])


Structure of the HTML:

<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li>
<a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a>
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li>
<a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
<input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<div>
</div>
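
For a structure like the one above, one possible way to read each <h3> heading together with the link texts in the <ul> that follows it is sketched below. This is only a minimal sketch under some assumptions: it uses the newer BeautifulSoup 4 package (bs4) with Python's built-in "html.parser" rather than the BeautifulStoneSoup class from my snippet, the names extract_filters and SAMPLE_HTML are made up for illustration, the sample markup is a simplified stand-in for the real page, and skipping <li class="more"> entries is a guess that they are only "show more" expanders.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Hypothetical, simplified sample that mirrors the structure shown above.
SAMPLE_HTML = """
<div class="attribution">
  <div>
    <h3>By Brand</h3>
    <ul>
      <li><a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a></li>
      <li><a href="#">Samsung</a></li>
      <li class="more"><a href="#">More</a></li>
    </ul>
  </div>
  <div>
    <h3>By Seller</h3>
    <ul>
      <li><a href="#">Amazon Marketplace</a></li>
    </ul>
  </div>
</div>
"""


def extract_filters(html):
    """Return a list of (heading, [names]) pairs, one per <h3> filter group."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    for container in soup.find_all("div", class_="attribution"):
        for heading in container.find_all("h3"):
            # The <ul> holding the filter links is the next <ul> sibling of the <h3>.
            listing = heading.find_next_sibling("ul")
            if listing is None:
                continue
            names = [
                a.get_text(strip=True)
                for li in listing.find_all("li")
                # Assumption: <li class="more"> is an expander, not a filter name.
                if "more" not in (li.get("class") or [])
                for a in li.find_all("a")
            ]
            groups.append((heading.get_text(strip=True), names))
    return groups


filters = extract_filters(SAMPLE_HTML)
```

Note that tag names are passed without angle brackets ("h3", not "<h3>"), and that the search for links is run on each tag in turn rather than on the whole result list.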


Output required in CSV:

By Brand
Nokia
Samsung
.
.

By Seller
Amazon
Buy.com
.
.
.
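
The layout above (a heading row, then one name per row, then a blank row) could be written with the standard csv module roughly as follows. This is a sketch only: write_filter_groups and the groups list are hypothetical names and sample data, and the code is written for Python 3 rather than the Python 2 used in my snippet.

```python
import csv
import io


def write_filter_groups(groups, out):
    """Write each heading, then its names one per row, then a blank row."""
    writer = csv.writer(out)
    for heading, names in groups:
        writer.writerow([heading])
        for name in names:
            writer.writerow([name])
        writer.writerow([])  # blank separator row between groups


# Hypothetical sample data in the shape (heading, [names]).
groups = [
    ("By Brand", ["Nokia", "Samsung"]),
    ("By Seller", ["Amazon", "Buy.com"]),
]

buffer = io.StringIO()
write_filter_groups(groups, buffer)
csv_text = buffer.getvalue()
```

To write to a real file instead of an in-memory buffer, the file would be opened with open(path, "w", newline="") and passed as the out argument.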



Please suggest how to fetch these details.

Sheetal Singh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: filters.png
Type: image/png
Size: 8445 bytes
Desc: filters.png
URL: <http://mail.python.org/pipermail/python-list/attachments/20120710/f38bcc20/attachment.png>

