[Tutor] beautifulsoup
cs at zip.com.au
Tue Oct 4 02:08:15 EDT 2016
On 04Oct2016 13:35, Crusier <crusier at gmail.com> wrote:
>I am trying to scrape from the (span class= 'Number'). The code looks
>like this on the pages I am scraping:
>
> <div id="DetailMainBox">
> <table>
> <tr>
><td rowspan="2" class="styleA">
><span class="UP">99 </span><span class="Change">10.00 (-0.1%)</span>
><span class="Portfolio"><a href="../../members/index.php"
>class="ThemeColor" target="_blank">Menu<img src="../images/more.gif"
>width="11" height="11" border="0" align="absmiddle" /></a></span>
></td>
>
>
>
> <td class="styleB">Max Quantity<span class="RT"></span><br>
><span class="Number">100.000</span></span> </td>
> <td class="styleB">Average Quantity<span class="RT"></span><br />
><span class="Number">822</span></td>
>
><td class="styleB">Previous Order<br />
><span class="Number">96</span></td>
>
> <td class="styleB">Max Price<br />
><span class="Number">104</span></td>
>
> <td class="styleB">Number of Trades<br />
><span class="Number">383</span></td>
></tr>
>
> <tr>
><td class="styleB">Min Price<span class="RT"></span><br>
><span class="Number">59</span></td>
><td class="styleB">Total Amount<span class="RT"></span><br />
><span class="Number">800</span></td>
>
><td class="styleB">Start<br />
><span class="Number">10</span></td>
>
><td class="styleB">Low<br />
><span class="Number">98 </span></td>
>
>I have tried to use Beautifulsoup to scrape the data. However, it
>returns Nothing on the screen
>
> from bs4 import BeautifulSoup
>
> html = response.content
> soup = BeautifulSoup(html,"html.parser")
> title = soup.select('td.styleB')[0].next_sibling
> title1 = soup.find_all('span', attrs={'class': 'Number'}).next_sibling
> print(title1)
>
>I am hoping that I could retrieve the number as follows:
>
>Max Quantity: 100
>Average Quantity: 822
>Previous Order: 96
>Max Price: 104
>Number of Trades: 383
>Min Price: 59
>Total Amount: 800
>Start: 10
>Low: 98
>
>Please advise what the problem is with the way my code handles the
>query. Thank you.
You perform several steps here before your print. Break them up: do
"soup.select", "[0]", "next_sibling" etc. as separate statements and print the
intermediate values along the way.
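As a wild guess, might:
    title = soup.select('td.styleB')[0].next_sibling
fetch only the whitespace that follows the first <td>? next_sibling is the node
after that <td> at the same level, not something inside it such as:
    <span class="RT"></span>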
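To see what that line actually fetches, here is a minimal sketch. It assumes a
cut-down copy of the markup quoted above in place of response.content, and the
names cells, first, sibling and number_text are made up for the example. It
prints each intermediate value in turn.

```python
from bs4 import BeautifulSoup, NavigableString

# A cut-down copy of the markup from the original post (an assumption:
# the real page will contain much more than this).
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

cells = soup.select("td.styleB")   # step 1: a list of matching <td> tags
print(len(cells))

first = cells[0]                   # step 2: a single <td> tag
print(first)

sibling = first.next_sibling       # step 3: the node after that <td>
print(type(sibling), repr(sibling))

# The number lives inside the <td>, so search its descendants instead.
number_text = first.find("span", class_="Number").get_text()
print(number_text)
```

With this sample the sibling is the text node between the two <td> tags, which
is only whitespace, so printing it shows nothing useful.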
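I also suspect that next_sibling returns the next node in the DOM tree, not
the text inside a tag. Note too that find_all returns a list of tags, so you
need to pick one (or loop) before asking for anything like next_sibling. Your
title1 might come out better as:
    title1 = soup.find_all('span', attrs={'class': 'Number'})[0].get_text()
if I recall how to grab the text inside a tag (str() of a tag gives you its
markup, tags included). Also, don't you want a loop around your find_all?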
E.g.:
    for tag in soup.find_all('span', attrs={'class': 'Number'}):
        print(tag)             # the whole tag, markup included
        print(tag.get_text())  # just the text inside the tag
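Going one step further than that loop, the following is a rough sketch of one
way to pair each label with its number, as in the output you are hoping for.
It assumes every td.styleB cell starts with its label text and contains one
span of class Number, as in the quoted markup. The sample html and the results
name are only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the original post (an assumption, not the
# real page).
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Previous Order<br />
<span class="Number">96</span></td>
<td class="styleB">Low<br />
<span class="Number">98 </span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for cell in soup.select("td.styleB"):
    number = cell.find("span", class_="Number")
    if number is None:
        continue  # no Number span in this cell, so skip it
    # The first non-blank string in the cell is taken to be the label.
    label = next(cell.stripped_strings)
    results.append((label, number.get_text(strip=True)))

for label, value in results:
    print(f"{label}: {value}")
```

The numbers come back as strings exactly as the page has them, so any
conversion (to int or float, say) is a separate step.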
Anyway, put in more print()s in the middle of your traversal of the DOM. That
should show where things are going wrong.
Cheers,
Cameron Simpson <cs at zip.com.au>