[Tutor] beautifulsoup

Tue Oct 4 02:08:15 EDT 2016

On 04Oct2016 13:35, Crusier <crusier at gmail.com> wrote:
>I am trying to scrape from the (span class= 'Number'). The code looks
>like this on the pages I am scraping:
>
>    <div id="DetailMainBox">
>    <table>
>    <tr>
><td rowspan="2" class="styleA">
><span class="UP">99 </span><span class="Change">10.00    (-0.1%)</span>
><span class="Portfolio"><a href="../../members/index.php"
>class="ThemeColor" target="_blank">Menu<img src="../images/more.gif"
>width="11" height="11" border="0" align="absmiddle" /></a></span>
></td>
>
>
>
>    <td class="styleB">Max Quantity<span class="RT"></span><br>
><span class="Number">100.000</span></span> </td>
>    <td class="styleB">Average Quantity<span class="RT"></span><br />
><span class="Number">822</span></td>
>
><td class="styleB">Previous Order<br />
><span class="Number">96</span></td>
>
>    <td class="styleB">Max Price<br />
><span class="Number">104</span></td>
>
>    <td class="styleB">Number of Trades<br />
><span class="Number">383</span></td>
></tr>
>
>    <tr>
><td class="styleB">Min Price<span class="RT"></span><br>
><span class="Number">59</span></td>
><td class="styleB">Total Amount<span class="RT"></span><br />
><span class="Number">800</span></td>
>
><td class="styleB">Start<br />
><span class="Number">10</span></td>
>
><td class="styleB">Low<br />
><span class="Number">98 </span></td>
>
>I have tried to use BeautifulSoup to scrape the data. However, it
>prints nothing on the screen:
>
>     from bs4 import BeautifulSoup
>
>     html = response.content
>     soup = BeautifulSoup(html,"html.parser")
>     title =  soup.select('td.styleB')[0].next_sibling
>     title1 = soup.find_all('span', attrs={'class': 'Number'}).next_sibling
>     print(title1)
>
>I am hoping that I could retrieve the number as follows:
>
>Max Quantity: 100
>Average Quantity: 822
>Previous Order: 96
>Max Price: 104
>Number of Trades: 383
>Min Price: 59
>Total Amount: 800
>Start: 10
>Low: 98
>
>Please advise what is wrong with the way my code handles the
>query. Thank you.

You perform several steps here before your print. Break them up: do the
"soup.select", the "[0]", the "next_sibling" etc. as separate statements and
print the intermediate values along the way.
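
A minimal sketch of that, assuming a cut-down copy of the HTML you quoted in
place of response.content (the html string below is a hypothetical stand-in,
not your real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real code would use response.content.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: everything the CSS selector matches.
cells = soup.select("td.styleB")
print(type(cells), len(cells))

# Step 2: the first match only.
first_cell = cells[0]
print(repr(first_cell))

# Step 3: whatever node follows that <td> at the same level.
sibling = first_cell.next_sibling
print(type(sibling), repr(sibling))
```

Printing the type as well as the value at each step makes it obvious whether
you are holding a list, a tag or a piece of text.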

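Bear in mind that next_sibling is the node that follows a tag at the same
level, not something inside it. As a wild guess, going by the HTML you quoted:

  title =  soup.select('td.styleB')[0].next_sibling

probably fetches just the whitespace between the first <td class="styleB"> and
the next one, rather than the label or the number.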

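I also suspect your title1 line is the real trouble. find_all returns a list
of tags, while next_sibling is an attribute of a single tag (and gives the
next node, which may be a tag or a piece of text), so it cannot be applied to
the whole list. To grab the text inside a tag, your title1 might come out
better as:

  title1 = soup.find_all('span', attrs={'class': 'Number'})[0].get_text()

Also, don't you want a loop around your find_all?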

Eg:

    for tag in soup.find_all('span', attrs={'class': 'Number'}):
      print(tag)              # the whole tag, markup included
      print(tag.get_text())   # just the text inside the tag
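
Taking the loop one step further, here is a rough sketch of how the
"label: number" output you are after might be built. It loops over the cells
rather than the spans so that each number can be paired with its label. The
html string is again a hypothetical cut-down copy of your page, and taking the
first string in each cell as the label is an assumption based on the HTML you
quoted:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real code would use response.content.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Previous Order<br />
<span class="Number">96</span></td>
<td class="styleB">Low<br />
<span class="Number">98 </span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# str() of a tag gives its markup; get_text() gives the text inside it.
first_span = soup.find("span", attrs={"class": "Number"})
markup = str(first_span)
print(markup)
print(first_span.get_text())

results = {}
for cell in soup.select("td.styleB"):
    number = cell.find("span", attrs={"class": "Number"})
    if number is None:
        continue  # a cell with no number in it; skip it
    # Assumption: the label is the first non-blank string in the cell.
    label = next(cell.stripped_strings)
    results[label] = number.get_text(strip=True)

for label, value in results.items():
    print(f"{label}: {value}")
```

The strip=True argument removes stray whitespace such as the trailing space in
"98 ".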

Anyway, put in more print()s in the middle of your traversal of the DOM. That 
should show where things are going wrong.
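
To illustrate the kind of thing such prints can turn up, here is a small
sketch (the one-cell html string is a hypothetical stand-in) showing that the
result of find_all is a collection, and that asking the collection itself for
next_sibling raises AttributeError:

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell stand-in for the page.
html = '<td class="styleB">Start<br /><span class="Number">10</span></td>'
soup = BeautifulSoup(html, "html.parser")

found = soup.find_all("span", attrs={"class": "Number"})
print(type(found).__name__, len(found))  # a list-like collection of tags

error = None
try:
    found.next_sibling  # next_sibling belongs to a single node, not the collection
except AttributeError as exc:
    error = exc
    print("AttributeError:", exc)

# Indexing first gives a single tag, which does have next_sibling.
first = found[0]
print(repr(first.next_sibling))
```

If your real program shows nothing at all, it may be worth checking whether
something around this code is catching and discarding the exception.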

Cheers,
Cameron Simpson <cs at zip.com.au>