[Tutor] Unable to retrieve the stock code

Mon Nov 16 04:17:24 EST 2015

Crusier wrote:

> Dear All,
> 
> I am currently trying to download the stock code. I am using Python
> 3.4 and the code is as follows:
> 
> from bs4 import BeautifulSoup
> import requests
> import re
> 
> url = 'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm'
> 
> def web_scraper(url):
>     response = requests.get(url)
>     html = response.content
>     soup = BeautifulSoup(html,"html.parser")
>     for link in soup.find_all("a"):
>         stock_code = re.search('/d/d/d/d/d', "00001" )
>         print(stock_code, '', link.text)
>         print(link.text)
> 
> web_scraper(url)
> 
> I am trying to retrieve the stock code from here:
> <td class="verd_black12" width="18%">00001</td>
> 
> or from a href.
> 
> Please kindly inform which library I should use.

The good news is that you don't need regular expressions here; Beautiful
Soup alone is sufficient.
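
As an aside, the pattern in the quoted code could not have matched anyway: 
"/d" is a literal slash followed by the letter d, whereas a digit is 
written "\d". Here is a minimal sketch of the difference, using only the 
literal string "00001" from the question. Note that even the corrected 
pattern only searches that fixed string, not the text of each link, so it 
would print the same match on every pass through the loop.

```python
import re

# Pattern as written in the question: "/d" is a slash followed by "d",
# so it can only match the literal text "/d/d/d/d/d".
broken = re.search('/d/d/d/d/d', "00001")
print(broken)

# "\d" matches one digit; the raw string keeps the backslash intact,
# and {5} asks for exactly five digits in a row.
fixed = re.search(r'\d{5}', "00001")
print(fixed.group())
```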

Have a look at the HTML source of eisdeqty.htm in a text editor,
and then use the interactive interpreter to get closer to the desired result:

Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'https://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm'
>>> soup = BeautifulSoup(requests.get(url).content)

[snip some intermediate attempts]

>>> soup.html.body.table.table.table.table.tr.td
<td class="verd_black12" width="18%"><b>STOCK CODE</b></td>
>>> stock_codes = [tr.td.text for tr in soup.html.body.table.table.table.table.find_all("tr")]
>>> stock_codes[:10]
['STOCK CODE', '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009']
>>> stock_codes[-10:]
['06882', '06886', '06888', '06889', '06893', '06896', '06898', '06899', '80737', '84602']
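
The long chain of .table attributes above depends on the exact nesting of 
the page, so it may break if the layout changes. A possible alternative, 
sketched below, is to look at the first cell of every row and keep it only 
when it consists of digits. The SAMPLE_HTML fragment and the 
extract_stock_codes() helper are made up for illustration; the fragment is 
modelled on the single cell quoted in the question, and I have not checked 
this approach against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the one cell quoted in the question;
# the real page has deeper nesting and many more rows.
SAMPLE_HTML = """
<table>
  <tr><td class="verd_black12" width="18%"><b>STOCK CODE</b></td><td>NAME</td></tr>
  <tr><td class="verd_black12" width="18%">00001</td><td>Example A</td></tr>
  <tr><td class="verd_black12" width="18%">00002</td><td>Example B</td></tr>
</table>
"""

def extract_stock_codes(html):
    """Return the first cell of each row when its text is all digits."""
    soup = BeautifulSoup(html, "html.parser")
    codes = []
    for tr in soup.find_all("tr"):
        first_cell = tr.find("td")
        if first_cell is None:
            continue  # row without any data cells
        text = first_cell.get_text(strip=True)
        if text.isdigit():  # skips the "STOCK CODE" header row
            codes.append(text)
    return codes

print(extract_stock_codes(SAMPLE_HTML))
```

To try it on the real page you would pass requests.get(url).content 
instead of SAMPLE_HTML and compare the result with the stock_codes list 
shown above.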