Re: Extract the “Matrix form” dataset from BCS website.

Thomas Passin list1 at tompassin.net
Thu Dec 22 12:34:05 EST 2022


On 12/22/2022 8:35 AM, hongy... at gmail.com wrote:
> I want to extract / scrape the “Matrix form” dataset from the BCS website [1], i.e., the data that appears in the 3rd column.
> 
> I tried the following Python code snippet, but still could not figure out the trick:

Tell us what you observed and what you expected.  For example, does the 
data get downloaded?  Do you get error messages, and if so, what are 
they?  Does the id variable contain anything at all?  Etc.

> import requests
> from bs4 import BeautifulSoup
> import re
> 
> proxies = {
>      'http': 'socks5h://127.0.0.1:18888',
>      'https': 'socks5h://127.0.0.1:18888'
> }
> 
> requests.packages.urllib3.disable_warnings()
> r = requests.get('https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane', proxies=proxies, verify=False)
> soup = BeautifulSoup(r.content, features="lxml")
> 
> table = soup.find('table')
> id = table.find_all('id')
> 
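On the question of whether id contains anything: find_all('id') searches for
tags *named* "id", and HTML has no such element (id is normally an attribute),
so the result is most likely an empty list. One way to check which tag names a
page really contains is sketched below using only the standard library. The
names TagCounter, count_tags and SAMPLE_HTML are made up for illustration, and
SAMPLE_HTML is a hypothetical stand-in, not the real BCS page.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each tag name occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for the downloaded page (r.text in the quoted code).
SAMPLE_HTML = """
<table>
  <tr><td>1</td><td>x,y</td><td>1 0 0 / 0 1 0</td></tr>
  <tr><td>2</td><td>-x,-y</td><td>-1 0 0 / 0 -1 0</td></tr>
</table>
"""

counts = count_tags(SAMPLE_HTML)
print("id tags:", counts["id"])   # a Counter reports 0 for a missing key
print("td tags:", counts["td"])
```

If the count for "id" is zero while "tr" and "td" are present, the rows and
cells are what to search for instead.
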
> My Python environment is as follows:
> 
> werner at X10DAi:~$ pyenv shell datasci
> (datasci) werner at X10DAi:~$ python --version
> Python 3.11.1
> 
> Any tips will be appreciated.
> 
> [1] https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane
> 
> Regards,
> Zhao
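
For picking out the 3rd column, a minimal sketch follows. It assumes that each
row of the table is a plain <tr> whose cells are <td> elements and that tables
are not nested; the real page may well differ, so inspect its HTML first. It
uses only the standard library, and ColumnExtractor, extract_column and
MATRIX_HTML are hypothetical names with made-up sample data.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of one column (0-based index) from every table row."""

    def __init__(self, column):
        super().__init__()
        self.column = column
        self.cells = []       # extracted text, one entry per matching cell
        self._index = -1      # position of the current <td> within its row
        self._buffer = None   # text fragments of the cell being captured

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._index = -1              # a new row starts: reset the cell count
        elif tag == "td":
            self._index += 1
            if self._index == self.column:
                self._buffer = []         # start capturing this cell's text

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            # Join the fragments and collapse runs of whitespace.
            self.cells.append(" ".join(" ".join(self._buffer).split()))
            self._buffer = None


def extract_column(html, column):
    """Return the text of the given 0-based column for every row in html."""
    parser = ColumnExtractor(column)
    parser.feed(html)
    parser.close()
    return parser.cells


# Made-up sample; the real input would be the text of the downloaded page.
MATRIX_HTML = """
<table>
  <tr><td>1</td><td>x,y</td><td>1 0 0 / 0 1 0</td></tr>
  <tr><td>2</td><td>-x,-y</td><td>-1 0 0 / 0 -1 0</td></tr>
</table>
"""

third_column = extract_column(MATRIX_HTML, 2)   # index 2 is the 3rd column
for cell in third_column:
    print(cell)
```

With BeautifulSoup the same idea would be to loop over table.find_all('tr')
and take the third entry of row.find_all('td') for each row that has one.
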
