Problem when scraping the 100 Movie titles.

Aakash Jana aakashjana2002 at gmail.com
Thu Feb 18 12:43:02 EST 2021


I have done some web scraping before, and I think you need a slightly more
targeted way to scrape these titles.
Look at which classes identify the cards (the elements that hold each movie
title) and then pull the heading out of those.
Collect the divs in a list, something like
<div class="jsx-2692754980 listicle-item-image "> in my case, and then pull
the h3 tag out of each one. One thing to note is that React and other
JavaScript-heavy single-page web apps tend to be difficult to scrape.
BeautifulSoup only parses the HTML it is given and does not run any
JavaScript, so anything the page renders in the browser may simply not be
in the soup.
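
A minimal sketch of that card-first approach with BeautifulSoup. It runs on a small hypothetical HTML sample built from the class names quoted in this thread, not on the live page, so the markup is an assumption; `extract_titles` and `SAMPLE_HTML` are made-up names. If the HTML you fetch doesn't contain these divs at all (for example because the cards are rendered by JavaScript), this returns an empty list, just like the h3 search in the question.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the classes quoted in this thread; the live
# page's structure may differ, and the generated "jsx-..." class can change.
SAMPLE_HTML = """
<div class="jsx-2692754980 listicle-item-image ">
  <h3 class="jsx-2692754980">100) Stand By Me</h3>
</div>
<div class="jsx-2692754980 listicle-item-image ">
  <h3 class="jsx-2692754980">99) Another Movie</h3>
</div>
"""


def extract_titles(html: str) -> List[str]:
    """Return the text of the h3 inside each card div (assumed card class)."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # class_ matches if ANY of the element's classes equals the given value,
    # so the readable "listicle-item-image" class is enough to find the cards.
    for card in soup.find_all("div", class_="listicle-item-image"):
        heading = card.find("h3")
        if heading is not None:
            titles.append(heading.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))
```

Matching on `listicle-item-image` rather than the `jsx-...` hash is a deliberate choice: generated class names like `jsx-2692754980` can change whenever the site is rebuilt.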

On Thu, Feb 18, 2021 at 9:09 PM Bischoop <Bischoop at vimart.net> wrote:

>
> I'm learning Scraping actually and would like to scrape the movie titles
> from https://www.empireonline.com/movies/features/best-movies-2 .
> In the course I was learning I was supposed to do it with bs4:
> titles = soup.find_all(name = 'h3', class_ = 'title')
>
> but after a while I guess the site has changed, and now the class
> is: jsx-2692754980
>
> <h3 class="jsx-2692754980">100) Stand By Me</h3>
>
> but anyway, if I try to get those titles by name and class, my list is
> empty:
> titles = soup.find_all(name = 'h3', class_ = 'jsx-2692754980')
>
> I also tried Selenium and managed to get those titles with:
>
> from selenium import webdriver
> from selenium.webdriver.common.by import By
>
> driver = webdriver.Chrome()
> driver.get('https://www.empireonline.com/movies/features/best-movies-2')
>
> # driver.find_element(By.XPATH, '/html/body/div/div[3]/div[5]/button[2]').click()
>
> titles = driver.find_elements(By.CSS_SELECTOR, "h3.jsx-2692754980")
>
> tit = []
> for e in titles:
>     tit.append(e.text)
>
> print(tit)
>
> But in Chrome I get a popup asking to accept cookies and I need to
> click to accept them.
>
> Does anyone here know how I can get those titles with BeautifulSoup,
> and how to deal with cookies when using Selenium?
>
> --
> Thanks
>
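
On the Selenium side, one common pattern is an explicit wait for the consent button, a click, and then handing `driver.page_source` to BeautifulSoup, which also gives you a BeautifulSoup route when the titles only appear after JavaScript runs. This is a sketch, assuming Chrome and a compatible chromedriver are available, that the XPath from the commented-out line in the question still points at the "accept" button (it may not), and that the `jsx-2692754980` class hasn't changed again. `titles_from_html` and `scrape_with_selenium` are hypothetical helper names.

```python
from bs4 import BeautifulSoup

URL = "https://www.empireonline.com/movies/features/best-movies-2"
# XPath of the cookie "accept" button, copied from the commented-out line in
# the question; this is an assumption and breaks if the consent dialog changes.
CONSENT_XPATH = "/html/body/div/div[3]/div[5]/button[2]"


def titles_from_html(html):
    """Parse already-rendered HTML with BeautifulSoup and return the h3 texts."""
    soup = BeautifulSoup(html, "html.parser")
    return [h3.get_text(strip=True)
            for h3 in soup.find_all("h3", class_="jsx-2692754980")]


def scrape_with_selenium(url=URL, timeout=10):
    """Open the page in Chrome, accept cookies if asked, return the titles."""
    # Imported here so titles_from_html can be reused without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        try:
            # Wait until the consent button is clickable, then accept cookies.
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, CONSENT_XPATH))
            )
            button.click()
        except TimeoutException:
            pass  # No popup appeared (or the XPath changed); carry on anyway.
        # Hand the browser-rendered DOM to BeautifulSoup.
        return titles_from_html(driver.page_source)
    finally:
        driver.quit()
```

Usage, on a machine with Chrome installed: `print(scrape_with_selenium())`. An explicit wait is used instead of a fixed `time.sleep` so the script continues as soon as the button is clickable, and the `TimeoutException` branch lets it proceed when no popup is shown.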

