Problem when scraping the 100 Movie titles.

Mats Wichmann mats at wichmann.us
Thu Feb 18 13:21:52 EST 2021


On 2/18/21 10:43 AM, Aakash Jana wrote:
> I have done some web scraping before; I think you need a slightly more
> tactical way to get these titles scraped.
> Try to see which classes identify the cards (in which the movie title is
> given) and then try to pull the heading out of those.
> Try to get the divs in a list, something like
> '<div class="jsx-2692754980 listicle-item-image ">' in my case, and then
> try to pull the h3 tag out of each. One thing to note is that React or
> single-page-heavy web apps have seemed difficult to scrape; maybe
> Beautiful Soup isn't made for JSX.
> 
> On Thu, Feb 18, 2021 at 9:09 PM Bischoop <Bischoop at vimart.net> wrote:
> 
>>
>> I'm learning Scraping actually and would like to scrape the movie titles
>> from https://www.empireonline.com/movies/features/best-movies-2 .
>> In the course I was learning I was supposed to do it with bs4:

Just in general, most websites don't want you to scrape them. Some go to
considerable effort to make it difficult, and some explicitly disallow
downloading any content except for caching purposes. If the website
provides an API, that's how they expect you to consume data that isn't
rendered through a web browser.
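
As a rough illustration of checking before fetching, here is a minimal
sketch using the standard library's urllib.robotparser. Note that
robots.txt is only one machine-readable signal and is not the same thing
as a site's terms of use, which still have to be read by a human. The
robots.txt content, the example.com URLs, the may_fetch helper and the
"my-learning-bot" user agent are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration. A real script
# would instead call rp.set_url(".../robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="my-learning-bot"):
    """Return True if the parsed robots.txt rules allow user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)


print(may_fetch("https://example.com/movies/"))       # allowed by the rules above
print(may_fetch("https://example.com/private/data"))  # disallowed by the rules above
```

Parsing an in-memory string keeps the sketch runnable without touching
any real site.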

Just sayin' ... there's no reason not to learn the concepts of web
scraping, but you should ALSO be aware of the terms of use.
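
For the learning side, the approach described in the quoted message
(collect the card divs by class, then pull the h3 out of each) can be
sketched roughly as below. To keep it dependency-free, this uses the
standard library's html.parser rather than bs4; with Beautiful Soup the
same idea would be soup.find_all("div", class_="listicle-item-image")
followed by card.find("h3") on each result. The CardTitleParser class
and the sample HTML, including its movie names, are invented for the
example and are not taken from the real page.

```python
from html.parser import HTMLParser


class CardTitleParser(HTMLParser):
    """Collect the text of <h3> tags nested inside <div> cards of a given class."""

    def __init__(self, card_class):
        super().__init__()
        self.card_class = card_class
        self.div_depth = 0   # greater than 0 while inside a matching card div
        self.in_h3 = False   # True while inside an <h3> within a card
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Count nested divs too, so the card's own closing tag is matched.
            if self.div_depth or self.card_class in classes:
                self.div_depth += 1
        elif tag == "h3" and self.div_depth:
            self.in_h3 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.titles[-1] += data


# Made-up sample markup; the class names follow the quoted message.
SAMPLE_HTML = """
<div class="jsx-2692754980 listicle-item-image ">
  <h3>100) Example Movie A</h3>
</div>
<div class="unrelated"><h3>Not a movie card</h3></div>
<div class="jsx-2692754980 listicle-item-image ">
  <h3>99) Example Movie B</h3>
</div>
"""

parser = CardTitleParser("listicle-item-image")
parser.feed(SAMPLE_HTML)
titles = [t.strip() for t in parser.titles]
print(titles)
```

Matching on the stable-looking class (listicle-item-image) rather than
the generated jsx-... one is a guess at what survives a site rebuild.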


