Python3 html scraper that supports javascript

DFS nospam at dfs.com
Mon May 2 12:39:35 EDT 2016


On 5/2/2016 11:33 AM, zljubisic at gmail.com wrote:
>
>
> I tried to use the following code:
>
> from bs4 import BeautifulSoup
> from selenium import webdriver
>
> PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'
>
> url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dokumentarni-film'
>
> browser = webdriver.PhantomJS(PHANTOMJS_PATH)
> browser.get(url)
>
> soup = BeautifulSoup(browser.page_source, "html.parser")
>
> x = soup.prettify()
>
> print(x)
>
>
> When I print the x variable, I would expect to see something like this:
> <video src="mediasource:https://hrti.hrt.hr/2e9e9c45-aa23-4d08-9055-cd2d7f2c4d58" id="vjs_video_3_html5_api" class="vjs-tech" preload="none"><source type="application/x-mpegURL" src="https://prd-hrt.spectar.tv/player/get_smil/id/2203605/video_id/2203605/token/Cny6ga5VEQSJ2uZaD2G8pg/token_expiration/1462043309/asset_type/Movie/playlist_template/nginx/channel_name/trebiat__pria_o_jednoj_vodi_i_jednom_narodu_dokumentarni_film/playlist.m3u8?foo=bar">
> </video>
>
> but I can't get to that point.
>
> Regards.


I was doing something similar recently.  Try this:

from bs4 import BeautifulSoup

# somefilename: path to a copy of the page's HTML saved to disk
with open(somefilename) as f:
    soup = BeautifulSoup(f, "html.parser")
print(soup.prettify())
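
Once you have the rendered HTML in hand, something along these lines might pull the stream URL out of the <video> element. This is only a rough sketch: SAMPLE_HTML is a made-up, shortened stand-in for the real page, and find_stream_urls is a hypothetical helper name, not anything from bs4 itself.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the rendered page (not the real markup).
SAMPLE_HTML = """
<html><body>
<video id="vjs_video_3_html5_api" class="vjs-tech" preload="none">
<source type="application/x-mpegURL" src="https://example.invalid/playlist.m3u8?foo=bar">
</video>
</body></html>
"""


def find_stream_urls(html):
    """Return the src of every <source> tag nested inside a <video> tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        source["src"]
        for video in soup.find_all("video")
        # src=True skips <source> tags that carry no src attribute
        for source in video.find_all("source", src=True)
    ]


print(find_stream_urls(SAMPLE_HTML))
```

If the list comes back empty for the real page, the HTML was probably captured before the JavaScript had inserted the <video> element.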




