Python 3 HTML scraper that supports JavaScript

zljubisic at gmail.com zljubisic at gmail.com
Mon May 2 11:33:51 EDT 2016



I tried to use the following code:

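from bs4 import BeautifulSoup
from selenium import webdriver

# Local path to the PhantomJS executable (headless browser driven by Selenium).
PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'

url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dokumentarni-film'

# Load the page in PhantomJS so that its JavaScript gets a chance to run.
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get(url)

# Parse whatever HTML the browser holds right after the page load.
soup = BeautifulSoup(browser.page_source, "html.parser")
browser.quit()  # shut down the PhantomJS process

x = soup.prettify()

print(x)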


When I print the variable x, I expect to see something like this:
<video src="mediasource:https://hrti.hrt.hr/2e9e9c45-aa23-4d08-9055-cd2d7f2c4d58" id="vjs_video_3_html5_api" class="vjs-tech" preload="none"><source type="application/x-mpegURL" src="https://prd-hrt.spectar.tv/player/get_smil/id/2203605/video_id/2203605/token/Cny6ga5VEQSJ2uZaD2G8pg/token_expiration/1462043309/asset_type/Movie/playlist_template/nginx/channel_name/trebiat__pria_o_jednoj_vodi_i_jednom_narodu_dokumentarni_film/playlist.m3u8?foo=bar">
</video>

but I can't get to that point.
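
A minimal sketch of one way this could be approached, not a confirmed fix. It assumes the <video> element is added by JavaScript some time after the initial page load, so the page source is read only once that element is present. That assumption has not been checked against the site. The names fetch_rendered_html, extract_video_sources, VideoSourceParser and SAMPLE_HTML are hypothetical, and the example.invalid URL is made up for illustration. The parsing part uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra packages. webdriver.PhantomJS is available in the Selenium releases that still ship PhantomJS support.

```python
from html.parser import HTMLParser


class VideoSourceParser(HTMLParser):
    """Collect (type, src) pairs of <source> tags nested inside <video> tags."""

    def __init__(self):
        super().__init__()
        self._video_depth = 0  # > 0 while the parser is inside a <video> element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self._video_depth += 1
        elif tag == "source" and self._video_depth > 0:
            attributes = dict(attrs)
            self.sources.append((attributes.get("type"), attributes.get("src")))

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth > 0:
            self._video_depth -= 1


def extract_video_sources(html):
    """Return a list of (type, src) tuples for every <source> inside a <video>."""
    parser = VideoSourceParser()
    parser.feed(html)
    parser.close()
    return parser.sources


def fetch_rendered_html(url, phantomjs_path, timeout=30):
    """Load url in PhantomJS and return the HTML once a <video> tag exists.

    Assumption: the player inserts a <video> element without user interaction.
    WebDriverWait raises TimeoutException if none appears within `timeout` seconds.
    """
    # Imported lazily so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.PhantomJS(phantomjs_path)
    try:
        browser.get(url)
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "video"))
        )
        return browser.page_source
    finally:
        browser.quit()  # always shut down the PhantomJS process


# Made-up markup with the same shape as the expected output, for a dry run.
SAMPLE_HTML = (
    '<video id="vjs_video_3_html5_api" class="vjs-tech" preload="none">'
    '<source type="application/x-mpegURL" '
    'src="https://example.invalid/playlist.m3u8?foo=bar">'
    '</video>'
)

if __name__ == "__main__":
    print(extract_video_sources(SAMPLE_HTML))
```

With a real page the two helpers would be combined as extract_video_sources(fetch_rendered_html(url, PHANTOMJS_PATH)). If the wait times out, the element is probably not created in the headless browser at all, and a different approach would be needed.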

Regards.


