download web pages that are updated by ajax

Jabba Laci jabba.laci at gmail.com
Tue Apr 12 17:55:33 EDT 2011


> I've heard you can drive a web browser using Selenium
> (http://code.google.com/p/selenium/ ), have it visit the webpage and
> run the JavaScript on it, and then grab the final result.

Hi,

Thanks for the info. I tried Selenium. You can get the page source with
the get_html_source() function, but it returns the original HTML, not
the DOM.
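
For comparison, here is a minimal sketch using the later Selenium WebDriver API rather than the get_html_source() call above. One way to capture the live DOM instead of the original HTML is to ask the browser to serialize document.documentElement itself. It assumes Selenium 4 and Firefox with geckodriver are installed. The helper name fetch_dom_html and its wait_for_css parameter are hypothetical.

```python
def fetch_dom_html(url, wait_for_css=None, timeout=10):
    """Load `url` in headless Firefox and return the serialized live DOM.

    Sketch assuming Selenium 4 and Firefox/geckodriver are available.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run without opening a browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        if wait_for_css is not None:
            # Block until an element inserted by the page's Ajax code appears.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_for_css))
            )
        # Serialize the current DOM, including changes made by JavaScript
        # after the initial page load.
        return driver.execute_script("return document.documentElement.outerHTML;")
    finally:
        driver.quit()  # always shut the browser down
```

A call would look like fetch_dom_html("https://example.org/page", wait_for_css="#results"). The URL and selector are placeholders. Without a wait condition, the snapshot may be taken before the Ajax requests finish.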

For the general problem, I found a _general solution_ at
http://simile.mit.edu/wiki/Crowbar, whose description reads: "Crowbar is
a web scraping environment based on the use of a server-side headless
mozilla-based browser. Its purpose is to allow running javascript
scrapers against a DOM..."

A _specific solution_ for my case is to use Biopython:

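-------
from Bio import Entrez

accession = 'CP002059.1'  # GenBank accession of the record to fetch

# NCBI asks for a contact e-mail address; replace with your own.
Entrez.email = 'whatev@mail.com'

# Fetch the record from the nucleotide database as a GenBank flat file.
handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text')
try:
    with open(accession, 'w') as local_file:
        local_file.write(handle.read())
finally:
    handle.close()
-------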
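
Entrez.efetch() wraps NCBI's E-utilities efetch service, so you can sketch the same download with only the standard library. This sketch assumes the public endpoint URL below and the db/id/rettype/retmode/email query parameters. The helper names efetch_url and download_record are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities efetch endpoint (assumed; check NCBI's documentation).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, email, db="nucleotide", rettype="gb", retmode="text"):
    """Build the efetch query URL for one record (no network access)."""
    params = {
        "db": db,
        "id": accession,
        "rettype": rettype,  # 'gb' = GenBank flat file
        "retmode": retmode,  # plain text rather than XML
        "email": email,      # contact address NCBI asks for
    }
    return EFETCH_URL + "?" + urlencode(params)


def download_record(accession, email, path=None):
    """Download the record and save it; the file name defaults to the accession."""
    path = path or accession
    with urlopen(efetch_url(accession, email)) as response:
        text = response.read().decode("utf-8")
    with open(path, "w") as local_file:
        local_file.write(text)
    return path
```

For example, download_record('CP002059.1', 'you@example.com') should save the same GenBank text as the Biopython version. Building the URL separately lets you inspect the request before anything is sent.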

Laszlo
