Scraping a web page
Support Desk
support.desk.ipg at gmail.com
Tue Apr 7 09:58:54 EDT 2009
You could do something like the following to get the rendered page as plain text (this assumes lynx is installed):
import os
site = 'website.com'
# lynx --dump prints the page rendered as plain text
X = os.popen('lynx --dump %s' % site).readlines()
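A slightly more defensive sketch of the same idea, assuming lynx is installed and on the PATH. The name dump_page and its runner parameter are made up for illustration; they are not part of any library. Passing the URL as a list element rather than formatting it into a shell string avoids quoting problems.

```python
import subprocess

def dump_page(url, runner=subprocess.run):
    """Return lynx's plain-text rendering of url as a list of lines.

    dump_page and runner are illustrative names, not a library API.
    """
    # Assumes the lynx binary is on PATH; -dump writes the rendered page to stdout.
    result = runner(
        ["lynx", "-dump", url],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError if lynx exits non-zero
    )
    return result.stdout.splitlines()
```

Calling dump_page("http://www.google.com") should give roughly what the os.popen line gives, minus the trailing newlines on each line.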
-----Original Message-----
From: Tim Chase [mailto:python.list at tim.thechases.com]
Sent: Tuesday, April 07, 2009 7:45 AM
To: Ronn Ross
Cc: python-list at python.org
Subject: Re: Scraping a web page
> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there any way I
> can get almost a screen capture of the page?
This is the job of a browser -- to render the source HTML. As
such, you'd want to look into any of the browser-automation
libraries to hook into IE, Firefox, Opera, or maybe using the
WebKit/KHTML control. You may then be able to direct it to
render the HTML into a canvas you can then treat as an image.
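As one possible illustration of the browser-automation route, here is a minimal sketch that assumes the third-party Selenium package and a Firefox driver are installed. Selenium is only one such library and is not something the thread itself names. The function name capture_screenshot and the driver_factory parameter are hypothetical.

```python
def capture_screenshot(url, out_path, driver_factory=None):
    """Load url in a real browser and save what it renders as a PNG.

    capture_screenshot and driver_factory are illustrative names only.
    """
    if driver_factory is None:
        # Assumption: the selenium package and a Firefox driver are available.
        from selenium import webdriver
        driver_factory = webdriver.Firefox
    driver = driver_factory()
    try:
        driver.get(url)                          # the browser fetches and renders the page
        return driver.save_screenshot(out_path)  # True if the image was written
    finally:
        driver.quit()                            # always close the browser
```

Something like capture_screenshot("http://www.google.com", "google.png") would then leave an image of the rendered page on disk.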
Another alternative might be provided by some web-services that
will render a page's HTML in various browsers and then send
you the result. However, these are usually either (1)
asynchronous or (2) paid services (or both).
-tkc