HTML page into a string

Tue Feb 7 23:32:39 EST 2006

"Tempo" <bradfordh at gmail.com> writes:

> In my last post I received some advice to use urllib.read() to get a
> whole html page as a string, which will then allow me to use
> BeautifulSoup to do what I want with the string. But when I was
> researching the 'urllib' module I couldn't find anything about its
> sub-section '.read()' ? Is that the right module to get a html page
> into a string? Or am I completely missing something here? I'll take
> this as the more likely of the two cases. Thanks for any and all help.

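read() isn't a function in the urllib module itself; it's a method of
the file-like object that urlopen() returns. Here's a short example of
how this all works: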

#!/usr/bin/env python

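import urllib2
from BeautifulSoup import BeautifulSoup

# urlopen() fetches the URL and hands back a file-like response object.
response = urllib2.urlopen('http://www.cnn.com')
# BeautifulSoup accepts that object as-is, so there is no explicit
# read() call in this script.
soup = BeautifulSoup(response)
# Print the parsed document, re-indented.
print soup.prettify()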

It's not a particularly useful example, unless, of course, you wish to
prettify CNN's HTML, but it should get you to the point where
BeautifulSoup's documentation starts to make sense.
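
If what you want really is the page as a single string, call read() on
the object urlopen() returns. A rough sketch, assuming Python 3, where
urlopen lives in urllib.request and read() returns bytes that still
need decoding; fetch_html and default_encoding are names invented here
for illustration, not part of any library:

```python
from urllib.request import urlopen


def fetch_html(url, default_encoding="utf-8"):
    """Return the document at url as one string (hypothetical helper)."""
    with urlopen(url) as response:
        # read() with no argument returns the whole body, as bytes.
        raw = response.read()
        # Use the charset from the Content-Type header when one is given;
        # falling back to default_encoding is an assumption of this sketch.
        charset = response.headers.get_content_charset() or default_encoding
    # Undecodable bytes are replaced rather than raising an error.
    return raw.decode(charset, errors="replace")
```

Something like fetch_html('http://www.cnn.com') should then give you a
string you can search directly or hand to BeautifulSoup.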

Jason