HTML page into a string

Jason Earl jearl at xmission.com
Tue Feb 7 23:32:39 EST 2006


"Tempo" <bradfordh at gmail.com> writes:

> In my last post I received some advice to use urllib.read() to get a
> whole html page as a string, which will then allow me to use
> BeautifulSoup to do what I want with the string. But when I was
> researching the 'urllib' module I couldn't find anything about its
> sub-section '.read()'? Is that the right module to get an html page
> into a string? Or am I completely missing something here? I'll take
> this as the more likely of the two cases. Thanks for any and all help.


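read() isn't a sub-section of urllib. It's a method of the file-like
object that urlopen() returns, and calling it gives you the whole page
as a string. BeautifulSoup will also accept that file-like object
directly and call read() for you. Here's a short example of how this
all works: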

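#!/usr/bin/env python
# Python 2 / BeautifulSoup 3

import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page; the response is a file-like object.
response = urllib2.urlopen('http://www.cnn.com')
# Parse it (equivalent to BeautifulSoup(response.read())).
soup = BeautifulSoup(response)
# Print the parsed tree as neatly indented HTML.
print soup.prettify()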

It's not a particularly useful example, unless, of course, you wish to
prettify CNN's HTML, but it should get you to the point where
BeautifulSoup's documentation starts to make sense.
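If you want the page as a string yourself, here is a minimal sketch of
the read() route in Python 3, where urllib2's urlopen() now lives in
urllib.request. The helper name fetch_html() is hypothetical. The demo
uses a data: URL (which urlopen() accepts since Python 3.4) so it runs
offline; swap in a real address such as http://www.cnn.com. In Python 3
read() returns bytes, so the sketch decodes them with the charset from
the response headers, falling back to UTF-8 as an assumption.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url):
    """Hypothetical helper: return the body at `url` as a str."""
    # urlopen() returns a file-like response object; read() is a
    # method of that object and returns the whole body as bytes.
    with urlopen(url) as response:
        raw = response.read()
        # Prefer the charset from the Content-Type header; the UTF-8
        # fallback is an assumption, not something HTTP guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# A data: URL keeps the demo offline; replace it with a real http URL.
page = "<html><body><p>Hello, soup</p></body></html>"
demo_url = "data:text/html;charset=utf-8," + quote(page)

html = fetch_html(demo_url)
print(html)
```

The resulting string is what you would hand to BeautifulSoup, though as
shown above it's just as happy to take the response object directly.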

Jason
