saving a webpage's links to the hard disk
castironpi at gmail.com
Sun May 4 07:22:31 EDT 2008
On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> En Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com> escribió:
>
> > Is there a good place to look to see where I can find some code that
> > will help me to save webpage's links to the local drive, after I have
> > used urllib2 to retrieve the page?
> > Many times I have to view these pages when I do not have access to the
> > internet.
>
> Don't reinvent the wheel and use wget: http://en.wikipedia.org/wiki/Wget
>
> --
> Gabriel Genellina
A lot of the functionality is already present in the standard library:

import urllib
import urlparse
from htmllib import HTMLParser
from formatter import NullFormatter

# Fetch the page and save it to disk.
urllib.urlretrieve( 'http://python.org/', 'main.htm' )

# Parse the saved file; HTMLParser records every <a href> target
# in its anchorlist attribute.
parser = HTMLParser( NullFormatter( ) )
parser.feed( open( 'main.htm' ).read( ) )

# Resolve relative links against the base URL.
for a in parser.anchorlist:
    print urlparse.urljoin( 'http://python.org/', a )
Output snipped:
...
http://python.org/psf/
http://python.org/dev/
http://python.org/links/
http://python.org/download/releases/2.5.2
http://docs.python.org/
http://python.org/ftp/python/2.5.2/python-2.5.2.msi
...
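Note that htmllib and formatter were removed in Python 3. A rough
Python 3 sketch of the same idea, using html.parser to collect the
anchors (the LinkCollector class and the sample HTML string here are
made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather <a href> targets, like the old parser.anchorlist."""
    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.anchorlist.append(value)

parser = LinkCollector()
parser.feed('<a href="/psf/">PSF</a> <a href="/dev/">Dev</a>')

# Resolve relative links against the base URL, as before.
links = [urljoin('http://python.org/', a) for a in parser.anchorlist]
# links == ['http://python.org/psf/', 'http://python.org/dev/']
```

From there, looping over the resolved links and calling
urllib.request.urlretrieve on each would save the linked pages
locally, which is what the original poster was after.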