saving a webpage's links to the hard disk

castironpi at gmail.com
Sun May 4 07:22:31 EDT 2008


On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> En Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com> escribió:
>
> > Is there a good place to look to see where I can find some code that
> > will help me to save webpage's links to the local drive, after I have
> > used urllib2 to retrieve the page?
> > Many times I have to view these pages when I do not have access to the
> > internet.
>
> Don't reinvent the wheel and use wget: http://en.wikipedia.org/wiki/Wget
>
> --
> Gabriel Genellina
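
If you go the wget route, the whole job can be done in one call. A
minimal sketch, assuming wget is installed and on your PATH (-p fetches
page requisites, -k converts links for offline viewing, -r -l 1 follows
links one level deep):

import subprocess

# Mirror the page and everything it links to, one level deep,
# rewriting links so the saved copy works offline.
subprocess.call(['wget', '-p', '-k', '-r', '-l', '1',
                 'http://python.org/'])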

Alternatively, a lot of the needed functionality is already present in
the standard library:

import urllib
import urlparse
from htmllib import HTMLParser
from formatter import NullFormatter

# Fetch the page and save a local copy.
urllib.urlretrieve('http://python.org/', 'main.htm')

# Parse the saved copy.  NullFormatter discards the rendered text;
# HTMLParser still records every <a href> in anchorlist.
parser = HTMLParser(NullFormatter())
parser.feed(open('main.htm').read())
parser.close()

# Resolve relative links against the base URL.
for a in parser.anchorlist:
    print urlparse.urljoin('http://python.org/', a)

Output (snipped):

...
http://python.org/psf/
http://python.org/dev/
http://python.org/links/
http://python.org/download/releases/2.5.2
http://docs.python.org/
http://python.org/ftp/python/2.5.2/python-2.5.2.msi
...
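
To actually store each linked page on disk, which is the original
question, the same loop can feed every resolved URL back into
urlretrieve. A rough sketch that picks up parser.anchorlist from the
snippet above; the save_links name and the naive filename scheme are my
own, and real code would want to handle collisions, query strings, and
non-HTML targets:

import os
import urllib
import urlparse

def save_links(base, anchors, outdir='saved'):
    # Download every link to outdir.  Filenames are derived
    # naively from the URL path; duplicates and query strings
    # are not handled.
    if not os.path.isdir(outdir):
        os.makedirs(outdir)
    for a in anchors:
        url = urlparse.urljoin(base, a)
        name = urlparse.urlparse(url).path.strip('/').replace('/', '_')
        try:
            urllib.urlretrieve(url,
                os.path.join(outdir, (name or 'index') + '.htm'))
        except IOError:
            pass  # skip mailto:, broken, or unreachable links

save_links('http://python.org/', parser.anchorlist)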


