saving a webpage's links to the hard disk

Jetus stevegill7 at gmail.com
Wed May 7 02:40:15 EDT 2008


On May 4, 7:22 am, castiro... at gmail.com wrote:
> On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>
> > On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com> wrote:
>
> > > Is there a good place to look to see where I can find some code that
> > > will help me save a webpage's links to the local drive, after I have
> > > used urllib2 to retrieve the page?
> > > Many times I have to view these pages when I do not have access to the
> > > internet.
>
> > Don't reinvent the wheel; use wget: http://en.wikipedia.org/wiki/Wget
>
> > --
> > Gabriel Genellina
>
> A lot of the functionality is already present in the standard library.
>
> import urllib
>
> # Save the page itself to disk.
> urllib.urlretrieve('http://python.org/', 'main.htm')
>
> # Parse it and collect the <a href> targets; NullFormatter discards
> # the rendered output, since only the link list is needed.
> from htmllib import HTMLParser
> from formatter import NullFormatter
> parser = HTMLParser(NullFormatter())
> parser.feed(open('main.htm').read())
>
> # Resolve each link against the page's base URL.
> import urlparse
> for a in parser.anchorlist:
>     print urlparse.urljoin('http://python.org/', a)
>
> Output snipped:
>
> ...
> http://python.org/psf/
> http://python.org/dev/
> http://python.org/links/
> http://python.org/download/releases/2.5.2
> http://docs.python.org/
> http://python.org/ftp/python/2.5.2/python-2.5.2.msi
> ...
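
One caveat about the quoted code: parser.anchorlist only records <a href>
targets, and a page viewed offline usually needs its images as well.
htmllib's HTMLParser calls handle_image() for <img> tags, so a small
subclass can pick those up too (a sketch; the LinkCollector name is mine):

from htmllib import HTMLParser
from formatter import NullFormatter

class LinkCollector(HTMLParser):
    # Collect <img src> values alongside the built-in anchorlist.
    def __init__(self, formatter):
        HTMLParser.__init__(self, formatter)
        self.imagelist = []
    def handle_image(self, src, alt, *args):
        self.imagelist.append(src)

parser = LinkCollector(NullFormatter())
parser.feed(open('main.htm').read())
parser.close()
print parser.anchorlist
print parser.imagelist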

How can I modify or extend the above code so that the linked files are
downloaded into specified local directories, and the saved webpage is
rewritten to reference the local copies in those directories? One
possible approach is sketched below.
Thanks for your help in advance.
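
A rough sketch, building on the quoted code. The saved/ directory name
and the flat file-naming scheme are illustrative choices, not anything
the earlier code prescribes, and the href rewrite uses plain string
replacement, which is a simplification rather than a robust HTML
transform:

import os
import urllib
import urlparse
from htmllib import HTMLParser
from formatter import NullFormatter

base = 'http://python.org/'   # the same example page as above
savedir = 'saved'             # illustrative local directory for link targets

if not os.path.isdir(savedir):
    os.makedirs(savedir)

# Fetch the page into memory so its links can be rewritten afterwards.
html = urllib.urlopen(base).read()

# Collect the anchor targets exactly as in the quoted code.
parser = HTMLParser(NullFormatter())
parser.feed(html)
parser.close()

for href in parser.anchorlist:
    absolute = urlparse.urljoin(base, href)
    # Derive a flat local filename from the URL's path component;
    # directory-style links fall back to index.html.
    path = urlparse.urlparse(absolute)[2]
    name = path.strip('/').replace('/', '_') or 'index.html'
    local = os.path.join(savedir, name)
    try:
        urllib.urlretrieve(absolute, local)
    except IOError:
        continue   # skip targets that cannot be fetched
    # Point the saved page at the local copy. This assumes the href
    # appears double-quoted in the source and may touch lookalike text.
    html = html.replace('href="%s"' % href, 'href="%s"' % local)

open('main.htm', 'w').write(html)

That said, wget already covers this case, as Gabriel suggested: its
--page-requisites (-p) option fetches the files needed to display a
page, and --convert-links (-k) rewrites the links in the downloaded
documents to point at the local copies.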


