saving a webpage's links to the hard disk

castironpi at gmail.com castironpi at gmail.com
Wed May 7 05:59:53 EDT 2008


On May 7, 1:40 am, Jetus <stevegi... at gmail.com> wrote:
> On May 4, 7:22 am, castiro... at gmail.com wrote:
>
> > On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> > wrote:
>
> > > On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com> wrote:
>
> > > > Is there a good place to look to see where I can find some code that
> > > > will help me save a webpage's links to the local drive, after I have
> > > > used urllib2 to retrieve the page?
> > > > Many times I have to view these pages when I do not have access to the
> > > > internet.
>
> > > Don't reinvent the wheel; use wget: http://en.wikipedia.org/wiki/Wget
>
> > > --
> > > Gabriel Genellina
>
> > A lot of the functionality is already present.
>
> > import urllib
> > import urlparse
> > from htmllib import HTMLParser
> > from formatter import NullFormatter
> >
> > # Save the page itself, then feed it to a parser that does no
> > # formatting-- we only want the anchor list it collects.
> > urllib.urlretrieve( 'http://python.org/', 'main.htm' )
> > parser= HTMLParser( NullFormatter( ) )
> > parser.feed( open( 'main.htm' ).read( ) )
> > parser.close( )
> >
> > # anchorlist holds the href of every <a> tag; resolve relative
> > # links against the base URL.
> > for a in parser.anchorlist:
> >     print urlparse.urljoin( 'http://python.org/', a )
>
> > Output snipped:
>
> > http://python.org/psf/
> > http://python.org/dev/
> > http://python.org/links/
> > ...
>
> How can I modify or add to the above code, so that the referenced
> files are saved to specified local directories, AND the saved webpage
> points at those new local files in their respective directories?
> Thanks for your help in advance.

You'd have to convert each URL in the loop to a file-system path,
creating the intermediate directories with os.makedirs( ) before
writing.  You'd also have to rewrite the links inside the saved file
itself; either point them at the local copies directly, or prefix
them with localhost and spawn a small bounce-router that serves the
saved files.  A rough sketch of both follows.
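
Here's a rough sketch of both steps, building on the snippet quoted
above.  Everything past that snippet is my own guesswork-- the
'mirror' directory name, the index.html convention for
directory-style URLs, and the naive string replacement that rewrites
the hrefs-- so treat it as a starting point, not a finished tool.

import os
import urllib
import urlparse
from htmllib import HTMLParser
from formatter import NullFormatter

BASE = 'http://python.org/'

# Fetch the page and collect its anchors, as before.
urllib.urlretrieve( BASE, 'main.htm' )
html = open( 'main.htm' ).read( )
parser = HTMLParser( NullFormatter( ) )
parser.feed( html )
parser.close( )

def local_path( url ):
    # Map http://python.org/psf/ to mirror/python.org/psf/index.html.
    parts = urlparse.urlparse( url )
    path = parts[ 2 ]
    if not path or path.endswith( '/' ):
        path += 'index.html'
    return os.path.join( 'mirror', parts[ 1 ], path.lstrip( '/' ) )

for a in parser.anchorlist:
    url = urlparse.urljoin( BASE, a )
    if not url.startswith( 'http' ):
        continue                     # skip mailto:, javascript:, etc.
    dest = local_path( url )
    d = os.path.dirname( dest )
    if not os.path.isdir( d ):
        os.makedirs( d )             # build the directory tree
    try:
        urllib.urlretrieve( url, dest )
    except IOError:
        continue                     # dead link; leave the href alone
    # Point the saved page at the local copy.  Plain string
    # replacement is fragile; a real version wants an HTML rewriter.
    html = html.replace( '"%s"' % a, '"%s"' % dest )

open( 'main_local.htm', 'w' ).write( html )

If you'd rather go the bounce-router route-- rewrite every href to
point at http://localhost:8000/ and let a tiny local server deliver
the saved files-- the standard library is already enough.  The port
number here is arbitrary:

import os
import SimpleHTTPServer
import SocketServer

os.chdir( 'mirror' )    # serve the mirrored tree built above
httpd = SocketServer.TCPServer( ( '', 8000 ),
        SimpleHTTPServer.SimpleHTTPRequestHandler )
httpd.serve_forever( )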


