saving a webpage's links to the hard disk

Diez B. Roggisch deets at nospam.web.de
Wed May 7 09:36:15 EDT 2008


Jetus wrote:

> On May 4, 7:22 am, castiro... at gmail.com wrote:
>> On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
>> wrote:
>>
>> > On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com>
>> > wrote:
>>
>> > > Is there a good place to look to see where I can find some code that
>> > > will help me to save webpage's links to the local drive, after I have
>> > > used urllib2 to retrieve the page?
>> > > Many times I have to view these pages when I do not have access to
>> > > the internet.
>>
>> > Don't reinvent the wheel and use wget: http://en.wikipedia.org/wiki/Wget
>>
>> > --
>> > Gabriel Genellina
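
[For reference: wget can do this whole job on its own, including rewriting
the saved pages to point at the local copies, which is exactly what gets
asked for further down. The flags and URL below are only an example; see
the wget manual for details:

    wget --mirror --convert-links --page-requisites --no-parent http://python.org/
]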
>>
>> A lot of the functionality is already present.
>>
>> import urllib
>> # Fetch the page and save it to a local file.
>> urllib.urlretrieve('http://python.org/', 'main.htm')
>> from htmllib import HTMLParser
>> from formatter import NullFormatter
>> # htmllib's parser collects every href into parser.anchorlist.
>> parser = HTMLParser(NullFormatter())
>> parser.feed(open('main.htm').read())
>> import urlparse
>> # Resolve each (possibly relative) link against the base URL.
>> for a in parser.anchorlist:
>>     print urlparse.urljoin('http://python.org/', a)
>>
>> Output snipped:
>>
>> ...
>> http://python.org/psf/
>> http://python.org/dev/
>> http://python.org/links/
>> http://python.org/download/releases/2.5.2
>> http://docs.python.org/
>> http://python.org/ftp/python/2.5.2/python-2.5.2.msi
>> ...
> 
> How can I modify or add to the above code, so that the file references
> are saved to specified local directories, AND the saved webpage makes
> reference to the new saved files in the respective directories?
> Thanks for your help in advance.
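
[One minimal sketch of such a modification, in the same Python 2 idiom as
the snippet above. The 'saved' directory, the flat file-naming scheme, and
the naive href string replacement are illustrative choices only; a real
mirror would also fetch images and CSS, handle name collisions, and cope
with single-quoted or entity-escaped attributes:

    import os
    import urllib
    import urlparse
    from htmllib import HTMLParser
    from formatter import NullFormatter

    BASE = 'http://python.org/'   # page to mirror (example)
    OUTDIR = 'saved'              # local directory for linked files

    # Fetch the page and collect its anchors, as in the snippet above.
    urllib.urlretrieve(BASE, 'main.htm')
    html = open('main.htm').read()
    parser = HTMLParser(NullFormatter())
    parser.feed(html)
    parser.close()

    if not os.path.isdir(OUTDIR):
        os.makedirs(OUTDIR)

    for href in parser.anchorlist:
        absolute = urlparse.urljoin(BASE, href)
        # Flatten the URL path into a local file name.
        path = urlparse.urlparse(absolute)[2]
        name = path.strip('/').replace('/', '_') or 'index'
        local = os.path.join(OUTDIR, name)
        try:
            urllib.urlretrieve(absolute, local)
        except IOError:
            continue  # skip links that cannot be fetched
        # Point the saved page at the local copy instead of the web.
        html = html.replace('href="%s"' % href, 'href="%s"' % local)

    open('main_local.htm', 'w').write(html)

For anything beyond a quick one-off, wget's --convert-links (mentioned
above) does the same job far more robustly.]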

How about you *try* to do so, and if you run into actual problems, come
back and ask for help? Alternatively, there's always guru.com.

Diez
