saving a webpage's links to the hard disk

castironpi at gmail.com castironpi at gmail.com
Wed May 7 20:41:17 EDT 2008


On May 7, 8:36 am, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> Jetus wrote:
> > On May 4, 7:22 am, castiro... at gmail.com wrote:
> >> On May 4, 12:33 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> >> wrote:
>
> >> > On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegi... at gmail.com>
> >> > wrote:
>
> >> > > Is there a good place to look to see where I can find some code that
> >> > > will help me to save webpage's links to the local drive, after I have
> >> > > used urllib2 to retrieve the page?
> >> > > Many times I have to view these pages when I do not have access to
> >> > > the internet.
>
> >> > Don't reinvent the wheel; use wget: http://en.wikipedia.org/wiki/Wget
>
> >> > --
> >> > Gabriel Genellina
>
> >> A lot of the functionality is already present.
>
> >> import urllib
> >> urllib.urlretrieve( 'http://python.org/', 'main.htm' )
> >> from htmllib import HTMLParser
> >> from formatter import NullFormatter
> >> parser= HTMLParser( NullFormatter( ) )
> >> parser.feed( open( 'main.htm' ).read( ) )
> >> import urlparse
> >> for a in parser.anchorlist:
> >>     print urlparse.urljoin( 'http://python.org/', a )
>
> >> Output snipped:
>
> >> ...
> >> http://python.org/psf/
> >> http://python.org/dev/
> >> http://python.org/links/
> >> h...
> >> ...
>
> > How can I modify or add to the above code, so that the file references
> > are saved to specified local directories, AND the saved webpage makes
> > reference to the new saved files in the respective directories?
> > Thanks for your help in advance.
>
> how about you *try* to do so - and if you have actual problems, you come
> back and ask for help? Alternatively, there's always guru.com
>
> Diez

I've tried, to no avail.  How does the open-source plug-in for Python
look and work?  Firefox was once able to spawn Python from a toolbar.
Does it still?  I believe that, under the DOM, you could return a file
named X that contains a list of changes to make to the page, or put
that list at the top of the page itself, to be removed by Firefox.  At
that point, X would pretty much be the last lexically sorted file in a
pre-established directory.  Files are really easy to create and add
syntax to, if you create a bunch of them.  Sector size was bouncing
though, which brings that all the way up to the file system.

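// Pseudocode: walk the page's links in order; whenever one matches the
// next entry in file A, point it at the corresponding entry in file B.
int pyID= 0;
for( int docID= 0;
     docID< doc.links.length && pyID< pythonfileA.links.length;
     docID++ ) {
  if ( doc.links[ docID ]== pythonfileA.links[ pyID ] ) {
    doc.links[ docID ].anchor= pythonfileB.links[ pyID ];
    pyID++;
  }
}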
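
The same idea can be sketched in Python without going through the
browser.  This is only a rough sketch under some assumptions: it uses
the Python 3 module names (html.parser, urllib.parse) rather than the
htmllib and urlparse modules quoted above, and LinkCollector,
local_name, rewrite_links and the 'saved' directory are names made up
for illustration.  It builds the old-link to local-file mapping and
rewrites the page text; it does not download anything.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def local_name(url, directory="saved"):
    """Map an absolute URL to a (hypothetical) local path in `directory`."""
    path = urlparse(url).path
    name = posixpath.basename(path.rstrip("/")) or "index"
    if not posixpath.splitext(name)[1]:
        name += ".htm"  # assumption: extensionless links are HTML pages
    return posixpath.join(directory, name)


def rewrite_links(html, base_url, directory="saved"):
    """Return (new_html, mapping) with each anchor pointed at a local file."""
    collector = LinkCollector()
    collector.feed(html)
    mapping = {}
    for link in collector.links:
        mapping[link] = local_name(urljoin(base_url, link), directory)
    # Plain text substitution: only catches double-quoted href attributes.
    for old, new in mapping.items():
        html = html.replace('href="%s"' % old, 'href="%s"' % new)
    return html, mapping


page = '<a href="/psf/">PSF</a> <a href="http://python.org/dev/">Dev</a>'
new_page, mapping = rewrite_links(page, "http://python.org/")
print(new_page)
```

Each key in mapping, joined against the base URL, is what you would
fetch, and each value is where you would save it, for example with
urllib.request.urlretrieve.  Two links with the same last path
component would collide under this naming scheme, so a real version
would need something smarter there.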


