How to save web pages for offline reading?

Anand Pillai pythonguy at Hotpop.com
Mon Jul 21 10:05:01 EDT 2003


Hi

I am the developer and maintainer of 'HarvestMan', a program which
does exactly what you want (and more). It is an offline web crawler
written completely in Python. The current version is 1.1 beta.

Don't bother writing the code yourself. HarvestMan supports the
HTTP, FTP and HTTPS protocols, works through proxies and can crawl
intranets. There is documentation available for its use and I give free
support, as much as my time permits :->

The freshmeat project page is here: 
http://www.freshmeat.net/projects/harvestman

Give it a try and let me know how it works for you.

Thanks

~Anand 

hwlgw at hotmail.com (Will Stuyvesant) wrote in message news:<cb035744.0307210131.5cfbc890 at posting.google.com>...
> I am trying to download pages from Python, for offline reading.  This
> to save telephone costs :-)
> 
> If a page contains something like
> <link rel='stylesheet' href='../pythonware.css' type='text/css' />
> and I use fp=urllib.urlopen(...) and then fp.read(), I get the HTML
> but not the CSS.  As a result the page looks bad when reading offline.
>  How to solve this?  Also the .GIF's in a page would be nice, but this
> is less important and also would take more time to download.
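
For anyone curious what is involved, the problem described above can be
sketched roughly as follows. This is not HarvestMan's code, only a minimal
illustration written for Python 3 (urllib.request instead of the
urllib.urlopen call quoted above). The names AssetFinder, find_assets and
save_page are made up for this example. It assumes the page is UTF-8, and
it does not rewrite the links inside the saved HTML, which a real offline
browser would also need to do.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class AssetFinder(HTMLParser):
    """Collect absolute URLs of stylesheets and images used by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            ref = attrs.get("href")
        elif tag == "img":
            ref = attrs.get("src")
        else:
            ref = None
        if ref:
            # Resolve relative references such as '../pythonware.css'
            # against the URL of the page that contains them.
            self.assets.append(urljoin(self.base_url, ref))


def find_assets(html, base_url):
    """Return the stylesheet and image URLs referenced in html."""
    finder = AssetFinder(base_url)
    finder.feed(html)
    return finder.assets


def save_page(url, dest_dir):
    """Download a page plus its stylesheets and images into dest_dir.

    Hypothetical helper: files are stored flat under their base names,
    so assets with the same name overwrite each other.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with urlopen(url) as fp:
        data = fp.read()
    with open(os.path.join(dest_dir, "index.html"), "wb") as out:
        out.write(data)
    # Assumption: the page decodes as UTF-8; bad bytes are replaced.
    for asset_url in find_assets(data.decode("utf-8", "replace"), url):
        name = os.path.basename(urlparse(asset_url).path) or "asset"
        with urlopen(asset_url) as fp:
            content = fp.read()
        with open(os.path.join(dest_dir, name), "wb") as out:
            out.write(content)
```

Calling save_page with a page URL and a target directory would fetch the
HTML, the CSS and the images in one go. Skipping the "img" branch avoids
the extra download time for the GIFs. Following links to other pages,
error handling and link rewriting are left out here.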



