Re: How to save web pages for offline reading?

Carsten Gehling carsten at gehling.dk
Tue Jul 22 04:31:01 EDT 2003


> -----Original message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org] On behalf of Will Stuyvesant
> Sent: 22 July 2003 08:38
> To: python-list at python.org
> Subject: Re: How to save web pages for offline reading?

> There is no "man wget" on Windows :-)
> And unfortunately the GNU Windows port of wget I have (version
> 1-5-3-1) does not have that --page-requisites parameter.

Well since you ARE on Windows:

Open the page in Internet Explorer, choose "File" and "Save As...", and pick
"Web Page, complete". Voilà! You've now saved all necessary files.

> I thought this whole thing would be easy with all those Python
> internet modules in the standard distro: httplib, urllib, urllib2,
> FancyURLxxx etc.  Being able to download a "complete" page *from
> Python source* would be very nice for my particular application.

Well, it's doable with those libraries, but you have to put your own meat on
the bones.

1) Use httplib (or urllib) to fetch the page first.
2) Parse it for all "src" attributes, and fetch the supporting files. The
parsing can be done with an HTML parser from the standard library (e.g. the
HTMLParser module) or with regular expressions. REMEMBER: All paths must be
rewritten so they are relative to the main document.
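The steps above can be sketched roughly like this (shown in modern Python 3,
where the old httplib/HTMLParser modules live under http.client and
html.parser). The page URL and HTML snippet are made-up examples; the fetch
step is only indicated in a comment, and the sketch just collects each "src"
URL together with a local filename you could save it under:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import posixpath

class SrcCollector(HTMLParser):
    """Collect every src attribute, resolved against the page's own URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # (absolute URL, local filename) pairs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative references against the main document's URL
                absolute = urljoin(self.base_url, value)
                # A simple local name: the last path component of the URL
                local = posixpath.basename(absolute)
                self.resources.append((absolute, local))

# Step 1 would be: fetch the page, e.g. with urllib.request.urlopen(url).
# Here we use a hard-coded snippet instead, so the example runs offline.
page = ('<html><body><img src="images/logo.gif">'
        '<script src="/js/app.js"></script></body></html>')

collector = SrcCollector("http://example.com/articles/page.html")
collector.feed(page)

for url, local in collector.resources:
    print(url, "->", local)
```

To finish the job you would download each collected URL, save it under its
local name next to the main document, and rewrite the corresponding src
attributes in the saved HTML to those local names.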

- Carsten





