Urllib's urlopen and urlretrieve

Dave Angel davea at davea.name
Thu Feb 21 13:04:37 EST 2013


On 02/21/2013 12:47 PM, rh wrote:
> On Thu, 21 Feb 2013 10:56:15 -0500
> Dave Angel <davea at davea.name> wrote:
>> On 02/21/2013 07:12 AM, qoresucks at gmail.com wrote:
>>> I only just started Python and given that I know nothing about
>>> network programming or internet programming of any kind really, I
>>> thought it would be interesting to try to write something that could
>>> create an archive of a website for myself.
>>
>
>
>> To archive your website, use the rsync command.  No need to write any
>> code, as rsync will descend into all the directories as needed, and
>> it'll get the actual website data, not the stuff that the web server
>> feeds to the browsers.
>
> How many websites let you suck down their content using rsync???
> The request was for creating their own copy of a website.
>

Clearly this was his own website, since it's usually unethical to "suck 
down" someone else's.  And my message specifically said "To archive 
*your* website..."  As to the implied question of why, since he 
presumably has the original sources, I can only relate my own 
experience.  I generate mine with a Python program, but over time 
obsolete files are left behind.  Additionally, an overzealous SEO person 
hand-edited my files.  And finally, I reinstalled my system from scratch 
a couple of months ago.  So, to see exactly what's actually out there on 
the server, I used rsync about two weeks ago.
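
For reference, the invocation I mean is essentially a one-liner.  Here's 
a minimal sketch, assuming you have ssh access to the server -- the host 
and paths below are placeholders, not anything real:

    import subprocess

    # Pull the server's document root into a local directory.
    # -a (archive) recurses and preserves permissions/timestamps,
    # -v is verbose, -z compresses in transit.  With --dry-run,
    # rsync only reports what it would copy; drop it to actually
    # transfer the files.
    subprocess.run(
        ["rsync", "-avz", "--dry-run",
         "user@example.com:/var/www/mysite/",
         "./site-archive/"],
        check=True,
    )

The same command works directly from the shell, of course; wrapping it 
in Python is only worthwhile if you want to schedule it alongside the 
generator script.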


-- 
DaveA


