[Tutor] Using urllib to retrieve info

Liam Clarke-Hutchinson Liam.Clarke-Hutchinson at business.govt.nz
Mon Aug 8 23:40:56 CEST 2005


Hi all, 

David, are you able to send us a screen shot of what you're trying to get? 
From your desired link I just get a bunch of ads with another ad telling me
it's for sale. 
When I open http://support.mywork.co.uk it looks exactly the same as 
http://support.mywork.co.uk/index.php?node=2371&pagetree=&fromid=20397&objectid=21897

So yeah, there may be cookies involved, but to me it looks like the site has lost its domain.
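
If cookies do turn out to matter, here is a rough sketch of a cookie-aware
opener. It uses the Python 3 module names (urllib.request and http.cookiejar)
rather than the 2005-era ones, and make_cookie_opener is just a name made up
for this example, so treat it as a starting point and not as the fix.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar.

    Hypothetical helper for illustration only.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


if __name__ == "__main__":
    opener, jar = make_cookie_opener()
    # First request lets the site set its cookies; later requests made
    # through the same opener send them back automatically.
    with opener.open("http://support.mywork.co.uk", timeout=10) as resp:
        resp.read()
    for cookie in jar:
        print(cookie.name, cookie.value)
```

If the jar stays empty after the first request, cookies are probably not
the problem.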

You may also find that urllib's default User-Agent header causes some sites
to flat out spit the dummy.
Try google.com with urllib without changing the User-Agent; Google gets all
huffy when rival bots attempt to spider it.

I believe you can change the User-Agent using urllib, 
so you can easily masquerade as Internet Explorer or a Mozilla browser. 

From http://docs.python.org/lib/module-urllib.html:

"""
By default, the URLopener class sends a User-Agent: header of "urllib/VVV",
where VVV is the urllib version number. 
Applications can define their own User-Agent: header by subclassing
URLopener or FancyURLopener and 
setting the class attribute version to an appropriate string value in the
subclass definition.
"""

That's another caveat when using urllib. 
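
As a rough sketch of changing the User-Agent: the docs quoted above describe
subclassing URLopener or FancyURLopener, but those classes were deprecated in
Python 3 and, as far as I know, are gone from the newest releases. So this
example swaps in a different technique, passing the header on a
urllib.request.Request. The User-Agent string and the names make_request and
fetch are made up for illustration.

```python
import urllib.request

# Hypothetical User-Agent string, chosen only for this example.
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleFetcher/0.1)"


def make_request(url, user_agent=BROWSER_UA):
    """Build a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Fetch url and return (final_url, body_bytes)."""
    req = make_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # geturl() reports the URL we ended up at after any redirects,
        # which helps spot a site that bounces every page to one place.
        return resp.geturl(), resp.read()


if __name__ == "__main__":
    final_url, body = fetch("http://support.mywork.co.uk")
    print(final_url, len(body))
```

Comparing final_url with the URL you asked for is a quick way to see whether
the server redirected you somewhere else.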

But yeah, digression aside, the short answer is that I think your specified
resource is dead, and a URL-camper has taken all the URLs for that site and
redirected them to a pop-up fest.

Regards, 


Liam Clarke-Hutchinson

-----Original Message-----
From: tutor-bounces at python.org [mailto:tutor-bounces at python.org] On Behalf
Of Alan G
Sent: Tuesday, 9 August 2005 6:33 a.m.
To: David Holland; tutor python
Subject: Re: [Tutor] Using urllib to retrieve info


> It runs fine but the file saved to disk is the
> information at : 'http://support.mywork.co.uk'
> not 
> 'http://support.mywork.co.uk/index.php?node=2371&pagetree=&fromid=20397&objectid=21897"'

Could there be cookies involved?

Just a thought,

Alan G. 



