Impersonating other browsers...

Eric Pederson whereU at now.com
Sat Mar 5 20:41:41 EST 2005


Skip Montanaro <skip at pobox.com> wrote

> It doesn't look any easier to do this using urllib2.  Seems like a
> semi-obvious oversight for both modules.  That suggests few people have
> ever desired this capability.


my $.02:

I have trouble believing that few people have desired this, for two reasons:

(1)  Some web sites shut out user agents they do not recognize, to preserve bandwidth or for other reasons, so the right User Agent ID can be required to get the data one wants;

(2)  It seems a worthwhile courtesy to identify oneself when spidering or data scraping, and the User Agent ID seems like the obvious way to do that. I'd guess (and like to think) that Python users are generally a little more concerned with such courtesies than the users of some other languages.

e.g.  Your website might get a hit from:  "Mozilla/5.0 (Songzilla MP3 Blog, http://songzilla.blogspot.com) Gecko/20041107 Firefox/1.0"

And you'll get to decide whether to shut them out or not, but at least it won't seem like the black hats are attacking.
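
For what it's worth, here is a minimal sketch of sending such an identifying string. It assumes Python 3's urllib.request (the successor to the urllib and urllib2 modules discussed in this thread), so it is not what either module offered at the time. The helper names make_request and make_opener and the example.com URL are made up for illustration, and nothing here touches the network.

```python
from urllib.request import Request, build_opener

# Identifying string modelled on the example above; substitute your own.
USER_AGENT = ("Mozilla/5.0 (Songzilla MP3 Blog, "
              "http://songzilla.blogspot.com) Gecko/20041107 Firefox/1.0")


def make_request(url, user_agent=USER_AGENT):
    """Build a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def make_opener(user_agent=USER_AGENT):
    """Build an opener that sends the same User-Agent on every request."""
    opener = build_opener()
    # Replace the default "Python-urllib/x.y" header with our own.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Hypothetical URL; no request is sent until the Request is opened.
req = make_request("http://example.com/")
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen(), or calling make_opener().open(url), would then send that header with the request.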




Eric Pederson
http://www.songzilla.blogspot.com
:::::::::::::::::::::::::::::::::::
domainNot="@something.com"
domainIs=domainNot.replace("s","z")
ePrefix="".join([chr(ord(x)+1) for x in "do"])
mailMeAt=ePrefix+domainIs
:::::::::::::::::::::::::::::::::::



