Bind threads to addresses -- Windows & urllib?

Nick Arnett narnett at mccmedia.com
Wed Sep 4 19:27:38 EDT 2002


--
Nick Arnett
Phone/fax: (408) 904-7198
narnett at mccmedia.com

> That's not gonna help you at all, because all threads will
> be capped by the interface's top throughput, or by your
> machine's processing power.

You're making an incorrect assumption about the purpose of using multiple
addresses.  It has nothing to do with my end of the connection; it is to
cope with servers that regard a reasonably well-behaved spider (in my
opinion, at least) as a denial-of-service attack.  If the server operators
would reveal what they regard as well-behaved, I wouldn't have to resort to
this, but as one might expect, nobody wants to disclose the parameters of
their DOS defenses.  I can't get the relevant sites to even respond to
inquiries... and they don't even have a robots.txt file.

On the other hand, the more I think about this, the less interested I am in
bothering, since it would surely be easy for a server to block a range of
addresses.
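
For anyone who still wants to try it, here is a rough sketch of giving each
thread its own source address. It assumes a current Python 3 standard library,
where http.client.HTTPConnection accepts a source_address argument; it is not
the urllib of this thread's era. The addresses in LOCAL_ADDRESSES and the
helper names are hypothetical placeholders, and the addresses would have to be
ones actually configured on the machine.

```python
import http.client
import itertools
import threading

# Hypothetical local addresses (documentation range); replace with addresses
# that are really assigned to an interface on this machine.
LOCAL_ADDRESSES = ["192.0.2.10", "192.0.2.11"]

_address_cycle = itertools.cycle(LOCAL_ADDRESSES)
_cycle_lock = threading.Lock()
_thread_state = threading.local()


def local_address_for_thread():
    """Assign each thread one address, round-robin, and remember it."""
    if not hasattr(_thread_state, "address"):
        with _cycle_lock:
            _thread_state.address = next(_address_cycle)
    return _thread_state.address


def open_connection(host, port=80, timeout=30):
    """Build a connection that will bind to this thread's source address.

    Nothing touches the network until a request is made on the connection.
    Port 0 lets the operating system pick the ephemeral source port.
    """
    source = (local_address_for_thread(), 0)
    return http.client.HTTPConnection(
        host, port, timeout=timeout, source_address=source
    )
```

Each worker thread would call open_connection() for its fetches. As noted
above, a server that blocks a whole address range defeats this anyway.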

I'm also working on the other obvious solution: heuristics for the spider to
set its own speed so that it won't trigger defenses -- but it appears
there's something more than simple rules at the other end.  The robot wars I
expected years ago have arrived...
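
To make the speed heuristic concrete, here is a minimal sketch of one possible
approach: a per-host delay that doubles when the server appears to push back
and shrinks slowly while requests succeed. The class name, the default values,
and the choice of 403/429/503 as "defense tripped" signals are all assumptions
for illustration, not what my spider or any particular server actually does.

```python
import time


class AdaptiveThrottle:
    """Per-host delay: back off quickly on refusals, recover slowly on success."""

    # Assumption: these status codes are read as signs of a tripped defense.
    REFUSAL_CODES = (403, 429, 503)

    def __init__(self, initial_delay=1.0, min_delay=0.5, max_delay=300.0,
                 backoff=2.0, recovery=0.9):
        self.delay = initial_delay
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.backoff = backoff      # multiplier applied after a refusal
        self.recovery = recovery    # multiplier applied after a success

    def record(self, status):
        """Update the delay from an HTTP status code and return the new delay."""
        if status in self.REFUSAL_CODES:
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            self.delay = max(self.delay * self.recovery, self.min_delay)
        return self.delay

    def wait(self, sleep=time.sleep):
        """Pause for the current delay before the next request to this host."""
        sleep(self.delay)
```

The spider would keep one AdaptiveThrottle per host, call wait() before each
request, and call record() with the response status afterwards. Rules this
simple may well not be enough against whatever is at the other end.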

No lectures on what constitutes good robot behavior, please -- I've been
operating *the* list on that subject for years
(http://www.mccmedia.com/mailman/listinfo/robots).

Nick




