python proxy checker, change to threaded version

r0g aioe.org at technicalbloke.com
Mon Dec 7 03:09:03 EST 2009


elca wrote:
> Hello ALL,
> 
> I have a Python proxy checker.
> 
> To speed up the checks, I decided to change it to a multithreaded version.
> 
> The thread module is new to me. I have tried several times to convert the
> script to a threaded version
> 
> and looked up a lot of information, but it is not easy for a novice Python
> programmer.
> 
> If anyone can help, I would really appreciate it!
> 
> thanks in advance!
> 
> 
>     import urllib2, socket
>     
>     socket.setdefaulttimeout(180)
>     # read the list of proxy IPs in proxyList
>     proxyList = open('listproxy.txt').read()
>     
>     def is_bad_proxy(pip):    
>         try:        
>             proxy_handler = urllib2.ProxyHandler({'http': pip})        
>             opener = urllib2.build_opener(proxy_handler)
>             opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>             urllib2.install_opener(opener)        
>             req=urllib2.Request('http://www.yahoo.com')  # <---check whether proxy alive
>             sock=urllib2.urlopen(req)
>         except urllib2.HTTPError, e:        
>             print 'Error code: ', e.code
>             return e.code
>         except Exception, detail:
>     
>             print "ERROR:", detail
>             return 1
>         return 0
>     
>     
>     for item in proxyList:
>         if is_bad_proxy(item):
>             print "Bad Proxy", item
>         else:
>             print item, "is working"
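
A side note on the quoted script before threading it: open('listproxy.txt').read()
returns one long string, so the for loop walks it character by character rather
than proxy by proxy, and urllib2.install_opener() replaces a single module-wide
opener, which concurrent threads would overwrite for each other. Here is a
minimal sketch of one way around both, written for Python 3 (where urllib2
became urllib.request) and assuming one host:port entry per line. The helper
name parse_proxy_list and the url/timeout parameters are my own additions, not
part of the original script:

```python
import urllib.error
import urllib.request


def parse_proxy_list(text):
    """Split file contents into one proxy per non-empty line (assumed format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_bad_proxy(pip, url='http://www.yahoo.com', timeout=30):
    """Return 0 if the proxy fetched url, else an HTTP status code or 1."""
    proxy_handler = urllib.request.ProxyHandler({'http': pip})
    # Keep the opener local instead of calling install_opener(), so each
    # thread uses its own proxy settings.
    opener = urllib.request.build_opener(proxy_handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    try:
        opener.open(url, timeout=timeout).close()
    except urllib.error.HTTPError as e:
        print('Error code:', e.code)
        return e.code
    except Exception as detail:
        print('ERROR:', detail)
        return 1
    return 0
```

With that, proxyList = parse_proxy_list(open('listproxy.txt').read()) gives the
loop whole entries to work with.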



The trick to threads is to create a subclass of threading.Thread, define
its 'run' method, and call 'start()' on an instance. I find threading
generally useful, so I wrote this simple generic function for running
things in threads...


import threading

def run_in_thread( func, func_args=(), callback=None, callback_args=() ):
    class MyThread ( threading.Thread ):
        def run ( self ):

            # Call function (unpacking an empty sequence passes no arguments)
            result = func(*func_args)

            # Call callback with the result, followed by any extra arguments
            if callback:
                callback(result, *callback_args)

    MyThread().start()


You need to pass it a test function (+args) and, if you want to get a
result back from each thread, you also need to provide a callback
function (+args). The first parameter of the callback function receives
the result of the test function, so your callback would look something
like this...

def cb( result, item ):
    if result:
        print "Bad Proxy", item
    else:
        print item, "is working"


And your calling loop would be something like this...

for item in proxyList:
    run_in_thread( is_bad_proxy, func_args=[ item ], callback=cb, callback_args=[ item ] )
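
To see the whole pattern run without touching the network, here is a small
self-contained sketch for Python 3. It swaps the Thread subclass for
threading.Thread(target=...) and returns the thread so the caller can join()
it. The fake_check function and the proxy addresses are made-up stand-ins for
is_bad_proxy and a real list, purely for illustration:

```python
import threading


def run_in_thread(func, func_args=(), callback=None, callback_args=()):
    """Start func(*func_args) in a new thread; pass its result to callback."""
    def worker():
        result = func(*func_args)
        if callback:
            callback(result, *callback_args)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread  # returned so the caller can join() it


def fake_check(pip):
    """Hypothetical stand-in for is_bad_proxy: 'bad' if the port is 0."""
    return 1 if pip.endswith(':0') else 0


results = {}
results_lock = threading.Lock()


def cb(result, item):
    # Callbacks run in the worker threads, so guard shared state with a lock
    with results_lock:
        results[item] = 'bad' if result else 'working'


proxies = ['10.0.0.1:8080', '10.0.0.2:0', '10.0.0.3:3128']  # made-up addresses
threads = [run_in_thread(fake_check, [p], cb, [p]) for p in proxies]
for t in threads:
    t.join()  # wait for every check before reading results

for p in proxies:
    print(p, results[p])
```

Joining the threads matters here: without it the main script can reach the end
before the callbacks have reported anything.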


Also, you might want to limit the number of concurrent threads so as not
to overload your system. One quick and dirty way to do this is...

import threading, time
# Inside the calling loop, before starting each new thread
if threading.activeCount() > 9: time.sleep(1)

Note that this is far from an exact method, but it works well enough for
one-off scripting use.
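
For a firmer cap on concurrency, one option in Python 3 is the standard
library's concurrent.futures.ThreadPoolExecutor, which never runs more than
max_workers calls at once. This is a different technique from the
activeCount() check, sketched here with a hypothetical fake_check in place of
is_bad_proxy, made-up addresses, and an illustrative limit of 3:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 3  # illustrative limit

running = 0
peak = 0
counter_lock = threading.Lock()


def fake_check(pip):
    """Hypothetical stand-in for is_bad_proxy that also tracks concurrency."""
    global running, peak
    with counter_lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # pretend to wait on the network
    with counter_lock:
        running -= 1
    return 1 if pip.endswith(':0') else 0


# Made-up addresses: odd hosts get port 8080, even hosts get the "bad" port 0
proxies = ['10.0.0.%d:%d' % (i, 8080 if i % 2 else 0) for i in range(1, 11)]

with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
    # map() returns results in the same order as the inputs
    outcomes = list(pool.map(fake_check, proxies))

for pip, bad in zip(proxies, outcomes):
    print(pip, 'Bad Proxy' if bad else 'is working')
print('peak concurrent checks:', peak)
```

The pool also collects the return values for you, so no callback is needed in
this variant.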

Hope this helps.


Suggestions from hardcore pythonistas on how to make my run_in_thread
function more elegant are quite welcome also :)


Roger Heathcote


