[Chicago] threads and xmlrpc?

Sat Jan 31 06:51:23 CET 2009

On Fri, Jan 30, 2009 at 2:13 PM, Tim Gebhardt <tim at gebhardtcomputing.com> wrote:
>
> On Fri, Jan 30, 2009 at 1:19 PM, Lukasz Szybalski <szybalski at gmail.com>
> wrote:
>>
>> I see.
>> Looking at this example on threads:
>>
>> http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
>>
>> this is implemented within the thread. Each thread calls:
>> url = urllib2.urlopen(host)
>>
>> Looking at the xmlrpc examples the way I connect is:
>> http://docs.python.org/library/xmlrpclib.html
>> pypi=xmlrpclib.ServerProxy(XML_RPC_SERVER)
>>
>> My question is: is urlopen(xyz) similar to ServerProxy(xyz)? If yes,
>> then I can call it within each thread. But if it's not, and this ends
>> up creating 5000 active connections, then that won't work.
>>
>> How would I know whether ServerProxy returns an instance of the class
>> or a request object?
>>
>> Thanks a lot,
>> Lucas
>
> I looked back at your original email describing the problem. Have you
> tried doing what you're doing without threads?  Is there some particular
> reason why you need 8 threads?  Have you tried with just 1, then 2, then 3,
> etc.?
> I'm asking because if PyPI has Keep-Alive enabled on their webserver you'll
> almost certainly get the peak performance with a single thread hitting their
> single endpoint.  The error you indicated (connection reset by peer)
> means that the remote end terminated the connection, either deliberately
> or inadvertently.
> The webserver or a proxy or a firewall on their end may be set up to reject
> too many connection attempts in a certain time window.  Or they could have
> Keep-Alive enabled and the connection is open for too long so they're
> severing it.
> As I mentioned in my last email, I used to screen scrape a lot of
> stuff with some Python scripts.  I used to screen scrape financial news
> sites in hopes of one day turning that into an automated trading system.

How did that go? Do you still get the data? Do you have the whole set available?

I've searched Google some more and found that xmlrpc has some
performance issues. Too bad, it seemed like a really easy way to
communicate with other servers. I wonder if there is a performance
benchmark for C/C++ based xmlrpc vs xmlrpclib from Python.
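
On my earlier question about whether ServerProxy(xyz) behaves like
urlopen(xyz): as far as I can tell it does not. urlopen() performs the
request and hands back a response, while ServerProxy() only builds a
proxy object, and a request goes out when a method is called on it. Here
is a minimal sketch to check that. The address is a placeholder I made up
(nothing needs to listen there), and package_releases is used only as an
attribute name; no call is made.

```python
try:
    import xmlrpclib  # Python 2, as used in this thread
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3 name for the same module

# Placeholder address (an assumption for illustration only).
XML_RPC_SERVER = "http://127.0.0.1:9/"

# Building the proxy only parses the URL; no request is sent here.
proxy = xmlrpclib.ServerProxy(XML_RPC_SERVER)

# Attribute access returns a callable stub, still without any request.
method = proxy.package_releases

is_proxy_instance = isinstance(proxy, xmlrpclib.ServerProxy)
is_callable_stub = callable(method)
print(is_proxy_instance, is_callable_stub)
```

So creating one proxy per thread should not by itself open thousands of
connections; what matters is how many calls are in flight at once.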

FYI: it's been 1.30 and I am still downloading version numbers, now for
packages starting with the letter t. When that is done I want to download
the metadata. That will probably take at least 6h.

Thanks,
Lucas

> I very aggressively scraped those sites and I would get a lot of errors as
> well, stuff like connection reset by peer.  Eventually I tuned down my
> aggressiveness and the errors pretty much went away.
> If you're not paying for the information or don't have an SLA with PyPI then
> there's no obligation to serve you that information in a timely or reliable
> manner.  In that case you may want to try delaying the requests and only
> using a single thread.  See if the errors go away.  The error you're getting
> doesn't indicate that something is wrong on your end; it points to
> something on PyPI's end or in your ISP's transmission of your data.
> -Tim Gebhardt
> tim at gebhardtcomputing.com

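For the single-thread-with-delay suggestion, something along these lines
might be a starting point. It is only a sketch under my own assumptions:
fetch_all, its parameters, and the retry policy are names and choices I
made up for illustration, and fetch stands for whatever call retrieves
one package (for example a method on the ServerProxy).

```python
import time


def fetch_all(names, fetch, delay=1.0, retries=3, sleep=time.sleep):
    """Call fetch(name) for each name, one at a time, pausing between calls."""
    results = {}
    failed = []
    for name in names:
        for attempt in range(1, retries + 1):
            try:
                results[name] = fetch(name)
                break
            except (IOError, OSError):  # socket errors such as connection reset
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
        else:
            failed.append(name)  # every attempt failed
        sleep(delay)  # throttle before moving on to the next package
    return results, failed
```

It could be called as fetch_all(package_names, pypi.package_releases),
where package_names is a list of names and pypi is the ServerProxy from
earlier in the thread. XML-RPC faults and protocol errors are not caught
here and would still stop the loop.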
-- 
How to create python package?
http://lucasmanual.com/mywiki/PythonPaste
Bazaar and Launchpad
http://lucasmanual.com/mywiki/Bazaar