[Catalog-sig] threads and xmlrpc?

Lukasz Szybalski szybalski at gmail.com
Sat Jan 31 06:46:28 CET 2009


On Thu, Jan 29, 2009 at 11:44 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I'm running a threaded app using some calls via xmlrpc to pypi. What
>> I'm trying to do is get a few more responses in a shorter
>> time, as I see that the bandwidth used by xmlrpc calls is minimal
>> (<kb). The problem I run into is that the connection is reset by peer
>> after about 10min (~500 calls). I use a single connection and a queue
>> of 8 threads to get the data. Would anybody have an example on how to
>> run xmlrpc in a thread? Do I set multiple connections, or is there a
>> setting to keep the connection live or reconnect if disconnected?
>
> Using threads will not at all make it faster to communicate over a
> single connection. For a single connection, all communication must
> be serialized; you cannot issue a new request until the previous
> request has completed. So you might as well just issue the requests
> from a single thread.
>
>> Also, please advise if you think that somehow I am overloading your
>> servers. I've tested some download speeds and I am sure your web
>> browser can accept 100+ requests per second, but what about xmlrpc?
>> Without threads I get <5 requests per second.
>
> I think 5 requests per second is fairly fast.
>
It's more like 2 requests per second.

If I set it to 2 threads I can list each package version in about an
hour, but I lost the connection when I reached the packages starting
with z.

If I use 5-8 threads I can get halfway in about 25 min, but then I lose
the connection ("Connection reset by peer").

Would you know how I can issue more requests, and/or increase the
number of connections?
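
For reference, here is a minimal sketch of what I mean by several
connections: each worker thread owns its own ServerProxy instead of
sharing one, and a dropped connection is replaced and the call retried.
The names fetch_all, make_proxy and PYPI_URL are mine, the endpoint URL
is an assumption, and the module is called xmlrpclib on Python 2. This
is not a tested recipe against the live server.

```python
import queue
import threading
import xmlrpc.client  # named xmlrpclib on Python 2

PYPI_URL = "http://pypi.python.org/pypi"  # assumed endpoint, adjust as needed


def default_proxy():
    return xmlrpc.client.ServerProxy(PYPI_URL)


def fetch_all(names, n_threads=2, make_proxy=default_proxy, retries=3):
    """Return {name: releases}, using one connection per worker thread."""
    todo = queue.Queue()
    for name in names:
        todo.put(name)
    results = {}
    lock = threading.Lock()

    def worker():
        proxy = make_proxy()  # each thread owns its proxy
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                return
            for _ in range(retries):
                try:
                    releases = proxy.package_releases(name)
                except OSError:
                    # e.g. "Connection reset by peer": reconnect, retry
                    proxy = make_proxy()
                    continue
                with lock:
                    results[name] = releases
                break
            else:
                with lock:
                    results[name] = None  # gave up after all retries

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Keeping n_threads at 2 would stay within the two-connection guideline;
whether the server tolerates more is exactly what I don't know.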

I know about http://www.faqs.org/rfcs/rfc2068.html (see section 8.1.4):
the RFC says clients "should limit 2 connections per server", and a lot
of HTTP client libraries obey this.

Does the xmlrpc lib used by pypi do the same?

Does pypi support http://docs.python.org/library/xmlrpclib.html#multicall-objects?
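
If the server does support system.multicall (I have not confirmed that
it does), batching would look roughly like the sketch below. MultiCall
is part of the standard library; releases_in_batch is just a helper
name I made up, and the module is xmlrpclib on Python 2.

```python
import xmlrpc.client  # named xmlrpclib on Python 2


def releases_in_batch(server, names):
    """Ask for the releases of several packages in one HTTP round trip."""
    names = list(names)
    batch = xmlrpc.client.MultiCall(server)
    for name in names:
        batch.package_releases(name)  # queued only, nothing is sent yet
    # Calling the batch sends a single system.multicall request and
    # yields the results in the order the calls were queued.
    return dict(zip(names, batch()))
```

Here server would be an xmlrpc.client.ServerProxy pointed at the index.
A server without multicall support would answer with a Fault instead.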

This is my last try. I was hoping that I could increase the rate to at
least 10 requests/second (~20 min in total), but I can't seem to find
any performance increase with xmlrpc.

Is there another way to get:
pypi.list_packages()
pypi.package_releases('xyz')
pypi.release_data('xyz', '0.7.79dev')

If not, then I guess I will go back to the regular for loop and loop
through all the records in a serialized manner. (It's been 1h 15min and
I am on packages starting with the letter R.)
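
The serialized loop is roughly the following sketch. It uses only the
three calls listed above; crawl is my own name for it, and pypi is
assumed to be a ServerProxy (or anything with the same three methods).

```python
def crawl(pypi):
    """Yield (name, version, metadata) for every release, one call at a time."""
    for name in pypi.list_packages():
        for version in pypi.package_releases(name):
            yield name, version, pypi.release_data(name, version)
```

Since it is a generator, the results can be processed or saved as they
arrive, so a dropped connection late in the run does not lose everything
fetched so far.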

A cPickle file with the metadata available in release_data for all
packages is coming soon.
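
Saving and reloading the collected metadata could be as simple as the
sketch below. The function names and the idea of one dict per file are
my assumptions, not a fixed format; on Python 2 the module would be
cPickle.

```python
import pickle  # cPickle on Python 2


def save_metadata(records, path):
    """Write the collected release metadata to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(records, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_metadata(path):
    """Read the metadata back; only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```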

Thanks,
Lucas

