[Python-Dev] PEP for RFE 46738 (first draft)

Skip Montanaro skip at pobox.com
Sun Jun 19 06:07:57 CEST 2005


    Simon> XML is simply not suitable for database applications, real-time
    Simon> data capture and game/entertainment applications.

I use XML-RPC as the communications protocol between an Apache web server
and a middleware piece that talks to a MySQL database.  The web server
contains a mixture of CGI scripts written in Python and two websites written
in Mason (Apache+mod_perl).  Performance is fine.  Give either of these a
try:

    http://www.mojam.com/
    http://www.musi-cal.com/
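
In case the shape of that setup isn't clear, here's a minimal sketch using
the standard library pieces involved: SimpleXMLRPCServer on the middleware
side, xmlrpclib on the web side.  The function name and the canned data are
invented; only the port number echoes the proxy shown further down.

    # Minimal sketch of the architecture described above -- not the actual
    # Mojam code.  A middleware process exposes database lookups over
    # XML-RPC; a CGI script on the web server calls it.

    # --- middleware side ---
    import SimpleXMLRPCServer

    def search(city, start, count):
        # The real thing would query MySQL (with caching); this just
        # returns canned rows so the sketch is self-contained.
        rows = [{'venue': 'Example Club %d' % i, 'city': city}
                for i in range(100)]
        return rows[start:start + count]

    server = SimpleXMLRPCServer.SimpleXMLRPCServer(('localhost', 5007))
    server.register_function(search)
    # server.serve_forever()   # blocks; run this in the middleware process

    # --- web server (CGI) side ---
    import xmlrpclib
    proxy = xmlrpclib.ServerProxy('http://localhost:5007')
    # listings = proxy.search('Chicago, IL', 0, 20)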

    Simon> I'm sure other people have noticed this... or am I alone on this
    Simon> issue? :-)

Probably not.  XML-RPC is commonly thought of as slow, and if you operate
with it in a completely naive fashion, I don't doubt that it can be.  It
doesn't have to be, though.  Do you have to be intelligent about caching
frequently used information and returning results in reasonably sized
chunks?  Sure, but that's probably true of any database-backed application.
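
Concretely, the kind of caching and chunking I have in mind is nothing
exotic.  A sketch (the names are made up; this is not the real middleware):

    # Illustrative only -- cache the full result of an expensive query
    # once, then hand callers reasonably sized slices of it.
    _cache = {}

    def expensive_database_query(city):
        # Stand-in for the real MySQL query.
        return [{'city': city, 'key': i} for i in range(1000)]

    def cached_search(city, start, count):
        if city not in _cache:
            _cache[city] = expensive_database_query(city)  # hit MySQL once
        return _cache[city][start:start + count]           # one chunk back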

    Simon> Have a look at this contrived example:

    ...

    Simon> Which produces the output:

    >> pythonw -u "bench.py"
    Simon> Gherkin encode 0.120689361357 seconds
    Simon> Gherkin decode 0.395871262968 seconds
    Simon> XMLRPC encode 0.528666352847 seconds
    Simon> XMLRPC decode 9.01307819849 seconds

That's fine, so XML-RPC is slower than Gherkin.  I can't run the Gherkin
code, but my XML-RPC numbers are a bit different from yours:

    XMLRPC encode 0.65 seconds
    XMLRPC decode 2.61 seconds

That leads me to believe you're not using any sort of C XML decoder.  (I
mentioned sgmlop in my previous post.  I'm sure /F has some other
super-duper accelerator that's even faster.)
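
If you want to check what your installation is actually using,
xmlrpclib.getparser() returns the fastest parser it could find, and a crude
timing loop along the lines of your benchmark is easy to throw together
(the payload below is invented, so the absolute numbers won't line up with
anything quoted above):

    import time
    import xmlrpclib

    # Which parser did xmlrpclib pick up?  With sgmlop or another C
    # accelerator installed this won't be the pure-Python SlowParser.
    parser, unmarshaller = xmlrpclib.getparser()
    print parser.__class__

    # Crude encode/decode timing with an invented payload.
    data = [{'index': i, 'name': 'item %d' % i} for i in range(1000)]

    t = time.time()
    blob = xmlrpclib.dumps((data,))
    print 'encode', time.time() - t

    t = time.time()
    xmlrpclib.loads(blob)
    print 'decode', time.time() - t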

I'm not saying that XML-RPC is the fastest thing on Earth.  I'd be willing
to bet it's a lot more interoperable than Gherkin is, though:

    http://xmlrpc.scripting.com/directory/1568/implementations

and probably will be for the foreseeable future.

Also, as you indicated, your example was a bit contrived.  XML-RPC seems to
be fast enough for many real-world applications.  Here's a somewhat
less-contrived example from the above websites:

    >>> orca
    <ServerProxy for orca.mojam.com:5007/RPC2>
    >>> t = time.time() ; x = orca.search(('MusicEntry', {'city': 'Chicago, IL'}), 0, 200, 50, 10000) ; print time.time() - t
    1.28429102898
    >>> len(x[1])
    200
    >>> x[1][0]
    ['MusicEntry', {'venue': 'Jazz Showcase', 'address': '', 'price': '', 'keywords': ['.jz.1369'], 'event': '', 'city': 'Chicago', 'end': datetime.datetime(2005, 6, 19, 0, 0), 'zip': '', 'start': datetime.datetime(2005, 6, 14, 0, 0), 'state': 'IL', 'program': '', 'email': 'skip at mojam.com', 'info': '', 'update_time': datetime.datetime(2005, 6, 8, 0, 0), 'address1': '59 West Grand Avenue', 'address2': '', 'address3': '', 'venueid': 2630, 'key': 816875, 'submit_time': datetime.datetime(2005, 6, 8, 0, 0), 'active': 1, 'merchandise': '', 'tickets': '', 'name': '', 'addressid': 17056, 'performers': ['Cedar Walton Quartet'], 'country': '', 'venueurl': '', 'time': '', 'performerids': [174058]}]
    >>> orca.checkpoint()
    'okay'
    >>> t = time.time() ; x = orca.search(('MusicEntry', {'city': 'Chicago, IL'}), 0, 200, 50, 10000) ; print time.time() - t
    1.91681599617

orca is an xmlrpclib proxy to the aforementioned middleware component.
(These calls are being made to a production server.)  The search() call gets
the first 200 concert listings within a 50-mile radius of Chicago.  x[1][0]
is the first item returned.  All 200 returned items are of the same
complexity.  1.28 seconds certainly isn't Earth-shattering performance.
Even worse is the 1.92 seconds after the checkpoint (which flushes the
caches, forcing the MySQL database to be queried for everything).  Returning
200 items is also contrived.  Real users can't ask for that many items at a
time through the web interface.  Cutting it down to 20 items (which is what
users get by default) shows a different story:

    >>> orca.checkpoint()
    'okay'
    >>> t = time.time() ; x = orca.search(('MusicEntry', {'city': 'Chicago, IL'}), 0, 20, 50, 10000) ; print time.time() - t
    0.29478096962
    >>> t = time.time() ; x = orca.search(('MusicEntry', {'city': 'Chicago, IL'}), 0, 20, 50, 10000) ; print time.time() - t
    0.0978591442108

The first query after a checkpoint is slow because we have to go to MySQL a
few times, but once everything's cached, things are fast.  The 0.1 second
time for the last call is going to be almost all XML-RPC overhead, because
all the data's already been cached.  I find that acceptable.  If you go to
the Mojam website and click "Chicago", the above query is pretty much what's
performed (several other queries are also run to get corollary info).
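
For what it's worth, paging through results in 20-item chunks from the
client side looks about like you'd expect.  The search() signature below is
copied from the sessions above (the trailing arguments are just carried
over verbatim); the loop itself is illustrative, not the actual web code:

    import xmlrpclib

    # Hypothetical paging loop against the middleware from the sessions
    # above, fetching 20 listings at a time.
    orca = xmlrpclib.ServerProxy('http://orca.mojam.com:5007/RPC2')

    page_size = 20
    start = 0
    while True:
        result = orca.search(('MusicEntry', {'city': 'Chicago, IL'}),
                             start, page_size, 50, 10000)
        listings = result[1]             # result[1] held the entries above
        for entry in listings:
            print entry[1].get('venue')  # each entry is ['MusicEntry', {...}]
        if len(listings) < page_size:
            break                        # last page
        start += page_size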

I still find it hard to believe that yet another serialization protocol is
necessary.  XML is certainly overkill for almost everything.  I'll be the
first to admit that.  (I have to struggle with it as a configuration file
format at work.)  However, it is undeniably widely available.

Skip

