[Python-ideas] An alternate approach to async IO

Wed Nov 28 13:49:12 CET 2012

On Tue, Nov 27, 2012 at 09:20:30PM -0800, Greg Ewing wrote:
> Trent Nelson wrote:
> >     When you're dealing with that sort of load
> >     on a many-core machine (let's say 16+ cores), an interlocked list
> >     is going to reduce latency versus 16+ threads constantly vying for
> >     the GIL.
> 
> I don't understand. Why is vying for access to an interlocked
> list any less latentful than vying for the GIL?

    I think not having to contend with the interpreter would make a big
    difference under load.  A push to an interlocked list is a lock-free
    atomic operation, so it will be more performant than having all
    threads attempt to do GIL acquire -> PyList_Append() -> GIL release.
    The key to getting high performance (either low latency or high
    throughput) with the background IO stuff is ensuring the threads
    complete their work as quickly as possible.

    The quicker they process an event, the quicker they can process
    another event, the higher the overall throughput and the lower the
    overall latency.  Doing a GIL acquire/PyList_Append()/GIL release
    at the end of each event would add huge overhead that simply would
    not be present with an interlocked push.
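
    To make the comparison concrete, here is a minimal sketch of what an
    interlocked push boils down to.  It uses portable C11 atomics rather
    than the Windows SList functions, and the names (event_t,
    event_list_push, event_list_flush) are made up for illustration;
    this is not code from any actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical completed-IO event record (illustrative names only). */
typedef struct event {
    struct event *next;   /* link to the previously pushed event */
    int id;               /* stand-in for the real event payload */
} event_t;

/* List header: a single atomic pointer to the most recent event. */
typedef struct {
    _Atomic(event_t *) head;
} event_list_t;

static inline void event_list_init(event_list_t *list)
{
    atomic_init(&list->head, NULL);
}

/* Lock-free push: retry the compare-and-swap until our node becomes the
 * new head.  No lock is taken and no interpreter state is touched. */
static inline void event_list_push(event_list_t *list, event_t *ev)
{
    assert(ev != NULL);
    event_t *old = atomic_load_explicit(&list->head, memory_order_relaxed);
    do {
        ev->next = old;   /* on CAS failure, 'old' holds the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list->head, &old, ev,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange.  Events come back in
 * LIFO order (most recently pushed first); the list is left empty. */
static inline event_t *event_list_flush(event_list_t *list)
{
    return atomic_exchange_explicit(&list->head, NULL, memory_order_acquire);
}
```

    In this sketch the background threads would only ever call
    event_list_push(), and whichever thread holds the GIL would call
    event_list_flush() and walk the detached chain, so the cost of
    touching the interpreter is paid once per batch instead of once
    per event.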

    Also, as soon as you call back into CPython from the background
    thread, your cache footprint explodes, which isn't desirable.

        Trent.