[Python-Dev] API design: where to add async variants of existing stdlib APIs?

Nathaniel Smith njs at pobox.com
Tue Mar 7 19:17:03 EST 2017


On Tue, Mar 7, 2017 at 9:41 AM, Brett Cannon <brett at python.org> wrote:
> I don't think a common practice has bubbled up yet for when there's both
> synchronous and asynchronous versions of an API (closest I have seen is
> appending an "a" to the async version but that just looks like a spelling
> mistake to me most of the time). This is why the question of whether
> separate modules are a better idea is coming up.

For the CSV case, it might be sensible to factor out the I/O. For
example, provide an API that looks like:

pushdictreader = csv.PushDictReader()
while pushdictreader:               # truthy while it still wants data
    chunk = read_some(...)          # however the caller gets more data
    pushdictreader.push(chunk)
    for row in pushdictreader:      # rows completed by that chunk
        ...
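
(csv.PushDictReader doesn't exist today, of course. Purely as a sketch
of what such a push parser might look like, reusing the existing
csv.reader for the actual parsing and ignoring quoted newlines for
brevity:)

import csv

class PushDictReader:
    def __init__(self):
        self._buffer = ""         # pushed text not yet parsed
        self._fieldnames = None   # taken from the first complete row
        self._rows = []           # finished rows awaiting iteration
        self._done = False

    def push(self, chunk, *, final=False):
        self._buffer += chunk
        if final:
            self._done = True
        # Only parse lines that are definitely complete, unless we know
        # no more data is coming.
        if self._done:
            complete, self._buffer = self._buffer, ""
        else:
            complete, sep, self._buffer = self._buffer.rpartition("\n")
            if not sep:
                return
        for fields in csv.reader(complete.splitlines()):
            if self._fieldnames is None:
                self._fieldnames = fields
            else:
                self._rows.append(dict(zip(self._fieldnames, fields)))

    def __bool__(self):
        # Truthy while the caller should keep pushing data
        return not self._done

    def __iter__(self):
        while self._rows:
            yield self._rows.pop(0)

(How the caller signals EOF -- the final= flag here, or maybe an
explicit close() -- is one of the design questions such an API would
have to answer.)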

This API can now straightforwardly be used with sync and async code.
Of course you'd want to wrap it up in a nicer interface, somewhere in
the ballpark of:

def sync_rows(read_some):
    pushdictreader = csv.PushDictReader()
    while pushdictreader:
        chunk = read_some(...)
        pushdictreader.push(chunk)
        for row in pushdictreader:
            yield row

async def async_rows(read_some):
    pushdictreader = csv.PushDictReader()
    while pushdictreader:
        chunk = await read_some(...)
        pushdictreader.push(chunk)
        for row in pushdictreader:
            yield row

So there'd still be a bit of code duplication, but much much less.
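
To make the split concrete, usage might look roughly like this
(illustrative only -- the "..." above would stand for something like a
chunk size, decoding from bytes to text is elided, and "data.csv" is
just a stand-in):

# Synchronous caller: read_some is a plain blocking read
with open("data.csv", newline="") as f:
    for row in sync_rows(f.read):
        print(row)

# Asynchronous caller: read_some is a coroutine method, e.g.
# asyncio.StreamReader.read; only this caller needs an event loop
async def dump(stream):
    async for row in async_rows(stream.read):
        print(row)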

Essentially the idea here is to convert the csv module to sans-io
style (http://sans-io.readthedocs.io/).

Another option is to make it all-async internally, and then offer a
sync facade around it. That is, start with the natural all-async
interface:

from abc import ABC

class AsyncFileLike(ABC):
    async def async_read(self, *args, **kwargs):
        ...

class AsyncDictReader:
    def __init__(self, async_file_like):
        self._async_file_like = async_file_like

    async def __anext__(self):
        ...
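
(Nothing like this exists yet either; just to make the skeleton
concrete, here's one hedged way the elided pieces could be filled in,
leaning on the existing csv.reader for parsing and ignoring quoted
newlines and fancy dialects. The 8192 chunk size is arbitrary.)

import csv

class AsyncDictReader:
    def __init__(self, async_file_like):
        self._async_file_like = async_file_like
        self._buffer = ""         # text read but not yet parsed
        self._fieldnames = None   # taken from the first row

    def __aiter__(self):
        return self

    async def _readline(self):
        # The only awaits in the whole class are these async_read calls
        while "\n" not in self._buffer:
            chunk = await self._async_file_like.async_read(8192)
            if not chunk:          # EOF
                break
            self._buffer += chunk
        line, _, self._buffer = self._buffer.partition("\n")
        return line

    async def __anext__(self):
        while True:
            line = await self._readline()
            if not line:
                if not self._buffer:
                    raise StopAsyncIteration
                continue           # skip blank lines
            fields = next(csv.reader([line]))
            if self._fieldnames is None:
                self._fieldnames = fields   # header row
                continue
            return dict(zip(self._fieldnames, fields))

Note that the only way this version ever touches the coroutine runner
is through those async_read awaits, which is exactly the property
relied on below.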

And (crucially!) let's assume that the only way AsyncDictReader
interacts with the coroutine runner is by calls to
self._async_file_like.async_read. Now we can pass in a
secretly-actually-synchronous AsyncFileLike and make a synchronous
facade around the whole thing:

class AsyncSyncAdapter(AsyncFileLike):
    def __init__(self, sync_file_like):
        self._sync_file_like = sync_file_like

    # Technically an async function, but guaranteed to never yield
    async def async_read(self, *args, **kwargs):
        return self._sync_file_like.read(*args, **kwargs)

# Minimal coroutine supervisor: runs async_fn(*args, **kwargs), which
# must never yield
def syncify(async_fn, *args, **kwargs):
    coro = async_fn(*args, **kwargs)
    try:
        # If the coroutine really never yields, the first send() runs it
        # to completion and its return value arrives via StopIteration.
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("coroutine yielded; can't drive it synchronously")

class DictReader:
    def __init__(self, sync_file_like):
        # Technically an AsyncDictReader, but guaranteed to never yield
        self._async_dict_reader = AsyncDictReader(
            AsyncSyncAdapter(sync_file_like))

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return syncify(self._async_dict_reader.__anext__)
        except StopAsyncIteration:
            # The async iterator is exhausted, so this one is too
            raise StopIteration

So here we still have some goo around the edges of the module, but the
actual CSV logic only has to be written once, and can still be written
in a "pull" style where it does its own I/O, just like it is now.

This is basically another approach to writing sans-io protocols, with
the annoying trade-off that it means even your synchronous version
requires Python 3.5+. But for a stdlib module that's no big deal...

-n


> On Tue, 7 Mar 2017 at 02:24 Michel Desmoulin <desmoulinmichel at gmail.com>
> wrote:
>>
>> Last week I had to download a CSV from an FTP server and push any updates
>> to it over a websocket, so asyncio was a natural fit and the network part
>> went well.
>>
>> The surprise was that the CSV part would not work as expected. Usually I
>> read CSVs like this:
>>
>> import csv
>>
>> file_like_object = csv_crawler.get_file()
>> for row in csv.DictReader(file_like_object):
>>
>> But it didn't work because file_like_object.read() was a coroutine which
>> the csv module doesn't handle.
>>
>> So I had to do:
>>
>> import csv
>> import io
>>
>>         raw_bytes = await stream.read(10000000)
>>         wrapped_bytes = io.BytesIO(raw_bytes)
>>         text = io.TextIOWrapper(wrapped_bytes, encoding=encoding,
>> errors='replace')
>>
>>         for i, row in enumerate(csv.DictReader(text)):
>>
>> Turns out I have used asyncio a bit, and I know the stdlib, the io API,
>> etc. But for somebody who doesn't, it's not very easy to figure out. Plus
>> it's not as elegant as traditional Python. Not to mention it loads the
>> entire CSV into memory.
>>
>> So I wondered if I could fix the csv module so it accepts async. But the
>> question arose: where should I put it?
>>
>> - Create AsyncDictReader and AsyncReader?
>> - Add inspect.iscoroutine checks in the regular Readers, plus some
>> __aiter__ and __aenter__?
>> - Add a csv.async namespace?
>>
>> What API design are we recommending for exposing both sync and async
>> behaviors?
>>
>>
>> On 07/03/2017 at 03:08, Guido van Rossum wrote:
>> > On Mon, Mar 6, 2017 at 5:57 PM, Raymond Hettinger
>> > <raymond.hettinger at gmail.com <mailto:raymond.hettinger at gmail.com>>
>> > wrote:
>> >
>> >     Of course, it makes sense that anything not specific to asyncio
>> >     should go outside of asyncio.
>> >
>> >     What I'm more concerned about is what the other places actually
>> >     are.   Rather than putting async variants of everything sprinkled
>> >     all over the standard library, I suggest collecting them all
>> >     together, perhaps in a new asynctools module.
>> >
>> >
>> > That's a tough design choice. I think neither extreme is particularly
>> > attractive -- having everything in an asynctools package might also
>> > bundle together things that are entirely unrelated. In the extreme it
>> > would be like proposing that all metaclasses should go in a new
>> > "metaclasstools" package. I think we did a reasonable job with ABCs:
>> > core support goes in abc.py, support for collections ABCs goes into the
>> > collections package (in a submodule), and other packages and modules
>> > sometimes define ABCs for their own users.
>> >
>> > Also, in some cases I expect we'll have to create a whole new module
>> > instead of updating some ancient piece of code with newfangled async
>> > variants to its outdated APIs.
>> >
>> > --
>> > --Guido van Rossum (python.org/~guido <http://python.org/~guido>)



-- 
Nathaniel J. Smith -- https://vorpus.org

