[Python-ideas] Buffering iterators?

Michael Mitchell epsilonmichael at gmail.com
Sat Dec 19 07:01:40 EST 2015


Have you considered doing this at the plain Python level? Something such as
the following would have the desired semantics from my understanding.

def buffered_iterator(it, size):
  while True:
    buffer = [next(it) for _ in range(size)]
    for element in buffer:
      yield element

As for whether something like this can achieve optimizations by utilizing
things such as locality is entirely dependent on the implementation of the
runtime. It doesn't make much sense to me to maintain runtime level support
for such an edge use case as it would be yet another burden on every
implementation of Python.

On Tue, Dec 15, 2015 at 2:04 AM, Franklin? Lee <
leewangzhong+python at gmail.com> wrote:

> (This would be a lot of work that I wouldn't know how to do, but is it
> worth thinking about? Maybe it's already been done at the level
> necessary.
>
> Also, this is a proposal for the sake of theoretical optimization, in
> case you don't like that, and it will require a lot of work in a lot
> of code everywhere.
>
> As I see it, even if it's possible and desirable to do this, it would
> take years of work and testing to make it beneficial.)
>
> The move from Python 2 (disclaimer: which I barely touched, so I have
> little sentimental attachment) to Python 3 resulted in many functions
> returning iterators instead of lists. This saves a lot of unnecessary
> memory when iterating, say, over the indices of a large list,
> especially if we break in the middle.
>
> I'm wondering, though, about a step backwards: generating values
> before they're needed. The idea is based on file buffered reading and
> memory prefetching
> (
> https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory#DDR_SDRAM_prefetch_architecture
> ).
> In fact, I'm hoping to take advantage of such things.
>
> For example, in `sum(lst[i] * i for i in range(10000))`, `sum` will
> exhaust the iterator, so it can ask the generator to return buffers,
> and it will internally read the elements off the lists.
>
> It would be the responsibility of the iterator to decide whether to
> respect the request, and to determine the size of the buffer. It would
> be the responsibility of the consumer to request it, and consumers
> should only request it if they think they'll almost definitely consume
> a lot at a time.
>
> The idea is, especially for complex nested iterators, instead of
> running A B C A B C A B C..., where each is the code for generating a
> next thing from the previous, that the interpreter runs A A A A A...,
> B B B B B..., C C C..., which could mean a lot more memory locality in
> both instructions and objects.
>
> There's the possibility that a function has side-effects, so buffering
> would have different semantics than normal. There's also the
> possibility that getting the next element is complex enough that it
> wouldn't help to buffer. If the iterator can't tell, then it should
> just not buffer.
>
> Here's an obnoxious example of where you can't tell:
>
>     def f(i):
>         return i
>
>     s = 0
>     for i in (f(x) for x in range(100)):
>         s += f(i)
>         def f(x):
>             return x + s
>
> In fact, all Python function calls are probably unsafe, including with
> operators (which can be legally replaced during the iteration). Well,
> `map` and `filter` are possible exceptions in special cases, because
> the lookup for their function is bound at the call to `map`.
>
> It's usually safe if you're just using reversed, enumerate, builtin
> operators, dict views, etc. on an existing data structure, as long as
> your iteration doesn't modify entries, unlike so:
>
>     for i, x in enumerate(reversed(lst)):
>         lst[i+1] = x
>
> But I'm looking toward the future, where it might be possible for the
> interpreter to analyze loops and functions before making such
> decisions. Then again, if the interpreter is that smart, it can figure
> out where and when to buffer without adding to the API of iterators.
>
> Anyway, here's an idea, how it might be helpful, and how it might not
> be helpful.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20151219/370822fd/attachment-0001.html>


More information about the Python-ideas mailing list