[Tutor] Chunking list/array data?

Cameron Simpson cs at cskk.id.au
Thu Aug 22 06:00:14 EDT 2019


On 21Aug2019 21:26, Sarah Hembree <sarah123ed at gmail.com> wrote:
>How do you chunk data? We came up with the below snippet. It works (with
>integer list data) for our needs, but it seems so clunky.
>
>    def _chunks(lst: list, size: int) -> list:
>        return  [lst[x:x+size] for x in range(0, len(lst), size)]
>
>What do you do? Also, what about doing this lazily so as to keep memory
>drag at a minimum?

This looks pretty good to me. But as you say, it constructs the complete 
list of chunks and returns them all. For many chunks that is both slow 
and memory hungry.

If you want to conserve memory and return chunks in a lazy manner you 
can rewrite this as a generator. A first cut might look like this:

    from typing import Iterator

    def _chunks(lst: list, size: int) -> Iterator[list]:
        for x in range(0, len(lst), size):
            yield lst[x:x+size]

which makes _chunks() a generator function: calling it returns an 
iterator which yields each chunk one at a time. The body of the 
function is kept "running", but stalled. When you iterate over the 
value returned by _chunks(), Python runs that stalled function until it 
yields a value, then stalls it again and hands you that value.
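
For example, a quick sketch using the generator version above (the 
sample numbers are just made up for illustration):

    numbers = list(range(10))
    for chunk in _chunks(numbers, 3):
        print(chunk)

    # prints:
    # [0, 1, 2]
    # [3, 4, 5]
    # [6, 7, 8]
    # [9]

Each chunk is built only when the for loop asks for the next one, so 
only a single chunk's worth of extra memory is in play at any moment.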

Modern Python has a thing called a "generator expression". Your original 
function is a "list comprehension": it constructs a list of values and 
returns that list. In many cases, particularly for very long lists, that 
can be both slow and memory hungry. You can rewrite such a thing like 
this:

    from typing import Iterator

    def _chunks(lst: list, size: int) -> Iterator[list]:
        return (lst[x:x+size] for x in range(0, len(lst), size))

Replacing the square brackets with parentheses turns the list 
comprehension into a generator expression. The function then returns an 
iterator instead of a list, which behaves like the generator function I 
sketched above and produces the chunks lazily.
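
For instance, another rough sketch (again with made up sample data):

    chunks = _chunks(list(range(7)), 2)
    print(next(chunks))   # [0, 1]
    print(next(chunks))   # [2, 3]
    print(list(chunks))   # [[4, 5], [6]] - the remaining chunks

Like any iterator, the result can only be walked through once; call 
_chunks() again if you need to run over the chunks a second time.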

Cheers,
Cameron Simpson <cs at cskk.id.au>

