itertools.groupby

Mon Apr 22 11:04:52 EDT 2013

On 2013-04-22, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 22 April 2013 15:24, Neil Cerutti <neilc at norwich.edu> wrote:
>>
>> Hrmmm, hoomm. Nobody cares for slicing any more.
>>
>> def headered_groups(lst, header):
>>     b = lst.index(header) + 1
>>     while True:
>>         try:
>>             e = lst.index(header, b)
>>         except ValueError:
>>             yield lst[b:]
>>             break
>>         yield lst[b:e]
>>         b = e+1
>
> This requires the whole file to be read into memory. Iterators
> are typically preferred over list slicing for sequential text
> file access since you can avoid loading the whole file at once.
> This means that you can process a large file while only using a
> constant amount of memory.

I agree, but this application processes unknown-sized slices, so
you have to build lists anyhow. I find slicing much more
convenient than accumulating in this case, though it is a
tradeoff: the whole file has to be in memory first.
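
For comparison, here is a rough sketch of what the accumulating
version might look like. The name headered_groups_iter is made up
for illustration, and it assumes the header matches a whole line
exactly, as lst.index(header) does in the slicing version.

```python
def headered_groups_iter(lines, header):
    """Yield the list of lines that follows each header line.

    Hypothetical counterpart to the slicing version: it accepts any
    iterable, so the input never has to be a fully built list.
    """
    group = None  # stays None until the first header is seen
    for line in lines:
        if line == header:
            if group is not None:
                yield group      # close the previous group
            group = []           # start a new one
        elif group is not None:
            group.append(line)   # lines before the first header are skipped
    if group is not None:
        yield group              # final group, possibly empty
```

It only ever holds one group at a time, but it needs an explicit
append per line, which is the bookkeeping slicing avoids. Unlike
the slicing version, it yields nothing rather than raising
ValueError when the header never appears.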

> with open('data.txt') as inputfile:
>     for group in headered_groups(map(str.strip, inputfile)):
>         print(group)

Thanks, that's a nice improvement.

-- 
Neil Cerutti