Refactoring a generator function

Sat Dec 4 11:40:32 EST 2004

Kent Johnson wrote:
> Here is a simple function that scans through an input file and groups 
> the lines of the file into sections. Sections start with 'Name:' and end 
> with a blank line. The function yields sections as they are found.
> 
> def makeSections(f):
>     currSection = []
> 
>     for line in f:
>         line = line.strip()
>         if line == 'Name:':
>             # Start of a new section
>             if currSection:
>                 yield currSection
>                 currSection = []
>             currSection.append(line)
> 
>         elif not line:
>             # Blank line ends a section
>             if currSection:
>                 yield currSection
>                 currSection = []
> 
>         else:
>             # Accumulate into a section
>             currSection.append(line)
> 
>     # Yield the last section
>     if currSection:
>         yield currSection
> 
> There is some obvious code duplication in the function - this bit is 
> repeated 2.67 times ;-):
>             if currSection:
>                 yield currSection
>                 currSection = []

You can write:

for section in yieldSection():
     yield section

in both places, but I assume you still don't like the code duplication 
this would create.

How about something like (completely untested):

if line == 'Name:' or not line:
     if currSection:
         yield currSection
         currSection = []
     if line == 'Name:':
         currSection.append(line)
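
For context, here is how the merged branch might sit inside the whole function. This is only a sketch: it keeps the names from the original makeSections, and the sample input at the bottom is made up for illustration.

```python
def makeSections(f):
    """Yield each section of f as a list of stripped lines."""
    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:' or not line:
            # A new 'Name:' header and a blank line both close the open section
            if currSection:
                yield currSection
                currSection = []
            # Only a header starts the next section
            if line == 'Name:':
                currSection.append(line)
        else:
            # Accumulate into the current section
            currSection.append(line)

    # Yield the last section if the file did not end with a blank line
    if currSection:
        yield currSection


# Made-up sample: the third section follows the second with no blank line
sample = 'Name:\nA\nx\n\nName:\nB\ny\nName:\nC\n'.splitlines(True)
sections = list(makeSections(sample))
```

That leaves the yield-and-reset written twice instead of three times; the copy after the loop is hard to avoid, since the last section has no terminator.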

Another consideration: in Python 2.4, itertools has a groupby function 
that you could probably get some benefit from:

 >>> import itertools
 >>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
 >>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
 >>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
 >>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
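
One caveat with a True/False key: two sections that follow each other with no blank line in between share the key True, so groupby merges them into one group. A possible workaround, sketched below, is to return a running section number as the key instead. The SectionKey class is a hypothetical name of mine, and I am assuming that lines outside any 'Name:' section can simply be dropped, which is not what the original makeSections does.

```python
import itertools


class SectionKey(object):
    """Hypothetical key callable: section number inside a section, else None."""

    def __init__(self):
        self.count = 0
        self.in_section = False

    def __call__(self, line):
        line = line.strip()
        if line == 'Name:':
            # Each header gets a new number, so adjacent sections stay apart
            self.count += 1
            self.in_section = True
        elif not line:
            self.in_section = False
        if self.in_section:
            return self.count
        return None


def make_sections(f):
    for key, section in itertools.groupby(f, SectionKey()):
        # Assumption: anything outside a 'Name:' section is discarded
        if key is not None:
            yield [line.strip() for line in section]


# Made-up sample: the third section follows the second with no blank line
sample = 'Name:\nA\nx\n\nName:\nB\ny\nName:\nC\n'.splitlines(True)
sections = list(make_sections(sample))
```

Like the Sections class above, this relies on groupby calling the key once per line, in order, so the state kept in the callable stays in step with the input.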