Refactoring a generator function

Kent Johnson kent3737 at yahoo.com
Sat Dec 4 10:23:50 EST 2004


Here is a simple function that scans through an input file and groups the lines of the file into 
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as 
they are found.

def makeSections(f):
     currSection = []

     for line in f:
         line = line.strip()
         if line == 'Name:':
             # Start of a new section
             if currSection:
                 yield currSection
                 currSection = []
             currSection.append(line)

         elif not line:
             # Blank line ends a section
             if currSection:
                 yield currSection
                 currSection = []

         else:
             # Accumulate into a section
             currSection.append(line)

     # Yield the last section
     if currSection:
         yield currSection

There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
             if currSection:
                 yield currSection
                 currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function, 
either as a nested function of makeSections() or as a separate method of a class implementation. 
Something like this:

def makeSections(f):	### DOESN'T WORK ###
     currSection = []

     def yieldSection():
         if currSection:
             yield currSection
             del currSection[:]

     for line in f:
         line = line.strip()
         if line == 'Name:':
             # Start of a new section
             yieldSection()
             currSection.append(line)

         elif not line:
             # Blank line ends a section
             yieldSection()

         else:
             # Accumulate into a section
             currSection.append(line)

     # Yield the last section
     yieldSection()


The problem is that yieldSection() is now the generator function and makeSections() is not. Each 
call to yieldSection() just creates a new iterator that is never run, so makeSections() never 
yields a section...
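
To make the failure concrete, here is a minimal sketch. brokenMakeSections is a hypothetical 
cut-down version of the function above, written only for illustration and fed a plain list of 
lines instead of a file. Because the only yield sits inside the nested function, the outer 
function is an ordinary function that returns None, and each call to yieldSection() builds a 
generator object that is thrown away before its body runs:

```python
import inspect

def brokenMakeSections(lines):
    # Hypothetical cut-down version of the non-working refactoring (illustration only)
    currSection = []

    def yieldSection():
        # The yield here makes yieldSection() the generator function
        if currSection:
            yield currSection
            del currSection[:]

    for line in lines:
        line = line.strip()
        if line:
            currSection.append(line)
        else:
            # Creates a generator object and discards it; the body above never executes
            yieldSection()

    yieldSection()

result = brokenMakeSections(['Name:', 'City:', ''])
print(result)                                            # None
print(inspect.isgeneratorfunction(brokenMakeSections))   # False
```

So a for loop over the result fails immediately, because there is nothing to iterate over.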

Is there a way to do this or do I have to live with the duplication?

Thanks,
Kent


Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
....................
xxxxxxxxxxxxxxxxxxxx


Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO    # just for test

def makeSections(f):
     ''' This is a generator function. It will yield successive sections
         of f until EOF.

         Sections are every line from a 'Name:' line to the first blank line.
         Each section is yielded as a list of lines with leading and trailing
         whitespace stripped.
     '''

     currSection = []

     for line in f:
         line = line.strip()
         if line == 'Name:':
             # Start of a new section
             if currSection:
                 yield currSection
                 currSection = []
             currSection.append(line)

         elif not line:
             # Blank line ends a section
             if currSection:
                 yield currSection
                 currSection = []

         else:
             # Accumulate into a section
             currSection.append(line)

     # Yield the last section
     if currSection:
         yield currSection


f = cStringIO.StringIO(data)

for section in makeSections(f):
     print 'Section'
     for line in section:
         print '   ', line
     print


