Refactoring a generator function
Steven Bethard
steven.bethard at gmail.com
Sat Dec 4 11:40:32 EST 2004
Kent Johnson wrote:
> Here is a simple function that scans through an input file and groups
> the lines of the file into sections. Sections start with 'Name:' and end
> with a blank line. The function yields sections as they are found.
>
> def makeSections(f):
>     currSection = []
>
>     for line in f:
>         line = line.strip()
>         if line == 'Name:':
>             # Start of a new section
>             if currSection:
>                 yield currSection
>                 currSection = []
>             currSection.append(line)
>
>         elif not line:
>             # Blank line ends a section
>             if currSection:
>                 yield currSection
>                 currSection = []
>
>         else:
>             # Accumulate into a section
>             currSection.append(line)
>
>     # Yield the last section
>     if currSection:
>         yield currSection
>
> There is some obvious code duplication in the function - this bit is
> repeated 2.67 times ;-):
>     if currSection:
>         yield currSection
>         currSection = []
You could move that bit into a helper generator (call it yieldSection)
and write:

    for section in yieldSection():
        yield section

in both places, but I assume you still don't like the code duplication
this would create.
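
To make the helper idea concrete, here is a rough sketch. The names flush and make_sections_with_helper are mine, not from Kent's code, and having the helper yield a copy and then clear the list in place is just one way to handle the reset; treat it as an illustration rather than a recommendation.

```python
def flush(section):
    """Hypothetical helper: yield a copy of section if it is non-empty,
    then clear the original list in place so the caller can reuse it."""
    if section:
        yield list(section)
        del section[:]


def make_sections_with_helper(lines):
    curr = []
    for line in lines:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section: emit whatever was accumulated
            for s in flush(curr):
                yield s
            curr.append(line)
        elif not line:
            # Blank line ends a section
            for s in flush(curr):
                yield s
        else:
            # Accumulate into the current section
            curr.append(line)
    # Emit the last section, if any
    for s in flush(curr):
        yield s
```

The duplicated three-line block shrinks to a two-line loop, but the loop itself is still repeated three times, which is the duplication I mean.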
How about something like this for the loop body (completely untested):

    if line == 'Name:' or not line:
        if currSection:
            yield currSection
            currSection = []
    if line:
        # Any non-blank line, including 'Name:', goes into the section
        currSection.append(line)
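
Dropped into the full function, that merged condition might look like the sketch below. I've renamed things to make_sections and curr_section to keep it separate from Kent's original; the final yield after the loop is still needed for the last section.

```python
def make_sections(lines):
    """Group stripped lines into sections that start with 'Name:'
    and end at a blank line (or at the next 'Name:')."""
    curr_section = []
    for line in lines:
        line = line.strip()
        if line == 'Name:' or not line:
            # Either a new section starts or the current one ends
            if curr_section:
                yield curr_section
                curr_section = []
        if line:
            # Non-blank lines (including 'Name:') are accumulated
            curr_section.append(line)
    # Yield the last section
    if curr_section:
        yield curr_section
```

This gets the repeated block down from three copies to two.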
Another consideration: in Python 2.4, itertools has a groupby function
that you could probably get some benefit from:
>>> import itertools
>>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
>>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
>>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
>>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
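
In case groupby is new to you: it collects consecutive items that share the same key value into one group, which is why a key function that remembers state can mark section boundaries. A small illustration with a stateless key (the sample lines list is made up for the example):

```python
import itertools

lines = ['a\n', 'b\n', '\n', 'c\n']

# Key is True for non-blank lines, False for blank ones; each run of
# consecutive lines with the same key becomes one group.
groups = [(key, list(group))
          for key, group in itertools.groupby(lines, key=lambda s: s != '\n')]
```

Here groups comes out as three runs: the two non-blank lines, the blank line, and the final non-blank line. One consequence for the Sections version above is that two sections with no blank line between them would end up in the same group.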