[Tutor] Class-based generator
Peter Otten
__peter__ at web.de
Mon Feb 18 10:00:47 CET 2013
Michael O'Leary wrote:
> I wrote some code to create tasks to be run in a queue based system last
> week. It consisted of a big monolithic function that consisted of two
> parts:
> 1) read data from a file and create dictionaries and lists to iterate
>    through
> 2) iterate through the lists creating a job data file and a task for the
>    queue one at a time until all of the data is dealt with
>
> My boss reviewed my code and said that it would be more reusable and
> Pythonic if I refactored it as a generator that created job data files and
> iterated by calling the generator and putting a task on the queue for each
> job data file that was obtained.
>
> This made sense to me, and since the code does a bunch of conversion of
> the data in the input file(s) to make it easier and faster to iterate
> through the data, I decided to create a class for the generator and put
> that conversion code into its __init__ function. So the class looked like
> this:
>
> class JobFileGenerator:
>     def __init__(self, filedata, output_file_prefix, job_size):
>         <convert filedata to a more usable form>
>
>     def next(self):
>         while <there is more data>:
>             <yield a job data file>
>
> The problem is that the generator object is not created until you call
> next(), so the calling code has to look like this:
>
> gen = JobFileGenerator(data, "output_", 20).next()
> for datafile in gen.next():
>     <put a job that uses datafile into the queue>
>
> This code works OK, but I don't like that it needs to call next() once to
> get a generator and then call next() again repeatedly to get the data for
> the jobs. If I were to write this without a class as a single generator
> function, it would not have to do this, but it would have the monolithic
> structure that my boss objected to.
>
> Would it work to do this:
>
> for datafile in JobFileGenerator(data, "output_", 20).next():
>     <put a job that uses datafile into the queue>
>
> or would that cause the JobFileGenerator's __init__ function to be called
> more than once? Are there examples I could look at of generator functions
> defined on classes similar to this, or is it considered a bad idea to mix
> the two paradigms?
> Thanks,
> Mike

You are abusing the next() method: in your class it is called only once, to
build a generator, instead of once per item. The convention for a method
that builds a generator is to use either a descriptive name (jobs() or
somesuch) or __iter__():

    class JobFile:
        def __init__(self, filedata, output_file_prefix, job_size):
            <convert filedata to a more usable form>

        def __iter__(self):
            while <there is more data>:
                <yield a job data file>

    for datafile in JobFile(data, "output_", 20):
        <put a job that uses datafile into the queue>

Here the generator is created by the implicit call to JobFile.__iter__() at
the start of the for loop. Subsequent iterations call next() on the
generator returned by that call. The expression after "in" is evaluated only
once, so __init__() also runs just once.
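
To make this concrete, here is a minimal runnable sketch of the __iter__()
variant, written for Python 3. The records argument, the chunking by
job_size and the (name, chunk) tuples are assumptions made for illustration
only; the real code would presumably write a job data file at that point.

```python
class JobFile:
    """Iterable that yields one (name, chunk) pair per job."""

    def __init__(self, records, output_file_prefix, job_size):
        # Hypothetical stand-in for the conversion step: copy into a list.
        self.records = list(records)
        self.output_file_prefix = output_file_prefix
        self.job_size = job_size

    def __iter__(self):
        # Contains yield, so every call returns a fresh generator.
        starts = range(0, len(self.records), self.job_size)
        for number, start in enumerate(starts):
            chunk = self.records[start:start + self.job_size]
            # A real version would write chunk to a file here.
            yield "%s%d" % (self.output_file_prefix, number), chunk


jobs = JobFile(range(5), "output_", 2)
for name, chunk in jobs:
    print(name, chunk)
```

Because __iter__() builds a new generator on every call, the same JobFile
instance can be looped over more than once.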

If you want the class itself to generate items you need a different
approach:

    class JobFileIter:
        def __init__(self, filedata, output_file_prefix, job_size):
            <convert filedata to a more usable form>
            self._done = False

        def __iter__(self):
            return self

        def next(self):
            if self._done or <no more data>:
                self._done = True
                raise StopIteration
            return <a job data file>

    for datafile in JobFileIter(data, "output_", 20):
        <put a job that uses datafile into the queue>

Here __iter__() returns the JobFileIter instance itself, so for every
iteration of the for loop JobFileIter.next() will be called -- until a
StopIteration is raised.
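
A runnable sketch of this second variant might look as follows. It is
written for Python 3, where the method is spelled __next__(); the alias on
the last line of the class keeps the Python 2 spelling working too. As
before, records, the chunking and the (name, chunk) tuples are illustrative
assumptions, not the original job-file logic.

```python
class JobFileIter:
    """Iterator that returns one (name, chunk) pair per call."""

    def __init__(self, records, output_file_prefix, job_size):
        # Hypothetical stand-in for the conversion step.
        self._records = list(records)
        self._prefix = output_file_prefix
        self._job_size = job_size
        self._start = 0   # index of the first record of the next job
        self._number = 0  # running job number

    def __iter__(self):
        # The instance is its own iterator.
        return self

    def __next__(self):
        if self._start >= len(self._records):
            # Keeps raising on every later call, as the protocol expects.
            raise StopIteration
        chunk = self._records[self._start:self._start + self._job_size]
        name = "%s%d" % (self._prefix, self._number)
        self._start += self._job_size
        self._number += 1
        return name, chunk

    next = __next__  # Python 2 spelling of the same method


job_iter = JobFileIter(range(5), "output_", 2)
for name, chunk in job_iter:
    print(name, chunk)
```

Unlike the generator-based version, this object is exhausted after one pass;
looping over it a second time yields nothing.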
That said, it is often sufficient to refactor complex code into a few
dedicated functions -- Python is not Java, after all.
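
As a rough sketch of what such a function-based refactoring could look like
(Python 3; the names convert(), job_files() and enqueue_all() and the
list-as-queue are hypothetical, chosen only to mirror the two parts of the
original monolithic function):

```python
def convert(lines):
    """Part 1: turn raw input lines into a more usable form.

    Here that just means stripping whitespace and dropping empty lines.
    """
    return [line.strip() for line in lines if line.strip()]


def job_files(records, output_file_prefix, job_size):
    """Part 2: generator function yielding one (name, chunk) pair per job."""
    for number, start in enumerate(range(0, len(records), job_size)):
        yield "%s%d" % (output_file_prefix, number), records[start:start + job_size]


def enqueue_all(jobs, queue):
    """Put every job from any iterable of jobs onto the queue (a list here)."""
    for job in jobs:
        queue.append(job)


records = convert(["a\n", "\n", "b\n", "c\n"])
queue = []
enqueue_all(job_files(records, "output_", 2), queue)
print(queue)
```

Each piece can be tested and reused on its own, and the generator function
needs no class at all.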
PS I'm assuming Python 2 -- for Python 3 the next() method must be replaced
by __next__().