[Tutor] Class-based generator

Peter Otten __peter__ at web.de
Mon Feb 18 10:00:47 CET 2013


Michael O'Leary wrote:

> I wrote some code to create tasks to be run in a queue based system last
> week. It consisted of a big monolithic function that consisted of two
> parts: 1) read data from a file and create dictionaries and lists to
> iterate through
> 2) iterate through the lists creating a job data file and a task for the
> queue one at a time until all of the data is dealt with
> 
> My boss reviewed my code and said that it would be more reusable and
> Pythonic if I refactored it as a generator that created job data files and
> iterated by calling the generator and putting a task on the queue for each
> job data file that was obtained.
> 
> This made sense to me, and since the code does a bunch of conversion of
> the data in the input file(s) to make it easier and faster to iterate
> through the data, I decided to create a class for the generator and put
> that conversion code into its __init__ function. So the class looked like
> this:
> 
> class JobFileGenerator:
>     def __init__(self, filedata, output_file_prefix, job_size):
>         <convert filedata to a more usable form>
> 
>     def next(self):
>         while <there is more data>:
>             <yield a job data file>
> 
> The problem is that the generator object is not created until you call
> next(), so the calling code has to look like this:
> 
> gen = JobFileGenerator(data, "output_", 20).next()
> for datafile in gen.next():
>     <put a job that uses datafile into the queue>
> 
> This code works OK, but I don't like that it needs to call next() once to
> get a generator and then call next() again repeatedly to get the data for
> the jobs. If I were to write this without a class as a single generator
> function, it would not have to do this, but it would have the monolithic
> structure that my boss objected to.
> 
> Would it work to do this:
> 
> for datafile in JobFileGenerator(data, "output_", 20).next():
>     <put a job that uses datafile into the queue>
> 
> or would that cause the JobFileGenerator's __init__ function to be called
> more than once? Are there examples I could look at of generator functions
> defined on classes similar to this, or is it considered a bad idea to mix
> the two paradigms?
> Thanks,
> Mike


You are abusing the next() method: in your code it is called once just to 
build a generator. The convention for that job is to use either a method 
with a descriptive name (jobs() or some such) or __iter__():

class JobFile:
    def __init__(self, filedata, output_file_prefix, job_size):
        <convert filedata to a more usable form>
    def __iter__(self):
        while <there is more data>:
            <yield a job data file>


for datafile in JobFile(data, "output_", 20):
    <put a job that uses datafile into the queue>

Here the generator is created by the implicit call to JobFile.__iter__() at 
the start of the for loop. Subsequent iterations call next() on the 
generator returned by that call.
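For concreteness, here is a minimal runnable sketch of this pattern. The 
line-splitting "conversion" and the chunking are toy stand-ins for the real 
job-file logic, which the post leaves as pseudocode:

```python
class JobFile:
    def __init__(self, filedata, output_file_prefix, job_size):
        # Toy stand-in for <convert filedata to a more usable form>:
        # split the raw data into one record per line.
        self.records = filedata.splitlines()
        self.prefix = output_file_prefix
        self.job_size = job_size

    def __iter__(self):
        # A generator method: each call builds a fresh generator,
        # so the same JobFile instance can be iterated more than once.
        for n, start in enumerate(range(0, len(self.records), self.job_size)):
            chunk = self.records[start:start + self.job_size]
            yield "%s%d" % (self.prefix, n), chunk


data = "\n".join("record %d" % i for i in range(5))
for name, chunk in JobFile(data, "output_", 2):
    print(name, chunk)
```

Note that because __iter__() is a generator method rather than returning 
self, restarting the for loop restarts the iteration from the beginning.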

If you want the class itself to generate items you need a different 
approach:

class JobFileIter:
    def __init__(self, filedata, output_file_prefix, job_size):
        <convert filedata to a more usable form>
        self._done = False
    def __iter__(self):
        return self
    def next(self):
        if self._done or <no more data>:
            self._done = True
            raise StopIteration
        return <a job data file>


for datafile in JobFileIter(data, "output_", 20):
    <put a job that uses datafile into the queue>

Here __iter__() returns the JobFileIter instance, so for every iteration of 
the for loop JobFileIter.next() will be called -- until a StopIteration is 
raised.
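A runnable sketch of this second approach, written Python 3 style with 
__next__() (see the PS) plus a Python 2 alias, and with the same toy 
chunking standing in for the real data handling:

```python
class JobFileIter:
    def __init__(self, filedata, output_file_prefix, job_size):
        # Toy stand-in for <convert filedata to a more usable form>:
        self._records = filedata.splitlines()
        self._prefix = output_file_prefix
        self._job_size = job_size
        self._pos = 0
        self._count = 0

    def __iter__(self):
        # The instance is its own iterator.
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration
        chunk = self._records[self._pos:self._pos + self._job_size]
        self._pos += self._job_size
        name = "%s%d" % (self._prefix, self._count)
        self._count += 1
        return name, chunk

    next = __next__  # alias so the class also works under Python 2


data = "\n".join("record %d" % i for i in range(5))
for name, chunk in JobFileIter(data, "output_", 2):
    print(name, chunk)
```

Unlike the generator version above, an instance of this class can be 
iterated only once: after StopIteration it stays exhausted.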

That said, it is often sufficient to refactor complex code into a few 
dedicated functions -- Python is not Java, after all.
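For instance, the class could be dropped entirely in favour of one small 
generator function that delegates to helpers; all the names here are 
illustrative, not from the original post:

```python
def parse(filedata):
    """Convert the raw file data to a more usable form (toy version)."""
    return filedata.splitlines()


def chunked(records, size):
    """Yield successive slices of `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def job_files(filedata, output_file_prefix, job_size):
    """Generate (name, chunk) pairs -- no class needed."""
    for n, chunk in enumerate(chunked(parse(filedata), job_size)):
        yield "%s%d" % (output_file_prefix, n), chunk


data = "\n".join("record %d" % i for i in range(5))
for name, chunk in job_files(data, "output_", 2):
    print(name, chunk)
```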

PS I'm assuming Python 2 -- for Python 3 the next() method must be replaced 
by __next__().
