Iterate over text file, discarding some lines via context manager

Ned Batchelder ned at nedbatchelder.com
Fri Nov 28 11:01:03 EST 2014


On 11/28/14 10:22 AM, Dave Angel wrote:
> On 11/28/2014 10:04 AM, fetchinson . wrote:
>> Hi all,
>>
>> I have a feeling that I should solve this by a context manager but
>> since I've never used them I'm not sure what the optimal (in the
>> python sense) solution is. So basically what I do all the time is
>> this:
>>
>> for line in open( 'myfile' ):
>>      if not line:
>>          # discard empty lines
>>          continue
>>      if line.startswith( '#' ):
>>          # discard lines starting with #
>>          continue
>>      items = line.split( )
>>      if not items:
>>          # discard lines with only spaces, tabs, etc
>>          continue
>>
>>      process( items )
>>
>> You see I'd like to ignore lines which are empty, start with a #, or
>> are only white space. How would I write a context manager so that the
>> above simply becomes
>>
>> with some_tricky_stuff( 'myfile' ) as items:
>>      process( items )
>>
>
> I see what you're getting at, but a context manager is the wrong
> paradigm.  What you want is a generator.   (untested)
>
> def mygenerator(filename):
>      with open(filename) as f:
>          for line in f:
>              if not line: continue
>              if line.startswith('#'): continue
>              items = line.split()
>              if not items: continue
>              yield items
>
> Now your caller simply does:
>
> for items in mygenerator(filename):
>        process(items)
>
>

I think it's slightly better to leave the open outside the generator:

def interesting_lines(f):
     for line in f:
         line = line.strip()
         if line.startswith('#'):
             continue
         if not line:
             continue
         yield line

with open("my_config.ini") as f:
     for line in interesting_lines(f):
         do_something(line)

This makes interesting_lines a pure filter: it doesn't care what sort 
of iterable of strings it's operating on.  That makes it easier to test 
and more flexible.  The caller's code is also clearer, in my opinion.
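
To illustrate the testing point, here is a minimal sketch that feeds 
interesting_lines a plain list of strings and then an in-memory file, 
with no real file on disk.  The sample data is made up for illustration; 
the function is the one above, repeated only so the snippet runs on 
its own.

```python
import io

def interesting_lines(f):
    """Yield stripped lines, skipping blank lines and '#' comments."""
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            continue
        if not line:
            continue
        yield line

# Hypothetical sample input: any iterable of strings works, not just a file.
sample = [
    "# a comment\n",
    "\n",
    "   \t \n",
    "name = value\n",
    "   # indented comment\n",
    "other = thing\n",
]

from_list = list(interesting_lines(sample))

# The same filter works unchanged on a file-like object.
from_stream = list(interesting_lines(io.StringIO("".join(sample))))
```

Because the generator never opens anything itself, a test can hand it 
whatever lines it likes and compare the result directly.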

BTW: this example is taken verbatim from my PyCon presentation on 
iteration, if you are interested: http://nedbatchelder.com/text/iter.html

-- 
Ned Batchelder, http://nedbatchelder.com



