Iterate over text file, discarding some lines via context manager

Akira Li 4kir4.1i at gmail.com
Fri Nov 28 16:33:55 EST 2014


Ned Batchelder <ned at nedbatchelder.com> writes:

> On 11/28/14 10:22 AM, Dave Angel wrote:
>> On 11/28/2014 10:04 AM, fetchinson . wrote:
>>> Hi all,
>>>
>>> I have a feeling that I should solve this by a context manager but
>>> since I've never used them I'm not sure what the optimal (in the
>>> python sense) solution is. So basically what I do all the time is
>>> this:
>>>
>>> for line in open( 'myfile' ):
>>>      if not line:
>>>          # discard empty lines
>>>          continue
>>>      if line.startswith( '#' ):
>>>          # discard lines starting with #
>>>          continue
>>>      items = line.split( )
>>>      if not items:
>>>          # discard lines with only spaces, tabs, etc
>>>          continue
>>>
>>>      process( items )
>>>
>>> You see I'd like to ignore lines which are empty, start with a #, or
>>> are only white space. How would I write a context manager so that the
>>> above simply becomes
>>>
>>> with some_tricky_stuff( 'myfile' ) as items:
>>>      process( items )
>>>
>>
>> I see what you're getting at, but a context manager is the wrong
>> paradigm.  What you want is a generator.   (untested)
>>
>> def mygenerator(filename):
>>      with open(filename) as f:
>>          for line in f:
>>              if not line: continue
>>              if line.startswith('#'): continue
>>              items = line.split()
>>              if not items: continue
>>              yield items
>>
>> Now your caller simply does:
>>
>> for items in mygenerator(filename):
>>        process(items)
>>
>>
>
> I think it's slightly better to leave the open outside the generator:
>
> def interesting_lines(f):
>     for line in f:
>         line = line.strip()
>         if line.startswith('#'):
>             continue
>         if not line:
>             continue
>         yield line
>
> with open("my_config.ini") as f:
>     for line in interesting_lines(f):
>         do_something(line)
>
> This makes interesting_lines a pure filter, and doesn't care what sort
> of sequence of strings it's operating on.  This makes it easier to
> test, and more flexible.  The caller's code is also clearer in my
> opinion.
>
> BTW: this example is taken verbatim from my PyCon presentation on
> iteration, if you are interested:
> http://nedbatchelder.com/text/iter.html

The conditions could be combined in this case:

  def iter_rows(lines):
      for line in lines:
          items = line.split()
          if items and not items[0].startswith('#'):
              yield items  # space-separated non-empty non-comment items

  with open(filename) as file:
      for items in iter_rows(file):
          process(items)
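
For anyone who wants to try this without a file on disk, here is a
minimal runnable sketch. The sample text is made up, and io.StringIO
merely stands in for an open text file; any iterable of strings would do.

```python
import io


def iter_rows(lines):
    """Yield the whitespace-separated items of each non-blank, non-comment line."""
    for line in lines:
        items = line.split()
        # An empty list means the line was blank or whitespace-only;
        # a first item starting with '#' marks a comment line.
        if items and not items[0].startswith('#'):
            yield items


# Made-up sample input (an assumption for illustration only).
sample = io.StringIO(
    "# a comment\n"
    "\n"
    "   \t  \n"
    "alpha beta\n"
    "  # an indented comment\n"
    "gamma\n"
)

rows = list(iter_rows(sample))
print(rows)  # [['alpha', 'beta'], ['gamma']]
```

One difference from testing line.startswith('#') directly: because the
check is made on the first item after splitting, comment lines with
leading whitespace are skipped as well.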


--
Akira



