itertools, functools, file enhancement ideas

Paul Rubin http
Sat Apr 7 17:43:20 EDT 2007


I just had to write some programs that crunched a lot of large files,
both text and binary.  As I use iterators more I find myself wishing
for some maybe-obvious enhancements:

1. File iterator for blocks of chars:

       f = open('foo')
       for block in f.iterchars(n=1024):  ...

iterates through 1024-character blocks from the file.  The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory.  Default n in the above should be 1 char.
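
For illustration, here is a rough sketch of the same idea as a plain function, since file objects have no iterchars method.  The name iterchars and the n parameter are the hypothetical ones from the proposal above; taking the file as the first argument is just an assumption made so the sketch can run.

```python
import io

def iterchars(f, n=1):
    """Yield successive blocks of up to n characters from open file f.

    Hypothetical helper: a free-function stand-in for the proposed
    f.iterchars(n) method, which file objects do not provide.
    """
    if n < 1:
        # Checked when iteration starts, since this is a generator.
        raise ValueError("n must be at least 1")
    while True:
        block = f.read(n)
        # read() returns '' (text mode) or b'' (binary mode) at end of
        # file; both are falsy, so one test covers both kinds of file.
        if not block:
            return
        yield block

# Demonstration on an in-memory text file.
blocks = list(iterchars(io.StringIO("abcdefghij"), n=4))
```

The last block can be shorter than n, and memory use per iteration is bounded by n no matter how long the lines in the file are.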

2. wrapped file openers:
    There should be functions (either in itertools, builtins, the sys
    module, or wherever) that open a file, expose one of the above
    iterators, then close the file, i.e.
       def file_lines(filename):
         with open(filename) as f:
           for line in f:
             yield line
    so you can say

       for line in file_lines(filename):  
           crunch(line)

The current bogus idiom is to say "for line in open(filename)" but 
that does not promise to close the file once the file is exhausted
(part of the motivation of the new "with" statement).  There should
similarly be "file_chars" which uses the n-chars iterator instead of
the line iterator.
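
A possible shape for file_chars, written as a sketch along the lines of file_lines above.  The signature (filename, n=1) is an assumption, and the temporary file exists only so the example can run on its own.

```python
import os
import tempfile

def file_chars(filename, n=1):
    """Hypothetical wrapper: open filename, yield blocks of up to n
    characters, and close the file when the generator finishes."""
    with open(filename) as f:
        while True:
            block = f.read(n)
            if not block:   # empty string means end of file
                break
            yield block

# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world")
    path = tmp.name

try:
    chunks = list(file_chars(path, n=5))
finally:
    os.remove(path)
```

One caveat: the with block only closes the file once the generator runs to completion or is itself closed, so a caller that breaks out of the loop early is relying on the generator being closed or garbage collected.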

3. itertools.ichain:
   yields the contents of each of a sequence of iterators, i.e.:
     def ichain(seq):
         for s in seq:
             for t in s:
                yield t
   this is different from itertools.chain because it lazy-evaluates its
   input sequence.  Example application:

      all_filenames = ['file1', 'file2', 'file3']
      # loop through all the files crunching all lines in each one
      for line in ichain(file_lines(x) for x in all_filenames):
         crunch(line)
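
A small sketch of the laziness in question.  The source function and the opened list are hypothetical stand-ins for opening real files; they just record when each input gets created.  For comparison, itertools.chain.from_iterable (added in Python 2.6) behaves the same way as ichain here.

```python
from itertools import chain

def ichain(seq):
    # Pull one iterator at a time from seq, draining it before
    # asking seq for the next one.
    for s in seq:
        for t in s:
            yield t

opened = []   # records which sources have been created so far

def source(name):
    """Hypothetical stand-in for opening a file: note the 'open',
    then return an iterator over two fake lines."""
    opened.append(name)
    return iter([name + ":0", name + ":1"])

names = ['file1', 'file2', 'file3']
lines = ichain(source(x) for x in names)

first = next(lines)
opened_after_first = list(opened)   # only the first source so far

rest = list(lines)
opened_at_end = list(opened)

# chain.from_iterable consumes its argument lazily in the same way.
same = list(chain.from_iterable(source(x) for x in ['a', 'b']))
```

By contrast, chain(*(source(x) for x in names)) would have to unpack the whole generator expression, and so call source on every name, before yielding anything.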

4. functools enhancements (Haskell-inspired):
   Let f be a function with 2 inputs, and let partial be functools.partial.  Then:
      a) def flip(f): return lambda x,y: f(y,x)
      b) def lsect(x,f): return partial(f,x)
      c) def rsect(f,x): return partial(flip(f), x)

   lsect and rsect allow making what Haskell calls "sections".  Example:
      # sequence of all squares less than 100
      from itertools import count, takewhile
      from operator import lt
      s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
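
Putting the three definitions together with their imports gives something that can be run as is.  This is only a sketch of the proposal; none of flip, lsect or rsect exists in functools, and ten_minus and minus_ten are made-up names used to show the difference between the two sections.

```python
from functools import partial
from itertools import count, takewhile
from operator import lt, sub

def flip(f):
    # Swap the two arguments of a 2-argument function.
    return lambda x, y: f(y, x)

def lsect(x, f):
    # Left section: fix the first argument, like Haskell's (x `f`).
    return partial(f, x)

def rsect(f, x):
    # Right section: fix the second argument, like Haskell's (`f` x).
    return partial(flip(f), x)

# rsect(lt, 100) is the predicate "y < 100", so this collects all
# squares less than 100.
s100 = list(takewhile(rsect(lt, 100), (x * x for x in count())))

# With a non-commutative operator the two sections differ.
ten_minus = lsect(10, sub)   # y -> 10 - y
minus_ten = rsect(sub, 10)   # y -> y - 10
```

Since takewhile stops at the first square that fails the test, the infinite count() is never exhausted.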

      


