itertools, functools, file enhancement ideas
Alex Martelli
aleax at mac.com
Sat Apr 7 20:00:42 EDT 2007
Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
> I just had to write some programs that crunched a lot of large files,
> both text and binary. As I use iterators more I find myself wishing
> for some maybe-obvious enhancements:
>
> 1. File iterator for blocks of chars:
>
> f = open('foo')
> for block in f.iterchars(n=1024): ...
>
> iterates through 1024-character blocks from the file. The default iterator
> which loops through lines is not always a good choice since each line can
> use an unbounded amount of memory. Default n in the above should be 1 char.
the simple way (letting the file object deal w/buffering issues):

    def iterchars(f, n=1):
        while True:
            x = f.read(n)
            if not x:
                break
            yield x
the fancy way (doing your own buffering) is left as an exercise for the
reader. I do agree it would be nice to have in some module.
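For what it's worth, the simple version works on anything with a read() method, so it can be exercised without touching the filesystem (io.StringIO stands in for a real file in this sketch):

```python
import io

def iterchars(f, n=1):
    # Yield successive n-character blocks until the file is exhausted.
    while True:
        x = f.read(n)
        if not x:
            break
        yield x

# Demo on an in-memory buffer instead of a real file:
buf = io.StringIO("abcdefgh")
blocks = list(iterchars(buf, n=3))
# blocks == ['abc', 'def', 'gh']  (the last block may be short)
```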
> 2. wrapped file openers:
> There should be functions (either in itertools, builtins, the sys
> module, or whereever) that open a file, expose one of the above
> iterators, then close the file, i.e.
> def file_lines(filename):
>     with open(filename) as f:
>         for line in f:
>             yield line
> so you can say
>
> for line in file_lines(filename):
> crunch(line)
>
> The current bogus idiom is to say "for line in open(filename)" but
> that does not promise to close the file once the file is exhausted
> (part of the motivation of the new "with" statement). There should
> similarly be "file_chars" which uses the n-chars iterator instead of
> the line iterator.
I'm +/-0 on this one vs the idioms:
    with open(filename) as f:
        for line in f: crunch(line)

    with open(filename, 'rb') as f:
        for block in iterchars(f): crunch(block)
Making two lines into one is a weak use case for a stdlib function.
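If one did want the wrapper, a file_chars along the lines Paul describes might look like the following sketch (the name and default block size are just his suggestion; note the with-block only guarantees closure once the generator is exhausted, explicitly closed, or garbage-collected):

```python
import os
import tempfile

def file_chars(filename, n=1024):
    # Open the file, yield n-byte blocks, and close it when the
    # generator is exhausted or finalized.
    with open(filename, 'rb') as f:
        while True:
            block = f.read(n)
            if not block:
                break
            yield block

# Demo on a throwaway file:
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(b'x' * 2500)
    sizes = [len(b) for b in file_chars(path, n=1024)]
    # sizes == [1024, 1024, 452]
finally:
    os.remove(path)
```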
> 3. itertools.ichain:
> yields the contents of each of a sequence of iterators, i.e.:
> def ichain(seq):
>     for s in seq:
>         for t in s:
>             yield t
> this is different from itertools.chain because it lazy-evaluates its
> input sequence. Example application:
>
> all_filenames = ['file1', 'file2', 'file3']
> # loop through all the files crunching all lines in each one
> for line in ichain(file_lines(x) for x in all_filenames):
>     crunch(line)
Yes, subtle but important distinction.
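The distinction shows up as soon as building each inner iterator has a side effect (such as opening a file): chain(*seq) must unpack the whole outer sequence before yielding anything, while ichain only reaches each inner iterable on demand. A small sketch, with make_iter standing in for file_lines:

```python
from itertools import chain, islice

def ichain(seq):
    # Lazily walk the outer sequence, yielding from each inner iterable.
    for s in seq:
        for t in s:
            yield t

calls = []

def make_iter(name):
    # Stand-in for file_lines: records the moment it is called.
    calls.append(name)
    return iter([name])

names = ['file1', 'file2', 'file3']

# Eager: * unpacking calls make_iter for every name up front.
calls.clear()
list(islice(chain(*(make_iter(n) for n in names)), 1))
eager = list(calls)   # ['file1', 'file2', 'file3']

# Lazy: ichain calls make_iter only when each inner iterable is reached.
calls.clear()
list(islice(ichain(make_iter(n) for n in names), 1))
lazy = list(calls)    # ['file1']
```

(Later Python versions grew itertools.chain.from_iterable, which evaluates the outer iterable lazily in exactly this way.)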
> 4. functools enhancements (Haskell-inspired):
> Let f be a function with 2 inputs. Then:
> a) def flip(f): return lambda x,y: f(y,x)
> b) def lsect(x,f): return partial(f,x)
> c) def rsect(f,x): return partial(flip(f), x)
>
> lsect and rsect allow making what Haskell calls "sections". Example:
> # sequence of all squares less than 100
> from operator import lt
> s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.
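As far as I can tell the three helpers, and the quoted s100 example, run as posted once the imports are spelled out (materialized with list() here just to show the result):

```python
from functools import partial
from itertools import count, takewhile
from operator import lt

def flip(f):
    # Swap the two arguments of a binary function.
    return lambda x, y: f(y, x)

def lsect(x, f):
    # Left section: fix the first argument of f.
    return partial(f, x)

def rsect(f, x):
    # Right section: fix the second argument of f.
    return partial(flip(f), x)

# rsect(lt, 100) is the predicate "argument < 100", so:
s100 = list(takewhile(rsect(lt, 100), (x * x for x in count())))
# s100 == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```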
Alex
More information about the Python-list mailing list