[Tutor] parsing a "chunked" text file

Hugo Arts hugo.yoshi at gmail.com
Thu Mar 18 13:26:44 CET 2010


On Thu, Mar 18, 2010 at 12:54 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Karim Liateni, 04.03.2010 01:23:
>
> Yes, a *big* difference in the true sense of the word. Your code (assuming
> you meant to write "... for line in ..." ) evaluates the entire list
> comprehension before returning from the call. Steven's code returns a
> generator that only handles one line (or a couple of empty lines) at a time.
> So, assuming that this runs against a large file, Steven's code uses only a
> constant amount of memory, compared to the whole file in your case, and is
> likely also a lot faster than your code as it involves less looping.
>
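To make Stefan's point concrete, here is a small Python 3 sketch. It feeds the input through a hypothetical `noisy_lines` generator that records each line as it is read. The list comprehension reads every line before the call returns. The generator expression reads nothing until it is iterated, and then only as far as it needs to. The names `noisy_lines`, `as_list` and `as_generator` are illustrative only and do not come from the thread. Karim's list comprehension is not quoted here, so `as_list` is an assumed reconstruction.

```python
def noisy_lines(lines, log):
    """Hypothetical helper: yield lines one by one, recording each read in `log`."""
    for line in lines:
        log.append(line)
        yield line


def as_list(lines):
    # Eager: the whole list is built before this function returns.
    return [line.strip() for line in lines if line.strip()]


def as_generator(lines):
    # Lazy: no line is read until the caller asks for the next item.
    return (line.strip() for line in lines if line.strip())


data = ["a\n", "\n", "b\n", "  \n", "c\n"]

eager_log = []
result = as_list(noisy_lines(data, eager_log))
print(len(eager_log))  # 5: every line was read before as_list returned

lazy_log = []
gen = as_generator(noisy_lines(data, lazy_log))
print(len(lazy_log))   # 0: nothing read yet
print(next(gen))       # a
print(len(lazy_log))   # 1: only the first line has been read
```

Memory use of the lazy version stays roughly constant no matter how long the input is, because only the current line is held at any moment.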

Though, if you changed the brackets into parentheses, you'd get a
generator expression, which *is* equivalent to Steven's version,
except that it calls strip() twice for every non-blank line, which is
a bit wasteful.
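The extra strip() call can be made visible with a small instrumented sketch. `CountingLine` is a hypothetical `str` subclass, made up for this illustration, that counts how often `strip()` is called. Assuming the comprehension puts strip() in both the result and the `if` clause, the parenthesized version strips each non-blank line twice and each blank line once. Stripping first and filtering on the already-stripped value strips every line exactly once.

```python
class CountingLine(str):
    """Hypothetical str subclass that counts how often strip() is called."""

    calls = 0

    def strip(self, chars=None):
        CountingLine.calls += 1
        return str.strip(self, chars)


data = [CountingLine(s) for s in ["a\n", "\n", "b\n", "  \n", "c\n"]]

# Parentheses instead of brackets: strip() appears in the condition AND the result.
CountingLine.calls = 0
twice = list(line.strip() for line in data if line.strip())
twice_calls = CountingLine.calls
print(twice, twice_calls)  # ['a', 'b', 'c'] 8

# Strip first, then filter on the already-stripped value.
CountingLine.calls = 0
once = [s for s in (line.strip() for line in data) if s]
once_calls = CountingLine.calls
print(once, once_calls)  # ['a', 'b', 'c'] 5
```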

If the unnecessary extra call bothers you, you could do one of two things:
1) Learn how the yield keyword works. You should do this. It's an
awesome feature, and you'll come across it many more times.
2) Go functional: import itertools and pass a generator expression to
ifilter, like so (pure functional programmers can also use imap
instead of the generator expression, which might be faster; profile to
be sure):

from itertools import ifilter  # Python 2; in Python 3 use the lazy built-in filter()

def skip_blanks(lines):
    return ifilter(None, (line.strip() for line in lines))
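In Python 3, `itertools.ifilter` and `itertools.imap` no longer exist. The built-in `filter()` and `map()` are already lazy and take their place. The sketch below translates both spellings mentioned above to Python 3 and adds a rough `timeit` comparison in the spirit of "profile to be sure". The function names and the sample data are illustrative assumptions. Timings depend on the machine and the input, so no result is claimed here.

```python
import timeit


def skip_blanks_genexpr(lines):
    # filter(None, ...) drops falsy items, i.e. lines that are empty after strip().
    return filter(None, (line.strip() for line in lines))


def skip_blanks_map(lines):
    # map(str.strip, ...) stands in for the generator expression (the imap variant).
    # Assumes every item is a str, because str.strip is called unbound.
    return filter(None, map(str.strip, lines))


sample = ["spam\n", "\n", "  eggs  \n", "   \n"] * 10_000

for func in (skip_blanks_genexpr, skip_blanks_map):
    # list() forces the lazy iterator so the stripping and filtering actually run.
    seconds = timeit.timeit(lambda: list(func(sample)), number=20)
    print(f"{func.__name__}: {seconds:.3f} s")
```

Both functions return a lazy iterator, so they keep the memory benefits discussed above. Only the timing comparison forces them into lists.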

It's very short, and it has all the memory and speed benefits of the
generator. Personally I really like terse functional programming like
this, though I believe the general consensus in the Python community
is that imperative alternatives are usually clearer to read.
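For comparison, here is a minimal imperative sketch of the same filter written with `yield` (option 1 above). It strips each line exactly once and, like the functional version, produces one line at a time. It is only an illustration. Steven's original code is not quoted in this message, so this is not claimed to match it.

```python
import io


def skip_blanks(lines):
    """Yield each line with surrounding whitespace removed, skipping blank lines."""
    for line in lines:
        stripped = line.strip()  # strip once and reuse the result
        if stripped:
            yield stripped


# Works on any iterable of strings, including an open file object,
# and reads only as many lines as the caller consumes.
text = io.StringIO("first\n\n   \nsecond\n")
print(list(skip_blanks(text)))  # ['first', 'second']
```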

If you want to know more about the yield keyword:
A terse description (assumes that you know how iterators work) is
here: http://docs.python.org/tutorial/classes.html#generators
A more detailed description of iterators and generators can be found
here: http://www.ibm.com/developerworks/library/l-pycon.html

Hugo

