generated comprehensions

brueckd at tbye.com
Tue May 14 11:02:09 EDT 2002


On 13 May 2002, Garth T Kidd wrote:

> I'm a little worried about getting into the habit of using list
> comprehensions because I'll have to re-write the comprehensions back
> in "normal" Python whenever someone tries to shove a lot of data
> through them.

Who is 'someone', and why would they shove 'a lot' (too much) data through
your list comprehensions? (That is: are you actually writing code that has
to behave properly for unknown and possibly malicious users and/or handle
huge amounts of data, or are you just musing over the theoretical limits of
what Python can handle?)

> If we're talking normal sequences, of course, it's not that much of a
> problem. If it fits in memory, it fits in memory. It's when you start
> using generators because you need to that suddenly comprehensions look
> a little brittle.
> 
>     def printOdds(upto): 
>         for odd in [num for num in xrange(upto) if num%2]:
>             print odd
> 
> ... works fine if upto is 5, but just sits there chewing up memory if
> upto is 10**9,

What are the odds of actually being bitten by this "problem"? Assuming you
have a real program, it'll be much more complex than this single function,
and I'd expect that it's likely to break down in other ways, not just list
comprehensions. For example, maybe you should avoid using strings as much
as possible too (and just do all your data processing by reading and
writing to an open file or a memory-mapped file) <0.7 wink>. 

IOW, if your program will realistically need to handle a billion of 
something, there's no point in singling out list comprehensions because 
you'll need to be careful every step of the way.
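For what it's worth, the quoted printOdds only chews up memory because the
list comprehension materializes the whole list up front. Iterating the
xrange directly avoids that, and later Pythons added generator expressions
that produce values lazily. A minimal sketch in modern syntax (Python 3,
where range is already lazy):

```python
def odds(upto):
    """Lazily yield the odd numbers below upto."""
    # A generator expression produces one value at a time, so no
    # intermediate list is built even when upto is 10**9 or more.
    return (num for num in range(upto) if num % 2)

def print_odds(upto):
    for odd in odds(upto):
        print(odd)
```

Because odds() is lazy, pulling the first value out of odds(10**12) returns
immediately; nothing is computed until you ask for it.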

> I'm sure I'll figure out a decent rule of thumb (say, "unit test with
> the biggest practical number, and get rid of comprehensions if they
> turn out to be a problem", or "don't use comprehensions with
> generators")

How about "don't cross the bridge 'til you get to it"? If you are writing a 
program that needs to handle large amounts of data, that's something 
you'll have to keep in mind every step of the way. Outside of that very 
narrow domain, however, it's usually a waste of time to worry about it. 
For example, the other day I wrote a quick utility to add line numbers to 
source code files for annotation - theoretically I can get into big 
trouble if the source code file is too big to fit into memory or if it is 
more than 2.1 billion lines long (I'm using Python 2.1), but realistically 
it's a waste of time to be concerned with that.
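(The utility itself isn't shown above; as a hypothetical sketch of the same
idea in modern Python, both worries go away if you stream the file line by
line - the function name and format below are my own invention:)

```python
def numbered(lines):
    """Yield each line prefixed with its 1-based line number."""
    # Iterating a file object streams one line at a time, so even a file
    # too big to fit in memory is fine; enumerate's counter is an
    # arbitrary-precision int, so there's no 2.1-billion-line cap either.
    for lineno, line in enumerate(lines, start=1):
        yield "%6d  %s" % (lineno, line)

# Usage: annotate a file without loading it all into memory.
# with open("example.py") as f:
#     for line in numbered(f):
#         print(line, end="")
```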

-Dave





