[Tutor] Re: List comprehension and generators.

Andrei project5 at redrival.net
Fri Apr 9 08:46:22 EDT 2004


Shawhan, Doug (EM, ITS) wrote on Thu, 8 Apr 2004 17:39:47 -0400:

Hi,
<snip>

My examples are for the following string:

>>> s = "abcdefghijklmnopqrstuvwx"
>>> len(s)
24
>>> from random import randint as rnd # save me some typing

> 1. How would one split that entire string into a list of same-size elements:
> 
> I.e. ['hondoh', 'arriets', 'happyhi', 'ppiehuth', 'yundai']

>>> size = 4 # size of the pieces we want
>>> [ s[i*size:(i+1)*size] for i in range(len(s)//size) ]
['abcd', 'efgh', 'ijkl', 'mnop', 'qrst', 'uvwx']
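Because range(len(s)//size) only counts whole pieces, any leftover characters are silently dropped when len(s) is not a multiple of size. A minimal variant that keeps a shorter final piece steps the start index by size instead. The helper name chunks is just made up for this sketch:

```python
def chunks(s, size):
    """Split s into pieces of `size` characters; the last piece may be shorter."""
    # range(0, len(s), size) yields the start index of every piece
    return [s[i:i + size] for i in range(0, len(s), size)]

print(chunks("abcdefghijklmnopqrstuvwx", 5))
# ['abcde', 'fghij', 'klmno', 'pqrst', 'uvwx']
```

With size=4 this gives the same six pieces as the one-liner above, because 24 divides evenly.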

> 2. How would one split that string into a list of *non-repeating* random-sized elements?
> 
> I.e. ['ho', 'ndoharri', 'etshappyhippieh', 'uthyundai']

I can't think of a way to do this directly with a list comprehension. What
can be done is to manipulate an existing list object inside the list
comprehension.

>>> mylist = [] # list object which will store our results
>>> [ mylist.append(ss) for ss in [ s[rnd(0,len(s)):rnd(0,len(s))] for i in range(rnd(8,20)) ] if ss and ss not in mylist ]
[None, None, None, None, None]
>>> mylist
['ghijklmnopqr', 'mnopq', 'jklmnopqrstu', 'opqrstu', 'cde']

The inner list comprehension:

  [ s[rnd(0,len(s)):rnd(0,len(s))] for i in range(rnd(8,20)) ]

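just generates pairs of random numbers and uses them as slice boundaries to
pull the corresponding pieces out of the string. Whenever the first number
is not smaller than the second, the slice is empty, which is where the ''
entries come from. You can run this on its own and get something like this: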

>>> [ s[rnd(0,len(s)):rnd(0,len(s))] for i in range(rnd(8,20)) ]
['bcdefghijklmnopqrs', '', '', 'bcdefghijklmnopqr', '', 'hijklm',
'ghijklmnopqrs', '', '', '']

The outer list comprehension loops over this list, calling each of its
elements ss, and checks whether ss is non-empty and not already in mylist.
If both conditions are True, ss is appended to mylist; otherwise the code
just skips to the next element of the inner list. The list returned by the
outer comprehension itself is useless: it holds one None per append() call,
since list.append() returns None.

So we have a list being manipulated inside a list comprehension that
contains a second list comprehension and two conditions. The result is not
pretty and I wouldn't recommend this kind of code. Rolling it out into an
explicit for-loop would be much, much clearer.
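For comparison, here is roughly what that rolled-out loop could look like: the same logic as the nested comprehension above, wrapped in a function. The name random_pieces is made up for this sketch:

```python
from random import randint as rnd

def random_pieces(s):
    """Collect distinct, non-empty random slices of s: the nested
    comprehension above, rolled out into a plain for-loop."""
    mylist = []
    for _ in range(rnd(8, 20)):                  # 8 to 20 attempts
        start, end = rnd(0, len(s)), rnd(0, len(s))
        ss = s[start:end]                        # empty whenever start >= end
        if ss and ss not in mylist:              # skip empty and duplicate slices
            mylist.append(ss)
    return mylist

print(random_pieces("abcdefghijklmnopqrstuvwx"))
```

No throwaway list of None values, and each condition gets its own line.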

> (I know I could do it with a 'for' loop, but I'll bet you can do it with list comprehension)

Can != should :)
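Note that the example in the question actually shows consecutive pieces that together make up the whole string, which the code above does not produce. If that is what "non-repeating" is meant to say, a loop that tracks the current position is one way to do it. A rough sketch under that reading; split_random, min_len and max_len are hypothetical names and the length limits are arbitrary:

```python
import random

def split_random(s, min_len=1, max_len=8):
    """Cut s into consecutive pieces of random length between min_len and
    max_len (the last piece may be shorter). Joining them gives s back."""
    pieces = []
    pos = 0
    while pos < len(s):
        step = random.randint(min_len, max_len)  # length of the next piece
        pieces.append(s[pos:pos + step])
        pos += step
    return pieces

print(split_random("hondoharrietshappyhippiehuthyundai"))
```

Whatever the random lengths turn out to be, "".join() of the result is always the original string.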

> Which in turn leads me to:
> 
> 3. How would one use comprehension or a generator to create a 
> random-sized list of random numbers? (Or is this not the proper 
> use of a generator?)

You already had the code required for this in your own examples:

[(s[:random.randint(1,10)]) for each in range(random.randint(1,10))]

The second part takes care of the list being of random length; the first
part can be adapted to insert a random number instead of a piece of the
string:

>>> import random
>>> [random.randint(1,10) for each in range(random.randint(1,10))]
[2, 1, 4, 5, 6, 3]

To some extent, you could see generators as lists that are not stored in
memory and can only be accessed sequentially.

>>> def randnumgen(n=None):
...     if n is None:
...         n = random.randint(1,10)
...     for i in range(n):
...         yield random.randint(0,10)
...     
>>> list(randnumgen()) 
[7, 5, 9, 0, 9, 2, 3, 3, 7, 9]
>>> list(randnumgen())
[1, 4, 0, 1]
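Python 2.4 and later also offer generator expressions, which look like list comprehensions in parentheses but produce their values lazily, one at a time. A rough equivalent of randnumgen() written that way might be the following (randnumgen_expr is a made-up name):

```python
import random

def randnumgen_expr(n=None):
    """Return a generator expression yielding n random ints in [0, 10];
    n defaults to a random length between 1 and 10, as in randnumgen()."""
    if n is None:
        n = random.randint(1, 10)
    return (random.randint(0, 10) for _ in range(n))

gen = randnumgen_expr(3)
print(list(gen))  # three random numbers
print(list(gen))  # [] : a generator can only be consumed once
```

The second list() call shows the "sequential only" part: once the values have been produced, they are gone.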

This is useful if you intend to manipulate a really huge list: you wouldn't
run out of memory just creating that list. E.g.:

>>> for elem in range(10000000):
...     pass

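By the end of this run, Python is using about 160 MB, because range() (in
the Python 2 used here) first builds the complete list of ten million
numbers. Python 3's range() no longer builds a list, so this particular loop
would not show the effect there. Now you could write a generator for it,
e.g.: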

>>> def mygen(n): # a bit like range(n)
...     i = 0
...     while i<n:
...         yield i
...         i += 1
...
>>> for elem in mygen(10000000):

This one has no influence upon Python's memory usage. 
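On a current Python 3, one rough way to see the difference is sys.getsizeof(), which reports the size of an object itself (for a list, the array of references, not the int objects they point to). Here list(range(n)) stands in for the list that Python 2's range() built; the exact numbers vary by platform and version:

```python
import sys

def mygen(n):  # the generator from the post
    i = 0
    while i < n:
        yield i
        i += 1

n = 1_000_000
as_list = list(range(n))  # materialises every element up front
as_gen = mygen(n)         # only stores the generator's current state
print(sys.getsizeof(as_list))  # several megabytes
print(sys.getsizeof(as_gen))   # a few hundred bytes at most
```

The generator's size stays the same whatever n is, while the list grows linearly with n.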

By the way, in this case the generator is about 2x slower than the built-in
range function on my computer, presumably because range() loops in C, while
my generator loops in Python.
But if I increase the number to 20 million instead of 10 million, the
generator outperforms range() by a factor of 2, because the overhead of
allocating memory for that list (which probably involves some swapping) is
larger than the overhead of the loop. If the number is increased even more,
range() becomes completely unusable due to its memory requirements (my
Python interpreter consumes well over 200 MB at this point and peaked at
nearly 300 MB!), while the generator keeps working regardless of how many
elements you make it generate.
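These timings depend heavily on the machine, the amount of RAM and the Python version, so they are worth re-measuring. A small sketch using the standard timeit module; again list(range(n)) stands in for the list-building Python 2 range(), consume is a made-up helper, and n is kept small so it runs quickly:

```python
import timeit

def mygen(n):  # the generator from the post
    i = 0
    while i < n:
        yield i
        i += 1

def consume(iterable):
    """Loop over an iterable doing nothing, like the 'for elem in ...: pass' loops."""
    for elem in iterable:
        pass

n = 1_000_000
# list(range(n)) builds the whole list first, as range() did in Python 2
t_list = timeit.timeit(lambda: consume(list(range(n))), number=3)
t_gen = timeit.timeit(lambda: consume(mygen(n)), number=3)
print(f"list: {t_list:.3f}s  generator: {t_gen:.3f}s")
```

Raising n until the list no longer fits comfortably in RAM is where the crossover described above should show up.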

-- 
Yours,

Andrei

=====
Real contact info (decode with rot13):
cebwrpg5 at jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.



