A gnarly little python loop

Steve Howell showell30 at yahoo.com
Sun Nov 11 14:16:06 EST 2012


On Nov 11, 10:34 am, Peter Otten <__pete... at web.de> wrote:
> Steve Howell wrote:
> > On Nov 11, 1:09 am, Paul Rubin <no.em... at nospam.invalid> wrote:
> >> Cameron Simpson <c... at zip.com.au> writes:
> >> > | I'd prefer the original code ten times over this inaccessible beast.
> >> > Me too.
>
> >> Me, I like the itertools version better.  There's one chunk of data
> >> that goes through a succession of transforms each of which
> >> is very straightforward.
>
> > Thanks, Paul.
>
> > Even though I supplied the "inaccessible" itertools version, I can
> > understand why folks find it inaccessible.  As I said to the OP, there
> > was nothing wrong with the original imperative approach; I was simply
> > providing an alternative.
>
> > It took me a while to appreciate itertools, but the metaphor that
> > resonates with me is a Unix pipeline.  It's just a metaphor, so folks
> > shouldn't be too literal, but the idea here is this:
>
> >   page_nums -> pages -> valid_pages -> tweets
>
> > The transforms are this:
>
> >   page_nums -> pages: call API via imap
> >   pages -> valid_pages: take while true
> >   valid_pages -> tweets: use chain.from_iterable to flatten results
>
> > Here's the code again for context:
>
> >     def get_tweets(term):
> >         def get_page(page):
> >             return getSearch(term, page)
> >         page_nums = itertools.count(1)
> >         pages = itertools.imap(get_page, page_nums)
> >         valid_pages = itertools.takewhile(bool, pages)
> >         tweets = itertools.chain.from_iterable(valid_pages)
> >         return tweets
>
> Actually you supplied the "accessible" itertools version. For reference,
> here's the inaccessible version:
>
> class api:
>     """Twitter search API mock-up"""
>     pages = [
>         ["a", "b", "c"],
>         ["d", "e"],
>         ]
>     @staticmethod
>     def GetSearch(term, page):
>         assert term == "foo"
>         assert page >= 1
>         if page > len(api.pages):
>             return []
>         return api.pages[page-1]
>
> from collections import deque
> from functools import partial
> from itertools import chain, count, imap, takewhile
>
> def process(tweet):
>     print tweet
>
> term = "foo"
>
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
>
> ;)

I know Peter's version is tongue in cheek, but I do think that it has
a certain expressive power, and it highlights three mind-expanding
Python modules: collections, functools, and itertools.

Here's a re-flattened take on Peter's version ("Flat is better than
nested." -- PEP 20):

    term = "foo"
    search = partial(api.GetSearch, term)
    nums = count(1)
    paged_tweets = imap(search, nums)
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    processed_tweets = imap(process, tweets)
    deque(processed_tweets, maxlen=0)
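
For anyone reading this on Python 3, here is a rough, self-contained
sketch of the same pipeline.  itertools.imap no longer exists there;
the built-in map is already lazy, so it takes imap's place.  FakeApi
and its get_search method are stand-ins I made up to mirror Peter's
mock-up, and process() appends to a list instead of printing so the
result is easy to inspect.

```python
# Python 3 sketch of the flattened pipeline above.
# Assumption: FakeApi is a hypothetical stand-in for Peter's mock-up.
from collections import deque
from functools import partial
from itertools import chain, count, takewhile


class FakeApi:
    """Stand-in for the Twitter search mock-up quoted above."""
    pages = [
        ["a", "b", "c"],
        ["d", "e"],
    ]

    @staticmethod
    def get_search(term, page):
        assert page >= 1
        if page > len(FakeApi.pages):
            return []  # an empty page signals "no more results"
        return FakeApi.pages[page - 1]


seen = []


def process(tweet):
    # Collect instead of print, so the outcome can be checked afterwards.
    seen.append(tweet)


term = "foo"
search = partial(FakeApi.get_search, term)
nums = count(1)
paged_tweets = map(search, nums)              # lazy in Python 3, like imap
paged_tweets = takewhile(bool, paged_tweets)  # stop at the first empty page
tweets = chain.from_iterable(paged_tweets)    # flatten pages into tweets
processed_tweets = map(process, tweets)
deque(processed_tweets, maxlen=0)             # drain the iterator, keep nothing
```

After this runs, seen holds the five mock tweets in page order.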

Using deque(..., maxlen=0) to exhaust an iterator is slightly overboard
IMHO (a plain for loop does the same job), but all the other lines of
code can be fairly easily understood once you read the docs.

    partial: http://docs.python.org/2/library/functools.html
    count, imap, takewhile, chain.from_iterable: http://docs.python.org/2/library/itertools.html
    deque: http://docs.python.org/2/library/collections.html
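
On the deque point, a plain for loop drains the iterator just as well
and is probably what most readers expect.  The sketch below is written
for Python 3 (built-in map instead of imap) and uses made-up names
(PAGES, fetch_page, requested) purely for illustration.  It also
records which pages get requested, to show that takewhile stops
asking right after the first empty page.

```python
# Sketch only: PAGES and fetch_page are hypothetical stand-ins for a real API.
from itertools import chain, count, takewhile

PAGES = [["a", "b", "c"], ["d", "e"]]  # canned results, two pages
requested = []                          # records which page numbers were fetched


def fetch_page(page):
    requested.append(page)
    return PAGES[page - 1] if page <= len(PAGES) else []


def get_tweets():
    pages = map(fetch_page, count(1))     # page_nums -> pages (lazy)
    valid_pages = takewhile(bool, pages)  # pages -> valid_pages
    return chain.from_iterable(valid_pages)  # valid_pages -> tweets


collected = []
for tweet in get_tweets():  # the loop itself exhausts the iterator
    collected.append(tweet)
```

Nothing is fetched until the loop starts pulling tweets, and the only
extra request is the one that returns the empty page.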



