N-grams

Peter Otten __peter__ at web.de
Thu Nov 10 03:59:03 EST 2016


Paul Rubin wrote:

> This can probably be cleaned up some:
> 
>     from itertools import islice
>     from collections import deque
> 
>     def ngram(n, seq):
>         it = iter(seq)
>         d = deque(islice(it, n))
>         if len(d) != n:
>             return
>         for s in it:
>             yield tuple(d)
>             d.popleft()
>             d.append(s)
>         if len(d) == n:
>             yield tuple(d)
> 
>     def test():
>         xs = range(20)
>         for a in ngram(5, xs):
>             print a
> 
>     test()

I started with

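from collections import deque
from itertools import islice

def ngrams2(items, n):
    items = iter(items)
    # Pre-fill with n-1 items; maxlen=n drops the oldest item on each append.
    d = deque(islice(items, n-1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)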

and then tried a few dirty tricks, but nothing except omitting tuple(d) 
brought performance near Steven's version.
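
A minimal sketch of why dropping tuple(d) counts as a dirty trick, assuming
"omitting" means yielding the deque itself. ngrams_unsafe is a hypothetical
name used only for this illustration; it is not from the thread. Every value
it yields is the same mutable object, so it is only usable if each n-gram is
consumed before the next one is requested.

```python
from collections import deque
from itertools import islice

def ngrams2(items, n):
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)      # snapshot: each result is independent

def ngrams_unsafe(items, n):
    # Hypothetical variant for illustration: skips the tuple() copy.
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield d             # the very same deque object every time

safe = list(ngrams2("abcd", 2))
unsafe = list(ngrams_unsafe("abcd", 2))

print(safe)    # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(unsafe)  # three references to one deque holding 'c', 'd'
```

Collecting the results into a list exposes the aliasing: the safe version
keeps three distinct tuples, while the unsafe one keeps three references to
the final window.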

Just for fun, here's the obligatory oneliner:

from itertools import islice, tee

def ngrams1(items, n):
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))

Be aware that the islice() overhead is significant (I wonder if the islice() 
implementation could be tweaked to reduce that).
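
For anyone who wants to check the two versions against each other and
measure the difference on their own machine, here is a rough sketch. The
bench helper, the input size, and the repeat count are my own assumptions,
not something from the thread, and no particular timings are claimed.

```python
from collections import deque
from itertools import islice, tee
from timeit import timeit

def ngrams1(items, n):
    # tee() makes n independent iterators; the i-th one is advanced by i
    # items, and zip() stops as soon as the shortest one is exhausted.
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))

def ngrams2(items, n):
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)

def bench(func, size=100000, n=5, number=5):
    # Hypothetical helper: deque(..., maxlen=0) drains the iterator
    # without storing the results.
    return timeit(lambda: deque(func(range(size), n), maxlen=0), number=number)

# Both versions should produce the same n-grams.
a = list(ngrams1(range(6), 3))
b = list(ngrams2(range(6), 3))
print(a == b, a)

if __name__ == "__main__":
    for func in (ngrams1, ngrams2):
        print(func.__name__, bench(func))
```

On Python 3 zip() is lazy, so ngrams1 has to be wrapped in list() or
otherwise consumed before the results can be compared.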



