N-grams

srinivas devaki mr.eightnoteight at gmail.com
Thu Nov 10 02:13:47 EST 2016


> def ngrams(iterable, n=2):
>     if n < 1:
>         raise ValueError
>     t = tee(iterable, n)
>     for i, x in enumerate(t):
>         for j in range(i):
>             next(x, None)
>     return zip(*t)

def myngrams(iterable, n=2):
    t = list(tee(iterable, 1))
    for _ in range(n - 1):
        t.extend(tee(t.pop()))
        next(t[-1], None)
    return zip(*t)


complexity wise it's O(N), but space complexity is O(N**2) to execute
this function,

to consume the output i.e completely iterate over zip object, the time
complexity is obviously O(N*M), and but space complexity is just
O(N*N).


---------------------------------------------------------------

In [1]: %paste
def ngrams(iterable, n=2):
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    return zip(*t)
## -- End pasted text --

In [2]: %paste
def myngrams(iterable, n=2):
    t = list(tee(iterable, 1))
    for _ in range(n - 1):
        t.extend(tee(t.pop()))
        next(t[-1], None)
    return zip(*t)

## -- End pasted text --

In [3]: from itertools import *

In [4]: %timeit ngrams(range(1000), n=500)
100 loops, best of 3: 17.3 ms per loop

In [5]: %timeit myngrams(range(1000), n=500)
1000 loops, best of 3: 469 µs per loop

In [6]: %timeit list(ngrams(range(1000), n=500))
10 loops, best of 3: 21.2 ms per loop

In [7]: %timeit list(myngrams(range(1000), n=500))
100 loops, best of 3: 4.41 ms per loop

In [8]: %timeit ngrams(range(1000), n=100)
1000 loops, best of 3: 722 µs per loop

In [9]: %timeit myngrams(range(1000), n=100)
10000 loops, best of 3: 94.9 µs per loop

In [10]: %timeit list(ngrams(range(1000), n=100))
100 loops, best of 3: 2.09 ms per loop

In [11]: %timeit list(myngrams(range(1000), n=100))
1000 loops, best of 3: 1.46 ms per loop

In [12]:

-----------------------------------------------------------



Regards
Srinivas Devaki
Senior (4th year) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
phone: +91 9491 383 249
telegram: @eightnoteight



More information about the Python-list mailing list