Classical FP problem in python : Hamming problem
Francis Girard
francis.girard at free.fr
Mon Jan 24 08:09:25 EST 2005
Ok, I think that the bottom line is this :
For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of Eratosthene
-- there's a subtle difference), i.e. all those algorithms that typically
rely upon recursion to get the beginning of the generated elements in order
to generate the next one ...
... so for this family of algorithms, we have, somehow, to abandon recursion,
and to explicitely maintain one or many lists of what had been generated.
One solution for this is suggested in "test_generators.py". But I think that
this solution is evil as it looks like recursion is used but it is not and it
depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks ! It
internally maintains a lists of generated results that grows forever while it
is useless to maintain results that had been "consumed". Just for a gross
estimate, on my system, percentage of memory usage for this program grows
rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between 31%-36%.
Ugly and inefficient.
The solution of Jeff Epler is far more elegant. The "itertools.tee" function
does just what we want. It internally maintain a list of what had been
generated and deletes the "consumed" elements as the algo unrolls. To follow
with my gross estimate, memory usage grows from 1.2% to 1.9% after 5 minutes
(probably only affected by the growing size of Huge Integer). CPU usage
varies between 27%-29%.
Beautiful and effecient.
You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.
For this family of algorithms, itertools.tee is the way to go.
I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.
Thank you,
Francis Girard
FRANCE
Le dimanche 23 Janvier 2005 23:27, Jeff Epler a écrit :
> Your formulation in Python is recursive (hamming calls hamming()) and I
> think that's why your program gives up fairly early.
>
> Instead, use itertools.tee() [new in Python 2.4, or search the internet
> for an implementation that works in 2.3] to give a copy of the output
> sequence to each "multiply by N" function as well as one to be the
> return value.
>
> Here's my implementation, which matched your list early on but
> effortlessly reached larger values. One large value it printed was
> 6412351813189632 (a Python long) which indeed has only the distinct
> prime factors 2 and 3. (2**43 * 3**6)
>
> Jeff
>
> from itertools import tee
> import sys
>
> def imerge(xs, ys):
> x = xs.next()
> y = ys.next()
> while True:
> if x == y:
> yield x
> x = xs.next()
> y = ys.next()
> elif x < y:
> yield x
> x = xs.next()
> else:
> yield y
> y = ys.next()
>
> def hamming():
> def _hamming(j, k):
> yield 1
> hamming = generators[j]
> for i in hamming:
> yield i * k
> generators = []
> generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2,
> 5)) generators[:] = tee(generator, 4)
> return generators[3]
>
> for i in hamming():
> print i,
> sys.stdout.flush()
More information about the Python-list
mailing list