Lisp to Python translation criticism?

Andrew Henshaw andrew.henshaw at mail.com
Fri Aug 16 23:16:00 EDT 2002


John E. Barham wrote:

...snip Lisp code ...
> 
> Python:
> 
> def spam_word_prob(word, good, bad, ngood, nbad):
>     g = 2 * good.get(word, 0)
>     b = bad.get(word, 0)
>     if g + b >= 5:
>         return max(0.01, min(0.99, float(min(1, b / nbad) /
>             ((min(1, g / ngood) + min(1, b / nbad))))))
>     else:
>         return 0.0
> 
> def spam_prob(probs):
>     prod = 1.0
>     for prob in probs:
>         prod = prod * prob
>     inv_probs = [1 - x for x in probs]
>     inv_prob = 1.0
>     for prob in inv_probs:
>         inv_prob = inv_prob * prob
>     return prod / (prob + inv_prob)
> 
> Any comments on the correctness, style, efficiency etc. of my translation?
> I'd like to write a Python spam filtering system using Graham's
> techniques.
> 
> Please note that this is not meant to revive the perpetual debate over the
> relative merits of Python's lambda...  ;)
> 
>     John

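Should the last line of spam_prob be

        return prod / (prod + inv_prob)

? As posted, the denominator uses prob, which by then is just the loop
variable left over from the second for loop, rather than prod.

It's probably not a good idea to have such similar variable names.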
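
The effect of that one-letter difference can be seen with a small sketch.
The function names and the sample values below are made up for
illustration; the first function copies the posted code, the second only
changes the names and the denominator. With this sample the posted version
returns a value greater than 1, which cannot be a probability.

```python
def spam_prob_as_posted(probs):
    # Copy of the posted function, including the suspect last line.
    prod = 1.0
    for prob in probs:
        prod = prod * prob
    inv_probs = [1 - x for x in probs]
    inv_prob = 1.0
    for prob in inv_probs:
        inv_prob = inv_prob * prob
    # prob is now the last element of inv_probs, not the product.
    return prod / (prob + inv_prob)


def spam_prob_fixed(probs):
    # Same computation with distinct names and the intended denominator.
    product = 1.0
    for p in probs:
        product = product * p
    inv_product = 1.0
    for p in probs:
        inv_product = inv_product * (1 - p)
    return product / (product + inv_product)


sample = [0.9, 0.8]  # hypothetical word probabilities
print("as posted:", spam_prob_as_posted(sample))
print("fixed:    ", spam_prob_fixed(sample))
```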


On my machine, the fragment

    inv_prob = 1.0
    for prob in inv_probs:
        inv_prob = inv_prob * prob

takes about 50% more time to execute than

    inv_prob = reduce(operator.mul, inv_probs)   # needs: import operator

for inv_probs of length 10.  The advantage of the reduce version grows
with the length of the list.  That's one local optimization you could
make.
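
For anyone who wants to repeat the comparison, here is a minimal timing
sketch using the standard timeit module. It assumes a current Python,
where reduce has to be imported from functools; the helper names, the
sample list and the repeat count are made up for illustration. The
timings depend on the interpreter and the machine, so the ratio may not
match the figure quoted above.

```python
import operator
import timeit
from functools import reduce

# Ten made-up values standing in for inv_probs.
inv_probs = [0.1 * k for k in range(1, 11)]


def product_loop(values):
    # Explicit loop, as in the original fragment.
    result = 1.0
    for v in values:
        result = result * v
    return result


def product_reduce(values):
    # Same product, with the loop pushed into reduce.
    return reduce(operator.mul, values)


runs = 20000  # arbitrary repeat count
loop_time = timeit.timeit(lambda: product_loop(inv_probs), number=runs)
reduce_time = timeit.timeit(lambda: product_reduce(inv_probs), number=runs)
print("loop:   %.4f s" % loop_time)
print("reduce: %.4f s" % reduce_time)
```

Note that reduce without an initial value raises TypeError on an empty
list; reduce(operator.mul, values, 1.0) avoids that. On Python 3.8 and
later, math.prod(values) is another way to write the same product.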

I'd say that you would increase both clarity and speed by collapsing the 
three loops in spam_prob into one loop, as

def spam_prob(probs):
    inv_prob = prod = 1.0
    for prob in probs:
        prod     *= prob
        inv_prob *= (1 - prob)
    return prod / (prod + inv_prob)


This is twice as fast on my machine for a ten-element list.
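
A quick way to check that the collapsed loop computes the same value as
the multi-pass version (with the denominator corrected) is to compare
the two on random inputs. This is only a sketch: spam_prob_three_pass is
a hypothetical name for the original structure, and the 0.01 to 0.99
range simply mirrors the clamping in spam_word_prob.

```python
import math
import random


def spam_prob_three_pass(probs):
    # Original structure: product, list of complements, second product.
    prod = 1.0
    for p in probs:
        prod = prod * p
    inv_probs = [1 - p for p in probs]
    inv_prod = 1.0
    for q in inv_probs:
        inv_prod = inv_prod * q
    return prod / (prod + inv_prod)


def spam_prob(probs):
    # Single-loop version from the message above.
    inv_prob = prod = 1.0
    for prob in probs:
        prod *= prob
        inv_prob *= (1 - prob)
    return prod / (prod + inv_prob)


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    probs = [random.uniform(0.01, 0.99) for _ in range(10)]
    assert math.isclose(spam_prob(probs), spam_prob_three_pass(probs))
print("single-loop and three-pass versions agree")
```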


-- 
Andrew Henshaw



