[Python-Dev] RE: [Spambayes] Question (or possibly a bug report)

Tim Peters tim.one@comcast.net
Thu, 24 Jul 2003 00:34:37 -0400


[Mark Hammond]
> OK, the code now looks like:
>
>         print repr(S), repr(H)
>         S = ln(S) + Sexp * LN2
>         H = ln(H) + Hexp * LN2
>
> And I tested on a hammy mail.  I got:
>
> 3,0955714375167259e-015 0.0
> ...
>   File "E:\src\spambayes\spambayes\classifier.py", line 238, in
> chi2_spamprob
>     H = ln(H) + Hexp * LN2
> exceptions.OverflowError: math range error

So H == 0.0 is the culprit.  Unexpected!

> A spam yields:
> 0.0 0.0
>   File "E:\src\spambayes\spambayes\classifier.py", line 237, in
> chi2_spamprob
>     S = ln(S) + Sexp * LN2
> exceptions.OverflowError: math range error

So S == 0.0 irritated math.log first.  Equally unexpected <wink>.
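
For anyone following along, here is a minimal sketch of why a zero kills that line. It only mimics the shape of the failing expression from the traceback. The helper names log_with_exponent and fails_on are made up for illustration, and which exception you get is an assumption about your Python: Mark's build raised OverflowError, while current Pythons raise ValueError for math.log(0.0).

```python
import math

LN2 = math.log(2)

def log_with_exponent(mantissa, exponent):
    # Same shape as `ln(S) + Sexp * LN2` in the traceback above.
    return math.log(mantissa) + exponent * LN2

def fails_on(mantissa, exponent=0):
    # Hypothetical helper: True if the expression blows up for this input.
    try:
        log_with_exponent(mantissa, exponent)
    except (ValueError, OverflowError):
        return True
    return False

print(fails_on(3.0955714375167259e-15))  # tiny but positive: fine
print(fails_on(0.0))                     # exactly zero: log raises
```

A tiny positive value is no problem for log; only an exact 0.0 is.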

> Interestingly, S in the first one uses a comma, while all the zeroes
> got '.'
>
> Clueless ly,

Well, the last one is easy:  *Python* adds the dot to 0.  Python's repr()
for floats *generally* acts like C's %.17g, except for

    repr(a_float_that_happens_to_be_an_exact_integer)

plus a couple others you don't want to hear about <wink>.  Then C does

>>> "%.17g" % 0.0
'0'
>>>

and that violates Guido's desire that the *type* of an object be apparent
from its repr.  So Python's format_float (in floatobject.c) first lets C
have a crack at it, and if C's sprintf didn't stick in a radix point, Python
appends its own, plus a trailing zero:

    *cp++ = '.';
    *cp++ = '0';
    *cp++ = '\0';
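
A rough Python rendering of that rule, as a sketch only: the function name old_style_float_repr is made up, and the "all digits after an optional sign" test is my assumption about how "C didn't stick in a radix point" gets detected. It describes the %.17g-based repr discussed here; later Pythons switched repr() to the shortest string that round-trips.

```python
def old_style_float_repr(x):
    # Let C-style formatting have the first crack at it.
    s = "%.17g" % x
    # Assumed check: if only digits follow an optional sign, the result
    # looks like an int, so append ".0" to make the type apparent.
    body = s[1:] if s.startswith("-") else s
    if body.isdigit():
        s += ".0"
    return s

for value in (0.0, 3.0, -2.0, 0.5, 1e22):
    print(old_style_float_repr(value))
```

Values that already carry a radix point or an exponent (0.5, 1e22) pass through untouched.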

Back to spambayes, H and S can't become zero <wink>.  The only way they
could is if a computed probability is 0.0 or 1.0, and that's never supposed
to happen.  Printing 'prob' in the loop would tell us whether that's so,
but, if it is so, the true cause could be in a ton of other code.
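
A sketch of the kind of check I mean. This is not the classifier code: it assumes S accumulates 1 - prob and H accumulates prob (my reading, not shown in this thread), it leaves out the exponent rescaling entirely, and the names first_bad_prob and plain_products are hypothetical.

```python
def first_bad_prob(probs):
    # Return (index, prob) for the first prob outside the open
    # interval (0, 1), or None if every prob is strictly inside it.
    for i, prob in enumerate(probs):
        if prob <= 0.0 or prob >= 1.0:
            return i, prob
    return None

def plain_products(probs):
    # Assumed roles: S multiplies (1 - prob), H multiplies prob.
    H = S = 1.0
    for prob in probs:
        S *= 1.0 - prob
        H *= prob
    return S, H

clues = [0.3, 1.0, 0.2]        # made-up data with one bad prob
print(first_bad_prob(clues))   # points at the offending clue
print(plain_products(clues))   # S collapses to exactly 0.0
```

A single prob of exactly 1.0 zeroes S, and a single prob of exactly 0.0 zeroes H, no matter what the other clues say.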