[spambayes-dev] RE: [Spambayes] Question (or possibly a bug report)

Mark Hammond mhammond at skippinet.com.au
Thu Jul 24 14:18:18 EDT 2003


[Moved to spambayes-dev: This will be boring to the poor users.]

[Tim]
> Short course:  The C standard mandates that programs start
> with the "C" locale.  Python's setlocale() wrapper maintains
> that for the LC_NUMERIC category:  no matter what you do from
> within Python, the  LC_NUMERIC category remains in the "C" locale.

Hrm - so that implies our code in msgstore.py:

        # Set our locale to be English, so our plugin works OK
        # ([ spambayes-Bugs-725466 ] Include a proper locale fix in Options.py
        # has discussion about this problem.)
        locale.setlocale(locale.LC_NUMERIC, "en")

Is probably not doing what we think it is.  Not that I know what we think it
is doing (but see below).  Fortunately, I can blame Tony for almost all of
these locale fixes.  Of course, I would have done them *exactly* the same
way, but having a scapegoat is still good :) We can't just blame the yanks -
I guess all us "colonials" need to share the blame :)
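One way to stop guessing what a setlocale() call did is to read the setting back straight afterwards.  The following is only a minimal sketch, written for a current Python rather than the version discussed in this thread; the helper name set_and_verify_numeric is made up for illustration and is not part of spambayes.  It uses the query form of locale.setlocale() (no second argument) and locale.localeconv():

```python
import locale


def set_and_verify_numeric(name):
    """Try to set LC_NUMERIC to `name` and report what is in effect afterwards.

    Hypothetical helper for illustration only; not spambayes code.
    Returns (accepted, current_setting, decimal_point).
    """
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        accepted = True
    except locale.Error:
        # The platform does not know this locale name at all.
        accepted = False
    # Calling setlocale() without a second argument only queries the setting.
    current = locale.setlocale(locale.LC_NUMERIC)
    # localeconv() reports the decimal point the C library would use.
    decimal_point = locale.localeconv()["decimal_point"]
    return accepted, current, decimal_point


if __name__ == "__main__":
    print(set_and_verify_numeric("C"))
```

If the reported setting or decimal point differs from what was requested, the call did not do what the caller expected, whatever the reason.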

Googling for this, I found a mail Tim should find either interesting or
terrifying - it mentions not only locale issues, but floating point
exception masks:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=%23MuH%23LU8BHA.2344%40tkmsftngp04

Hrm - OK - I bit the bullet, and re-booted as German locale.  If I remove
all calls to setlocale(), I can provoke some *very* strange math errors.
Both:

  File "E:\src\spambayes\Outlook2000\manager.py", line 664, in score
    return self.bayes.spamprob(bayes_tokenize(email), evidence)
  File "E:\src\spambayes\spambayes\classifier.py", line 236, in chi2_spamprob
    S = ln(S) + Sexp * LN2
exceptions.OverflowError: math range error

I get on *every* mail I try and classify - and in the "training" dialog, I
can reproduce the original bug:

  File "E:\src\spambayes\Outlook2000\dialogs\AsyncDialog.py", line 45, in set_stages
    assert (abs(start_pos-1.0)) < 0.001, \
AssertionError: Proportions must add to 1.0 (1.0,(('', 1.0),))

Note, however, that in my case, all I need to do is re-enable one of our 2
locale.setlocale() calls, and they both work.  I assume the OP would have
the same bug if he managed to be capable of training the system.

I'm starting to get out of my depth, but it seems to me that:
* Not calling setlocale( , "en") will cause strange math errors in all sorts
of strange places (and only God and/or MS know why)
* The setlocale() call appears to not be "sticking" for the people
experiencing the bug, or getting changed by something else *after* the MAPI
logon.
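If the worry is that the setting does not "stick", one defensive option is to pin LC_NUMERIC only around the numeric code and restore it afterwards.  This is a sketch under assumptions, not a proposed fix: it targets a current Python, the names numeric_locale and guarded_log are invented for illustration, and it uses the "C" locale because that name exists on every platform.  Note that setlocale() changes the setting for the whole process, so this is not safe if other threads depend on the locale at the same time.

```python
import contextlib
import locale
import math


@contextlib.contextmanager
def numeric_locale(name="C"):
    """Temporarily pin LC_NUMERIC to `name`, then restore the old setting.

    Hypothetical helper for illustration only; not spambayes code.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    locale.setlocale(locale.LC_NUMERIC, name)
    try:
        yield
    finally:
        # Put back whatever was in effect before, even if the body raised.
        locale.setlocale(locale.LC_NUMERIC, saved)


def guarded_log(x):
    """Stand-in for a numeric routine that must not see a foreign LC_NUMERIC."""
    with numeric_locale("C"):
        return math.log(x)


if __name__ == "__main__":
    print(guarded_log(2.0))
```

The trade-off is that every guarded call pays for two setlocale() calls, but nothing outside the guarded block can leave the numeric category in an unexpected state while the math runs.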

Regarding Tony's later mail:

> Maybe Outlook is at fault here?  I've certainly seen that some of the
> Outlook/COM/MAPI calls make changes to the locale.  In particular,
> mapi.MAPILogonEx() does - it changes the locale to whatever Outlook

I actually think this is a furphy.  I instrumented our MAPI logon code to:
        print "Before init, locale is", locale.getlocale(), locale.getlocale(locale.LC_NUMERIC)

etc, and this is what I get when running from inside Outlook (and after
removing the apparently redundant setlocale() call in addin.py)

Before init, locale is ['de_DE', '1252'] ['de_DE', '1252']
After init, locale is ['de_DE', '1252'] ['de_DE', '1252']
After logon, locale is ['de_DE', '1252'] ['de_DE', '1252']
[we then call setlocale()]
After setting to 'en', locale is ['de_DE', '1252'] ['English_United States', '1252']

ie, MAPI changes nothing.  In particular, it did *not* change it from an
English locale.  If I re-add the setlocale() call in addin.py (which happens
before this) I see:

Before init, locale is ['de_DE', '1252'] ['English_United States', '1252']
After init, locale is ['de_DE', '1252'] ['English_United States', '1252']
After logon, locale is ['de_DE', '1252'] ['English_United States', '1252']
After setting to 'en', locale is ['de_DE', '1252'] ['English_United States', '1252']

Again, MAPI made no change.  But if I start up "manager.py" as a new process
hosted by python.exe, I see:

Before init, locale is (None, None) (None, None)
After init, locale is (None, None) (None, None)
After logon, locale is ['de_DE', '1252'] ['de_DE', '1252']
After setting to 'en', locale is ['de_DE', '1252'] ['English_United States', '1252']

In this case, mapi *did* set the locale to the default locale - but the
locale that we would have had under Outlook anyway.

So from this, I cannot see that we need the setlocale() call after
the logon - MAPI *will* change the locale when it is the "C" locale, but
will never change it once set.  And in the case we care about (running
inside Outlook) it will already be set - thus, MAPI will never change it.
If we truly need it 'en', we don't need to make the call after the logon, as
MAPI will honour it once set to anything at all.  "All" MAPI will do is set
the process default if it doesn't have one.
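For anyone wanting to repeat this experiment, the before/after comparison can be automated rather than read off by eye.  The sketch below assumes a current Python, and the names snapshot and changed_stages are hypothetical.  It records the query form of setlocale() at each stage and reports the stages at which either value differs from the previous one:

```python
import locale


def snapshot(stage, log):
    """Append (stage, LC_ALL setting, LC_NUMERIC setting) to `log`.

    Hypothetical helper for illustration only; not spambayes code.
    """
    log.append((
        stage,
        locale.setlocale(locale.LC_ALL),      # query only, changes nothing
        locale.setlocale(locale.LC_NUMERIC),  # query only, changes nothing
    ))


def changed_stages(log):
    """Return the names of stages whose settings differ from the stage before."""
    return [cur[0] for prev, cur in zip(log, log[1:]) if prev[1:] != cur[1:]]


if __name__ == "__main__":
    log = []
    snapshot("Before init", log)
    # ... the call under suspicion would go here ...
    snapshot("After init", log)
    print(changed_stages(log))
```

Calling snapshot() at the same points as the print statements above ("Before init", "After init", "After logon") would show directly which step, if any, altered the locale.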

Which all leaves me confused, hungry, and with brain ready to explode.  I
would welcome any thoughts on any of this.

Mark.



