[Numpy-discussion] formatting issues, locale and co

Sun Dec 28 01:55:56 EST 2008

Charles R Harris wrote:
>
>
> On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern <robert.kern at gmail.com> wrote:
>
>     On Sun, Dec 28, 2008 at 01:38, Charles R Harris
>     <charlesr.harris at gmail.com> wrote:
>     >
>     > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau
>     > <david at ar.media.kyoto-u.ac.jp> wrote:
>     >>
>     >> Hi,
>     >>
>     >>    While looking at the last failures of numpy trunk on windows for
>     >> python 2.5 and 2.6, I got into floating point number formatting
>     >> issues; I got deeper and deeper, and now I am lost. We have several
>     >> problems:
>     >>    - we are not consistent between platforms, nor are we consistent
>     >> with python
>     >>    - str(np.float32(a)) is locale dependent, but python str method is
>     >> not (locale.str is)
>     >>    - formatting of long double does not work on windows because of the
>     >> broken long double support in mingw.
>     >>
>     >> 1 consistency problem:
>     >> ----------------------
>     >>
>     >> python -c "a = 1e20; print a" -> 1e+020
>     >> python26 -c "a = 1e20; print a" -> 1e+20
>     >>
>     >> In numpy, we use PyOS_snprintf for formatting, but python itself uses
>     >> PyOS_ascii_formatd - which has different behavior on different versions
>     >> of python. The above behavior can be simply reproduced in C:
>     >>
>     >> #include <Python.h>
>     >>
>     >> int main()
>     >> {
>     >>    double x = 1e20;
>     >>    char c[200];
>     >>
>     >>    PyOS_ascii_formatd(c, sizeof(c), "%.12g", x);
>     >>    printf("%s\n", c);
>     >>    printf("%g\n", x);
>     >>
>     >>    return 0;
>     >> }
>     >>
>     >> On 2.5, this will print:
>     >>
>     >> 1e+020
>     >> 1e+020
>     >>
>     >> But on 2.6, this will print:
>     >>
>     >> 1e+20
>     >> 1e+020
>     >>
>     >> 2 locale dependency:
>     >> --------------------
>     >>
>     >> Another issue is that our own formatting is locale dependent, whereas
>     >> python isn't:
>     >>
>     >> import numpy as np
>     >> import locale
>     >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
>     >> a = 1.2
>     >>
>     >> print "str(a)", str(a)
>     >> print "locale.str(a)", locale.str(a)
>     >> print "str(np.float32(a))", str(np.float32(a))
>     >> print "locale.str(np.float32(a))", locale.str(np.float32(a))
>     >>
>     >> Returns:
>     >>
>     >> str(a) 1.2
>     >> locale.str(a) 1,2
>     >> str(np.float32(a)) 1,2
>     >> locale.str(np.float32(a)) 1,20000004768
>     >>
>     >> I thought about copying the way python does the formatting in the trunk
>     >> (where discrepancies between platforms have been fixed), but this is not
>     >> so easy, because it uses a lot of code from different places - and the
>     >> code needs to be adapted to float and long double. The other solution
>     >> would be to do our own formatting, but this does not sound easy:
>     >> formatting in C is hard. I am not sure about what we should do, if
>     >> anyone else has any idea ?
>     >
>     > I think the first thing to do is make a decision on locale. If we choose to
>     > support locales I don't see much choice but to depend on Python because it's
>     > too much work otherwise, and work not directly related to Numpy at that. If
>     > we decide not to support locales then we can do our own formatting if we
>     > need to using a fixed choice of locale. There is a list of snprintf
>     > implementations here. Trio looks like a mature project and has an MIT
>     > license, which I think is a license compatible with Numpy.
>
>     We should not support locales. The string representations of these
>     elements should be Python-parseable.
>
>     > I'm inclined to just fix the locale and ignore the rest until Python gets
>     > things sorted out. But I'm lazy...
>
>     What do you think Python doesn't have sorted out?
>
>
> Consistency between versions and platforms. David's note with the
> ticket points to a Python 3.0 bug on this reported about, oh, two
> years ago.

As an example: in python 2.6, they solved some issues like inf/nan by
interpreting the strings in python before outputting them, but we do not
use their fix. So we have:

python -c "import numpy as np; print np.log(0)" ->  -inf (python 2.6) /
-1.#INF (2.5, which is the format from the MS runtime).

But:

python -c "import numpy as np; print np.log(0).astype(np.float32)" ->
-1.#INF (both 2.6 and 2.5)

Etc... We can't be consistent with ourselves and with python at the same
time, I think. I don't know which is better: numpy being consistent
across platforms and python versions, or being consistent with python.
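
To make the "interpret the string before outputting it" idea concrete,
here is a minimal sketch in Python. The helper name normalize_special and
the table of MS runtime spellings are assumptions made for illustration;
this is not the code used by python 2.6 or numpy, and it only covers the
full-width spellings (not the truncated forms printf can produce with a
small precision).

```python
# Hypothetical table: MS runtime spellings of non-finite values mapped to
# the platform-independent spellings ("inf", "nan").
_MS_SPECIALS = {
    "1.#INF": "inf",
    "1.#QNAN": "nan",
    "1.#SNAN": "nan",
    "1.#IND": "nan",
}


def normalize_special(text):
    """Rewrite an MS runtime inf/nan spelling; return other text unchanged."""
    body = text.strip()
    sign = ""
    # Split off an optional leading sign so the table lookup sees "1.#INF".
    if body[:1] in ("+", "-"):
        sign, body = body[0], body[1:]
    replacement = _MS_SPECIALS.get(body)
    if replacement is None:
        return text  # ordinary number: leave the formatting alone
    if replacement == "nan":
        return "nan"  # the sign of a nan is dropped in this sketch
    return ("-" if sign == "-" else "") + replacement


print(normalize_special("-1.#INF"))
print(normalize_special("1e+020"))
```

Applying such a step to every float type (float32, double, long double)
would at least make numpy consistent with itself, whatever the C runtime
prints.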

> There is also the problem of long doubles on the windows platform,
> which isn't Python specific since Python doesn't use long doubles. As
> I understand long doubles on windows, mingw32 supports them, VS
> doesn't, so there is a compiler inconsistency to deal with also.

To be exact, both mingw and VS support long double sensu stricto: the
long double type is available. But sizeof(long double) == sizeof(double)
with the VS toolchain, and sizeof(long double) is 12 with mingw. The latter
is a pain, because mingw uses both the MS runtime (printf) and its own
functions (some math funcs), so we can't easily be consistent (either 8
or 12 bytes long double) with mingw. One solution would be to use the
mingwex printf (a printf reimplementation available on recent mingwrt)
instead of the MSVC runtime - I would hope that this one is fixed wrt long
double. This problem is even worse on 64 bits (long doubles are 16 bytes
by default there with mingw).
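
A quick way to see which case a given build falls into is to ask ctypes
for the two sizes. This is only a sketch: the function name
long_double_report is made up for illustration, and the sizes reported are
those of the ABI the python interpreter was built for, which is not
necessarily what the compiler building numpy uses.

```python
import ctypes


def long_double_report():
    """Return the sizes of double and long double as seen by ctypes."""
    d = ctypes.sizeof(ctypes.c_double)
    ld = ctypes.sizeof(ctypes.c_longdouble)
    # same_size is True when long double is just an alias for double,
    # which is the VS toolchain situation described above.
    return {"double": d, "long double": ld, "same_size": d == ld}


print(long_double_report())
```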

cheers,

David