[Numpy-discussion] formatting issues, locale and co

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sun Dec 28 01:55:56 EST 2008


Charles R Harris wrote:
>
>
> On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern <robert.kern at gmail.com> wrote:
>
>     On Sun, Dec 28, 2008 at 01:38, Charles R Harris
>     <charlesr.harris at gmail.com> wrote:
>     >
>     > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau
>     > <david at ar.media.kyoto-u.ac.jp> wrote:
>     >>
>     >> Hi,
>     >>
>     >>    While looking at the last failures of numpy trunk on windows for
>     >> python 2.5 and 2.6, I got into floating point number formatting
>     issues;
>     >> I got deeper and deeper, and now I am lost. We have several
>     problems:
>     >>    - we are not consistent between platforms, nor are we consistent
>     >> with python
>     >>    - str(np.float32(a)) is locale dependent, but python str
>     method is
>     >> not (locale.str is)
>     >>    - formatting of long double does not work on windows because
>     of the
>     >> broken long double support in mingw.
>     >>
>     >> 1 consistency problem:
>     >> ----------------------
>     >>
>     >> python -c "a = 1e20; print a" -> 1e+020
>     >> python26 -c "a = 1e20; print a" -> 1e+20
>     >>
>     >> In numpy, we use PyOS_snprintf for formatting, but python
>     itself uses
>     >> PyOS_ascii_formatd - which has different behavior on different
>     versions
>     >> of python. The above behavior can be simply reproduced in C:
>     >>
>     >> #include <Python.h>
>     >>
>     >> int main()
>     >> {
>     >>    double x = 1e20;
>     >>    char c[200];
>     >>
>     >>    PyOS_ascii_formatd(c, sizeof(c), "%.12g", x);
>     >>    printf("%s\n", c);
>     >>    printf("%g\n", x);
>     >>
>     >>    return 0;
>     >> }
>     >>
>     >> On 2.5, this will print:
>     >>
>     >> 1e+020
>     >> 1e+020
>     >>
>     >> But on 2.6, this will print:
>     >>
>     >> 1e+20
>     >> 1e+020
>     >>
>     >> 2 locale dependency:
>     >> --------------------
>     >>
>     >> Another issue is that our own formatting is locale dependent,
>     whereas
>     >> python isn't:
>     >>
>     >> import numpy as np
>     >> import locale
>     >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
>     >> a = 1.2
>     >>
>     >> print "str(a)", str(a)
>     >> print "locale.str(a)", locale.str(a)
>     >> print "str(np.float32(a))", str(np.float32(a))
>     >> print "locale.str(np.float32(a))", locale.str(np.float32(a))
>     >>
>     >> Returns:
>     >>
>     >> str(a) 1.2
>     >> locale.str(a) 1,2
>     >> str(np.float32(a)) 1,2
>     >> locale.str(np.float32(a)) 1,20000004768
>     >>
>     >> I thought about copying the way python does the formatting in
>     the trunk
>     >> (where discrepancies between platforms have been fixed), but
>     this is not
>     >> so easy, because it uses a lot of code from different places -
>     and the
>     >> code needs to be adapted to float and long double. The other
>     solution
>     >> would be to do our own formatting, but this does not sound easy:
>     >> formatting in C is hard. I am not sure about what we should do, if
>     >> anyone else has any idea ?
>     >
>     > I think the first thing to do is make a decision on locale. If
>     we choose to
>     > support locales I don't see much choice but to depend on Python
>     because it's
>     > too much work otherwise, and work not directly related to Numpy
>     at that. If
>     > we decide not to support locales then we can do our own
>     formatting if we
>     > need to using a fixed choice of locale. There is a list of snprintf
>     > implementations here. Trio looks like a mature project and has
>     an MIT
>     > license, which I think is a license compatible with Numpy.
>
>     We should not support locales. The string representations of these
>     elements should be Python-parseable.
>
>     > I'm inclined to just fix the locale and ignore the rest until
>     Python gets
>     > things sorted out. But I'm lazy...
>
>     What do you think Python doesn't have sorted out?
>
>
> Consistency between versions and platforms. David's note with the
> ticket points to a Python 3.0 bug on this reported about, oh, two
> years ago.

As an example: in python 2.6, they solved some issues like inf/nan by
interpreting the strings in python before outputting them, but we do not
use their fix. So we have:

python -c "import numpy as np; print np.log(0)" ->  -inf (python 2.6) /
-1.#INF (2.5, which is the format from the MS runtime).

But:

python -c "import numpy as np; print np.log(0).astype(np.float32)" ->
-1.#INF (both 2.6 and 2.5)

Etc... We can't be consistent with ourselves and with python at the same
time, I think. I don't know which one is best: numpy being consistent
across platforms and python versions, or being consistent with python.
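
To make the first option concrete, here is a minimal sketch (in present-day
Python, not the 2.5/2.6 discussed above) of post-processing whatever string
the C runtime produced into one canonical spelling. The helper name
normalize_float_repr is hypothetical and is not part of numpy or Python; it
only illustrates the idea of interpreting the string before outputting it.

```python
import re

# Hypothetical helper for illustration; not a numpy or Python API.
# Matches an exponent such as "e+020" and captures sign and digits,
# dropping leading zeros while keeping at least two exponent digits.
_EXPONENT = re.compile(r"e([+-])0*(\d{2,})$")


def normalize_float_repr(s):
    """Map platform-specific float strings to one canonical spelling."""
    s = s.strip().lower()
    unsigned = s.lstrip("+-")
    # MS runtime spellings such as "-1.#INF" become "inf" / "-inf".
    if "#inf" in s or unsigned in ("inf", "infinity"):
        return "-inf" if s.startswith("-") else "inf"
    # MS runtime spellings such as "1.#QNAN" or "-1.#IND" become "nan".
    if "#qnan" in s or "#snan" in s or "#ind" in s or unsigned == "nan":
        return "nan"
    # "1e+020" (MS runtime) becomes "1e+20" (C99 style).
    return _EXPONENT.sub(lambda m: "e" + m.group(1) + m.group(2), s)


print(normalize_float_repr("1e+020"))
print(normalize_float_repr("-1.#INF"))
```

The same function applied to float, double and long double output would give
identical strings on every platform, at the cost of no longer matching what
older python versions print.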

> There is also the problem of long doubles on the windows platform,
> which isn't Python specific since Python doesn't use long doubles. As
> I understand long doubles on windows, mingw32 supports them, VS
> doesn't, so there is a compiler inconsistency to deal with also.

To be exact, both mingw and VS support long double sensu stricto: the
long double type is available. But sizeof(long double) == sizeof(double)
with the VS toolchain, and sizeof(long double) is 12 with mingw. The latter
is a pain, because mingw uses both the MS runtime (printf) and its own
functions (some math funcs), so we can't easily be consistent (either 8
or 12 bytes long double) with mingw. One solution would be to use the
mingwex printf (a printf reimplementation available on recent mingwrt)
instead of the MSVC runtime - I would hope that this one is fixed wrt long
double. This problem is even worse on 64 bits (long doubles are 16 bytes
by default there with mingw).
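
A quick way to see which case a given build falls into is to ask ctypes for
the two sizes. This is only a sketch: the function name long_double_layout is
made up for illustration, and it reports the layout seen by the toolchain that
built the python interpreter, which is not necessarily the compiler used for
an extension.

```python
import ctypes


def long_double_layout():
    """Report sizeof(double) and sizeof(long double) as seen by ctypes."""
    double_size = ctypes.sizeof(ctypes.c_double)
    long_double_size = ctypes.sizeof(ctypes.c_longdouble)
    return {
        "double": double_size,
        "long double": long_double_size,
        # False when long double is just an alias for double.
        "distinct": long_double_size != double_size,
    }


print(long_double_layout())
```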

cheers,

David


