[Numpy-discussion] formatting issues, locale and co

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sun Dec 28 00:27:07 EST 2008


Hi,

    While looking at the remaining failures of numpy trunk on Windows for
Python 2.5 and 2.6, I ran into floating-point formatting issues; the
deeper I dug, the more lost I got. We have several problems:
    - we are not consistent between platforms, nor are we consistent
with Python
    - str(np.float32(a)) is locale-dependent, but Python's str is
not (locale.str is)
    - formatting of long double does not work on Windows because of the
broken long double support in mingw.

1 consistency problem:
----------------------

python -c "a = 1e20; print a" -> 1e+020
python26 -c "a = 1e20; print a" -> 1e+20

In numpy, we use PyOS_snprintf for formatting, but Python itself uses
PyOS_ascii_formatd, whose behavior differs between Python versions.
The behavior above is easy to reproduce in C:

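#include <Python.h>
#include <stdio.h>

int main(void)
{
    double x = 1e20;
    char c[200];

    /* Python's own formatting routine */
    PyOS_ascii_formatd(c, sizeof(c), "%.12g", x);
    printf("%s\n", c);
    /* the C runtime's formatting, for comparison */
    printf("%g\n", x);

    return 0;
}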

On Windows, with Python 2.5, this prints:

1e+020
1e+020

But with Python 2.6, it prints:

1e+20
1e+020
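The second line comes straight from the C runtime and is the same in both cases: the Microsoft C library prints at least three exponent digits. What changed is that Python 2.6's PyOS_ascii_formatd post-processes the string so the exponent has two digits unless more are needed. A minimal Python sketch of that normalization step, assuming the input is a string already produced by a C %e/%g conversion; normalize_exponent is a hypothetical helper, not a NumPy or Python API:

```python
import re

# Exponent suffix of a %e/%g-formatted number, e.g. "e+020" or "E-5".
_EXPONENT = re.compile(r'([eE][+-])(\d+)$')


def normalize_exponent(text, min_digits=2):
    """Rewrite the exponent of a C-formatted float to min_digits digits.

    Extra leading zeros are dropped ("1e+020" -> "1e+20"), short exponents
    are zero-padded ("1e+5" -> "1e+05"), and exponents that really need
    more digits ("1e+100") are left alone.
    """
    match = _EXPONENT.search(text)
    if match is None:
        return text  # no exponent: "1.5", "inf", "nan", ...
    digits = match.group(2).lstrip('0') or '0'
    return text[:match.start()] + match.group(1) + digits.zfill(min_digits)


if __name__ == '__main__':
    # "%g" % 1e20 as produced by an MSVC-based runtime and by glibc.
    for raw in ('1e+020', '1e+20'):
        print(raw + ' -> ' + normalize_exponent(raw))
```

Applying one such fixup to every string NumPy produces would make the output independent of which C runtime or Python version is underneath.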

2 locale dependency:
--------------------

Another issue is that our own formatting is locale-dependent, whereas
Python's str is not:

import numpy as np
import locale
locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
a = 1.2

print "str(a)", str(a)
print "locale.str(a)", locale.str(a)
print "str(np.float32(a))", str(np.float32(a))
print "locale.str(np.float32(a))", locale.str(np.float32(a))

This prints:

str(a) 1.2
locale.str(a) 1,2
str(np.float32(a)) 1,2
locale.str(np.float32(a)) 1,20000004768
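The long last line is a separate effect: locale.str formats its argument with "%.12g" after converting it to a Python float (a C double), and the float32 nearest to 1.2 is 1.2000000476837158..., so twelve significant digits expose the single-precision rounding error. A small standard-library check; to_float32 is a hypothetical helper that uses struct to round to IEEE single precision as a stand-in for np.float32:

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


x32 = to_float32(1.2)
# Widened back to double, the value carries the float32 rounding error...
print('%.17g' % x32)  # 1.2000000476837158
# ...which the 12 significant digits used by locale.str expose,
print('%.12g' % x32)  # 1.20000004768
# while 8 digits, roughly float32's precision, hide it again.
print('%.8g' % x32)   # 1.2
```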

I thought about copying the way Python trunk does the formatting
(trunk is where the discrepancies between platforms have been fixed),
but this is not so easy: it uses a lot of code from different places,
and that code would need to be adapted to float and long double. The
other solution would be to do our own formatting, but that does not
sound easy either: formatting floating-point numbers correctly in C is
hard. I am not sure what we should do; does anyone have an idea?
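For the locale problem alone, a smaller step is the post-processing that Python 2.5's PyOS_ascii_formatd performs: format with the C library (which honours LC_NUMERIC), then replace the locale's decimal point by '.'. Because it only touches the resulting string, the same step could in principle be shared by float, double and long double output; it does not address the exponent differences. A Python sketch of the idea, where ascii_format is a hypothetical name and locale.format_string stands in for the C-level snprintf call:

```python
import locale


def ascii_format(fmt, value):
    """Format value with a printf-style fmt, always using '.' as decimal point.

    locale.format_string follows LC_NUMERIC, like C's snprintf; the
    locale's decimal separator is then swapped back to '.'.
    """
    text = locale.format_string(fmt, value)  # grouping is off by default
    point = locale.localeconv()['decimal_point']
    if point != '.':
        # With grouping off there are no thousands separators, so the
        # first occurrence of the separator is the decimal point.
        text = text.replace(point, '.', 1)
    return text


if __name__ == '__main__':
    try:
        locale.setlocale(locale.LC_NUMERIC, 'fr_FR')  # name varies by system
    except locale.Error:
        pass  # locale not installed; output is the same either way
    print(ascii_format('%.12g', 1.2))  # 1.2
```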

cheers,

David


