Be Honest about LC_NUMERIC (PEP proposal), was Re: [Python-Dev] LC_NUMERIC and C libraries

Christian Reis kiko@async.com.br
Mon, 21 Jul 2003 11:10:05 -0300


On Sun, Jul 20, 2003 at 02:29:26PM -0400, Barry Warsaw wrote:
>
> Can we perhaps have a PEP for the 2.4 timeframe?

Sure. Reviews would be really appreciated.

PEP: 	XXX
Title: 	Be Honest about LC_NUMERIC (to the C library)
Version: 	$Revision: 1.9 $
Last-Modified: 	$Date: 2002/08/26 16:29:31 $

Author: 	Christian R. Reis <kiko at async.com.br>
Status: 	Draft
Type:       Standards Track
Content-Type: 	text/plain
Created: 	19-July-2003
Post-History:

------------------------------------------------------------------------

Abstract
    Support in Python for the LC_NUMERIC locale category is currently
    implemented only in Python-space, which causes inconsistent behavior
    and thread-safety issues for applications that use extension modules
    and libraries implemented in C. This document proposes a plan for
    removing this inconsistency by providing and using substitute
    locale-agnostic functions as necessary.

Introduction

    Python currently provides generic localization services through the
    locale module, which among other things allows localizing the
    display and conversion process of numeric types. Locale categories,
    such as LC_TIME and LC_COLLATE, allow configuring precisely what
    aspects of the application are to be localized.

    The LC_NUMERIC category specifies formatting for non-monetary
    numeric information, such as the decimal separator in float and
    fixed-precision numbers.  Localization of the LC_NUMERIC category is
    currently implemented only in Python-space; the C libraries are
    unaware of the application's LC_NUMERIC setting. This is done to
    avoid changing the behavior of certain low-level functions that are
    used by the Python parser and related code [2].

    However, this presents a problem for extension modules that wrap C
    libraries: applications that use these extension modules will
    display and convert numeric values inconsistently, because Python
    code honors the user's LC_NUMERIC setting while the wrapped C
    library does not.

    James Henstridge, the author of PyGTK [3], has also pointed out that
    setlocale() presents thread-safety issues: because the locale is
    process-wide state, a thread may call the C library setlocale()
    outside of the GIL and cause Python to function incorrectly.

Rationale

    The inconsistency between Python and C library localization for
    LC_NUMERIC is a problem for any localized application using C
    extensions. The exact nature of the problem will vary depending on
    the application, but it will most likely occur when parsing or
    formatting a numeric value.

Example Problem
    The initial problem that motivated this PEP is related to the
    GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by PyGTK.
    The widget can be set to numeric mode, and when this occurs,
    characters typed into it are evaluated as a number.

    Because LC_NUMERIC is not set in libc, float values are displayed
    without the localized decimal separator, and it is impossible to
    enter values using that separator (for instance, `,' for the
    Brazilian locale pt_BR). This small example demonstrates reduced
    usability for localized applications that use this toolkit from
    Python.

Proposal

    On python-dev, Martin V. Löwis outlined the initial constraints for
    an acceptable solution to the problem:

        - LC_NUMERIC can be set at the C library level without breaking
          the parser.
        - float() and str() stay locale-unaware.

    The following seems to be the current practice:

        - the locale-aware locale.str() and locale.atof() functions
          stay in the locale module.

    An analysis of the Python source suggests that the following
    functions currently depend on LC_NUMERIC being set to the C locale:

        - Python/compile.c:parsenumber()
        - Python/marshal.c:r_object()
        - Objects/complexobject.c:complex_to_buf()
        - Objects/complexobject.c:complex_subtype_from_string()
        - Objects/floatobject.c:PyFloat_FromString()
        - Objects/floatobject.c:format_float()
        - Modules/stropmodule.c:strop_atof()
        - Modules/cPickle.c:load_float()

    [XXX: still need to check if any other occurrences exist]

    The proposed approach is to implement LC_NUMERIC-agnostic
    replacements for the C functions that convert strings to floats
    (strtod() and atof()) and floats to strings (snprintf()), and to use
    them wherever the result should not vary with the user-specified
    locale.
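
    To make the parsing half of this concrete, here is a minimal C
    sketch in the spirit of glib's g_ascii_strtod() [6]. The name
    ascii_strtod() is hypothetical, not an existing Python or glib API.
    It assumes the C library's strtod() honors LC_NUMERIC, handles only
    plain decimal input (no hexadecimal floats), and works by rewriting
    `.' into the locale's decimal point on a private copy of the numeric
    prefix before calling strtod(), so that a localized separator in the
    input is never accepted.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a double using '.' as the decimal separator,
 * regardless of the current LC_NUMERIC setting.  Sketch only: no hex
 * floats; inf/nan fall through to plain strtod(). */
double ascii_strtod(const char *s, char **endptr)
{
    const char *dp = localeconv()->decimal_point;
    size_t dp_len = strlen(dp);
    const char *p = s, *dot = NULL, *end;
    char *copy, *fail;
    size_t off;
    double val;

    /* The locale already uses ".", so strtod() is locale-neutral here. */
    if (dp_len == 1 && dp[0] == '.')
        return strtod(s, endptr);

    /* Find the numeric prefix: spaces, sign, digits, '.', digits, exponent. */
    while (isspace((unsigned char)*p)) p++;
    if (*p == '+' || *p == '-') p++;
    if (!isdigit((unsigned char)*p) && *p != '.') {
        if (strncmp(p, dp, dp_len) == 0) {  /* e.g. ",5": reject it */
            if (endptr) *endptr = (char *)s;
            return 0.0;
        }
        return strtod(s, endptr);           /* inf, nan, or no number */
    }
    while (isdigit((unsigned char)*p)) p++;
    if (*p == '.') dot = p++;
    while (isdigit((unsigned char)*p)) p++;
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-') p++;
        while (isdigit((unsigned char)*p)) p++;
    }
    end = p;

    /* Copy only that prefix, swapping '.' for the locale's separator, so a
     * localized separator in the input (e.g. "1,5") stops the parse. */
    copy = malloc((size_t)(end - s) + dp_len + 1);
    if (copy == NULL) {
        if (endptr) *endptr = (char *)s;
        return 0.0;
    }
    if (dot != NULL) {
        size_t head = (size_t)(dot - s), tail = (size_t)(end - dot - 1);
        memcpy(copy, s, head);
        memcpy(copy + head, dp, dp_len);
        memcpy(copy + head + dp_len, dot + 1, tail);
        copy[head + dp_len + tail] = '\0';
    } else {
        memcpy(copy, s, (size_t)(end - s));
        copy[end - s] = '\0';
    }

    val = strtod(copy, &fail);

    /* Map the stop position in the copy back onto the caller's string,
     * accounting for a separator longer than one byte. */
    off = (size_t)(fail - copy);
    if (dot != NULL && off > (size_t)(dot - s))
        off -= dp_len - 1;
    if (endptr) *endptr = (char *)s + off;
    free(copy);
    return val;
}
```

    Under a locale whose separator is `,', ascii_strtod("1.5", NULL)
    still yields 1.5, while "1,5" stops at the comma and yields 1. The
    actual conversion is still done by the C library's strtod(), at the
    cost of one allocation per call when the separator is not `.'.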
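    This change should also solve the aforementioned thread-safety
    problems, since the interpreter's own conversions would no longer
    depend on the process-wide LC_NUMERIC setting.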

Potential Code Contributions

    This issue was initially reported as a bug in the GTK+ libraries
    [5]; it has since been correctly diagnosed as an inconsistency in
    Python's implementation. In a fortunate coincidence, however, the
    glib library implements a number of LC_NUMERIC-agnostic functions
    (for example, see [6]) for reasons similar to those presented in
    this document. In the same GTK+ problem report, Havoc Pennington
    suggested that the glib authors would be willing to contribute this
    code to the PSF, which would simplify implementation of this PEP
    considerably.
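
    The formatting direction can be sketched the same way: format with
    snprintf() under whatever locale is active, then turn the locale's
    decimal point back into `.'. This loosely follows the approach of
    glib's g_ascii_formatd() [6], but the name ascii_formatd() and its
    signature here are hypothetical, and the sketch assumes `fmt' holds
    a single floating-point conversion such as "%.2f" or "%g" with no
    thousands-grouping flag.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format d with a printf-style conversion (e.g. "%.2f")
 * and always use '.' as the decimal separator.  Assumes size > 0 and that
 * fmt contains exactly one floating-point conversion. */
char *ascii_formatd(char *buf, size_t size, const char *fmt, double d)
{
    const char *dp = localeconv()->decimal_point;
    size_t dp_len = strlen(dp);
    char *p;

    if (snprintf(buf, size, fmt, d) < 0) {
        buf[0] = '\0';                  /* encoding error: return "" */
        return buf;
    }

    /* Nothing to undo if the locale already uses '.'. */
    if (dp_len == 1 && dp[0] == '.')
        return buf;

    /* Skip leading spaces, sign and integer digits; the separator, if
     * any, must come next. */
    p = buf;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '+' || *p == '-') p++;
    while (isdigit((unsigned char)*p)) p++;

    if (strncmp(p, dp, dp_len) == 0) {
        /* Replace the (possibly multi-byte) separator with a single '.'
         * and shift the rest of the string, including the NUL, left. */
        *p = '.';
        memmove(p + 1, p + dp_len, strlen(p + dp_len) + 1);
    }
    return buf;
}
```

    Only the separator that follows the integer digits is rewritten,
    rather than every occurrence in the buffer, so nothing else the
    format produced is touched.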

    [XXX: I believe the code is cross-platform, since glib was devised
    in part to be cross-platform. Needs checking.]

    [XXX: I will check if Alex Larsson is willing to sign the PSF
    contributor agreement [7] to make sure the code is safe to
    integrate.]

Risks

    There may be cross-platform issues with the provided locale-agnostic
    functions. This needs to be tested further.

    Martin has pointed out potential copyright problems with the
    contributed code. I believe we will have no problems in this area as
    members of the GTK+ and glib teams have said they are fine with
    relicensing the code.

Code

    An implementation is being developed by Gustavo Carneiro
    <gjc at inescporto.pt>. It is currently attached to SourceForge.net
    bug 774665 [8].

    [XXX: The SF.net tracker is horrible 8(]

References

    [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
        http://www.python.org/peps/pep-0001.html

    [2] Python locale documentation for embedding,
        http://www.python.org/doc/current/lib/embedding-locale.html

    [3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/

    [4] GtkSpinButton screenshot (demonstrating problem),
        http://www.async.com.br/~kiko/spin.png

    [5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132

    [6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
        renamed g_ascii_formatd) by Alex Larsson,
        http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html

    [7] PSF Contributor Agreement,
        http://www.python.org/psf/psf-contributor-agreement.html

    [8] Python bug report, http://www.python.org/sf/774665

Copyright

    This document has been placed in the public domain.


Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL