[Python-Dev] [Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

Raymond Hettinger python at rcn.com
Thu Mar 12 11:41:24 CET 2009


Here's an update incorporating all the comments received so far.

* Put into PEP format
* Fixed typos
* The suggestion for modifying the locale module was dropped.
* The "n" specifier in the locale module was referenced
* Fixed minimumwidth --> width
* PERIOD --> DOT
* Added suggestions by Lie Ryan and Eric Smith

-----------------------------------------------------------


PEP: XXX
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python at rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Post-History: 12-Mar-2009


Motivation
==========

Provide a simple, non-locale-aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm.  Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.


Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

  .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting


Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, or UNDERSCORE.
When a COMMA is the decimal separator, the thousands separator
is typically a DOT or SPACE (see examples from Denis Spir).

James Knight observed that Indian/Pakistani numbering systems
group by hundreds.  Ben Finney noted that Chinese numbering groups
by ten-thousands.  Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).
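
These conventions differ mainly in where the separators fall.  As an illustration only, the hypothetical helper ``group_digits`` below (not part of either proposal) groups a plain string of digits from the right using a list of group sizes, the last of which repeats.  Signs and fractional parts are deliberately ignored to keep the sketch short.

```python
from typing import Sequence


def group_digits(digits: str, sizes: Sequence[int], sep: str = ",") -> str:
    """Insert sep into a string of digits, grouping from the right.

    sizes gives the group widths starting with the rightmost group;
    the final entry is reused for all remaining digits.
    """
    groups = []
    end = len(digits)
    k = 0
    while end > 0:
        size = sizes[min(k, len(sizes) - 1)]
        groups.append(digits[max(0, end - size):end])
        end -= size
        k += 1
    return sep.join(reversed(groups))


print(group_digits("1234567", [3]))        # groups of three: 1,234,567
print(group_digits("123456789", [3, 2]))   # Indian style: 12,34,56,789
print(group_digits("123456789", [4]))      # ten-thousands: 1,2345,6789
```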

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".


Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language::

  [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
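
A short sketch of the intended behavior, assuming an interpreter whose format() accepts the ',' option (Python 2.7 and 3.1 onward do).  The same option applies to int, float and decimal.Decimal, and it combines with width and precision as usual.

```python
from decimal import Decimal

# ',' requests comma grouping; width and precision behave as before.
print(format(1234567, ",d"))                   # 1,234,567
print(format(-1234567, ","))                   # -1,234,567
print(format(1234567.891, ",.2f"))             # 1,234,567.89
print(format(Decimal("1234567.891"), ",.2f"))  # 1,234,567.89
print(repr(format(1234, "8,d")))               # '   1,234'   (width 8)
print(repr(format(1234.5, "10,.1f")))          # '   1,234.5' (width 10)
```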

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

  format(n, "6,f").replace(",", "_")

This technique is completely general, but it is awkward in the
one case where the commas and periods need to be swapped::

  format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
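
In Python 3 the swap can also be done in a single pass with str.translate, which avoids the temporary placeholder character.  A minimal sketch; ``swap_separators`` is a hypothetical helper name and its default spec is only an example.

```python
# Translation table that exchanges ',' and '.' simultaneously.
_SWAP_COMMA_DOT = str.maketrans({",": ".", ".": ","})


def swap_separators(n, spec: str = ",.2f") -> str:
    """Format n with comma grouping, then exchange ',' and '.'."""
    return format(n, spec).translate(_SWAP_COMMA_DOT)


print(format(1234567, ",d").replace(",", "_"))  # 1_234_567
print(swap_separators(1234567.891))              # 1.234.567,89
```

Because translate maps each character independently, no intermediate value can be clobbered, which is what makes the three-step replace chain necessary.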


Proposal II (to meet Antoine Pitrou's request)
==============================================

Make both the thousands separator and decimal separator user
specifiable but not locale aware.  For simplicity, limit the
choices to a comma, period, space, or underscore.

::

  [[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]

Examples::

  format(1234, "8.1f")    -->     '  1234.0'
  format(1234, "8,1f")    -->     '  1234,0'
  format(1234, "8T.,1f")  -->     ' 1.234,0'
  format(1234, "8T ,1f")  -->     ' 1 234,0'
  format(1234, "8d")      -->     '    1234'
  format(1234, "8T,d")    -->     '   1,234'
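
To make the grammar concrete, here is a sketch that emulates one possible reading of a subset of it on top of the standard ',' grouping.  Everything here is an assumption for illustration: ``format_sep`` is a hypothetical name, fill/align/sign/'#'/'0' are omitted, 'T' must be followed by an explicit separator, only the 'd' and 'f' types are handled, and the precision defaults to 6 as with format().  Under these assumptions it yields the outputs listed above for "8.1f", "8,1f", "8T.,1f", "8d" and "8T,d".

```python
import re

# Hypothetical parser for a subset of the Proposal II grammar:
#     [width][T tsep][dsep precision][type]
# Separators are limited to ',', '.', ' ' and '_' (whitespace inside a
# character class is preserved even in VERBOSE mode).
_SPEC = re.compile(r"""
    (?P<width>\d*)
    (?:T(?P<tsep>[,. _]))?
    (?:(?P<dsep>[,. _])(?P<precision>\d*))?
    (?P<type>[df])
""", re.VERBOSE)


def format_sep(value, spec: str) -> str:
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    tsep = m["tsep"]            # None means no grouping
    dsep = m["dsep"] or "."     # default decimal separator
    if m["type"] == "d":
        if m["dsep"]:
            raise ValueError("'d' does not accept a decimal separator")
        body = format(value, ",d" if tsep else "d")
    else:
        precision = m["precision"] or "6"   # same default as format()
        body = format(value, ("," if tsep else "") + "." + precision + "f")
    # Let format() emit standard ',' and '.', then map both in one pass.
    body = body.translate(str.maketrans({",": tsep or ",", ".": dsep}))
    return body.rjust(int(m["width"] or 0))


print(repr(format_sep(1234, "8T ,1f")))    # ' 1 234,0'
print(format_sep(1234567.5, "T.,2f"))      # 1.234.567,50
```

Even this reduced parser shows the cost noted below: the grammar has several optional single-character fields that must be disambiguated by position.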

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

No change is proposed for the locale module.


Other Ideas
===========

* Lie Ryan suggested a convenience function of the form::

    create_format(self, type='i', base=16, seppos=4, sep=':',
                  charset='0123456789abcdef', maxwidth=32,
                  minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
  parser to be exposed.  That would make it easier to write
  custom __format__ methods.
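
Until such a parser is exposed, a custom __format__ that wants to honor the mini-language has to parse the spec itself.  A sketch of that workaround, assuming Python 3: the regular expression is our own approximation of the Proposal I grammar (not a CPython API), and ``Money`` is a hypothetical class that fills in a two-decimal default before delegating to Decimal.__format__.  It does not validate combinations such as ',' with 's'; those are left for Decimal to reject.

```python
import re
from decimal import Decimal

# Our own regex approximation of the Proposal I grammar:
#   [[fill]align][sign][#][0][width][,][.precision][type]
_FORMAT_SPEC = re.compile(r"""
    (?:(?P<fill>.)?(?P<align>[<>=^]))?
    (?P<sign>[-+ ])?
    (?P<alt>\#)?
    (?P<zero>0)?
    (?P<width>\d+)?
    (?P<comma>,)?
    (?:\.(?P<precision>\d+))?
    (?P<type>[bcdeEfFgGnosxX%])?
""", re.VERBOSE | re.DOTALL)


class Money:
    """Hypothetical example: an amount that defaults to two decimals."""

    def __init__(self, amount):
        self.amount = Decimal(amount)

    def __format__(self, spec: str) -> str:
        m = _FORMAT_SPEC.fullmatch(spec)
        if m is None:
            raise ValueError(f"invalid format spec: {spec!r}")
        # Replace missing fields (None) with empty strings.
        p = {k: v or "" for k, v in m.groupdict().items()}
        kind, precision = p["type"], p["precision"]
        if not kind:                  # no type given: fixed point, 2 places
            kind = "f"
            precision = precision or "2"
        rebuilt = (p["fill"] + p["align"] + p["sign"] + p["alt"] + p["zero"]
                   + p["width"] + p["comma"]
                   + ("." + precision if precision else "") + kind)
        return format(self.amount, rebuilt)


print(format(Money("1234567.5"), ","))      # 1,234,567.50
print(repr(format(Money("1234.5"), ">12,")))  # '    1,234.50'
```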


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: 


