[Python-Dev] PEP 515: Underscores in Numeric Literals

Thu Feb 11 13:50:09 EST 2016

On 11.02.16 10:22, Georg Brandl wrote:
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax so that underscores can be used in
> integral, floating-point and complex number literals.
>
> This is a common feature of other modern languages, and can aid readability of
> long literals, or literals whose value should clearly separate into parts, such
> as bytes or words in hexadecimal notation.

I have a strong preference for a stricter and simpler rule, used by most
other languages: "only between two digits". Main arguments:

1. A simple rule is easier to understand, remember and recognize. My
concern is not the complexity of the implementation (there is little
difference there) but cognitive complexity.

2. Most languages use this rule. It is better to follow an informal
standard than to invent a rule that differs from the rules in every other
language. This will help programmers who use multiple languages.

I have provided an alternative patch and can also write an alternative
PEP if needed.
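
The rule is simple enough to state as a direct check: every underscore
must have a digit immediately on each side. Below is a minimal Python
sketch of that check. The function name `underscores_between_digits` is
hypothetical, and it tests only the underscore rule, not whether the rest
of the string is a valid literal.

```python
# Hypothetical helper: checks only the "only between two digits" rule for
# underscores; it does not validate the rest of the literal.
DECIMAL_DIGITS = frozenset("0123456789")


def underscores_between_digits(text, digits=DECIMAL_DIGITS):
    """Return True if every "_" in `text` has a digit on both sides."""
    for i, ch in enumerate(text):
        if ch != "_":
            continue
        # Characters adjacent to the underscore ("" at either end of text).
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before not in digits or after not in digits:
            return False
    return True


if __name__ == "__main__":
    for sample in ["1_000_000", "1__000", "1000_", "3.141_592", "1_.5"]:
        print(sample, underscores_between_digits(sample))
```

For hexadecimal literals the same check works with
`digits=frozenset("0123456789abcdefABCDEF")`; since the `x` of the `0x`
prefix is not a digit, `0x_ff` is rejected.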

> The production list for integer literals would therefore look like this::
>
>     integer: decimalinteger | octinteger | hexinteger | bininteger
>     decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
>     nonzerodigit: "1"..."9"
>     digit: "0"..."9"
>     octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*

     octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*

>     hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*

     hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*

>     bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*

     bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*

>     octdigit: "0"..."7"
>     hexdigit: digit | "a"..."f" | "A"..."F"
>     bindigit: "0" | "1"
>
> For floating-point and complex literals::
>
>     floatnumber: pointfloat | exponentfloat
>     pointfloat: [intpart] fraction | intpart "."
>     exponentfloat: (intpart | pointfloat) exponent
>     intpart: digit (digit | "_")*

     intpart: digit (["_"] digit)*

>     fraction: "." intpart
>     exponent: ("e" | "E") ["+" | "-"] intpart
>     imagnumber: (floatnumber | intpart) ("j" | "J")
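
To compare the two variants concretely, here is a minimal sketch that
hand-translates the draft productions and the proposed alternatives above
into Python regular expressions. The names `LIBERAL`, `STRICT` and
`accepts` are hypothetical, the regexes are my own translation (not taken
from either patch), and only the productions quoted above are covered.

```python
import re

# Draft productions (liberal): underscores may repeat, may trail, and may
# directly follow the base prefix.
LIBERAL = {
    "octinteger": r"0[oO]_*[0-7][0-7_]*",
    "hexinteger": r"0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*",
    "bininteger": r"0[bB]_*[01][01_]*",
    "intpart": r"[0-9][0-9_]*",
}

# Proposed alternative (strict): digit (["_"] digit)*, i.e. at most one
# underscore at a time, and only between two digits.
STRICT = {
    "octinteger": r"0[oO][0-7](?:_?[0-7])*",
    "hexinteger": r"0[xX][0-9a-fA-F](?:_?[0-9a-fA-F])*",
    "bininteger": r"0[bB][01](?:_?[01])*",
    "intpart": r"[0-9](?:_?[0-9])*",
}


def accepts(grammar, production, text):
    """Return True if the whole of `text` matches the given production."""
    return re.fullmatch(grammar[production], text) is not None


if __name__ == "__main__":
    samples = [
        ("hexinteger", "0xff_ff"),
        ("hexinteger", "0x_ff"),
        ("hexinteger", "0xff__ff"),
        ("bininteger", "0b1010_"),
        ("intpart", "1_000"),
        ("intpart", "1__000"),
    ]
    for production, text in samples:
        print(text, "liberal:", accepts(LIBERAL, production, text),
              "strict:", accepts(STRICT, production, text))
```

Every string the strict productions accept is also accepted by the draft
productions; the variants differ only in which misplaced underscores they
reject.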

> **Group 1: liberal**
>
> This group is the least homogeneous: the rules vary slightly between languages.
> All of them allow trailing underscores.  Some allow underscores after non-digits
> like the ``e`` or the sign in exponents.
>
> * D [2]_
> * Perl 5 (underscores basically allowed anywhere, although docs say it's more
>    restricted) [3]_
> * Rust (allows between exponent sign and digits) [4]_
> * Swift (although textual description says "between digits") [5]_
>
> **Group 2: only between digits, multiple consecutive underscores**
>
> * C# (open proposal for 7.0) [6]_
> * Java [7]_
>
> **Group 3: only between digits, only one underscore**
>
> * Ada [8]_
> * Julia (but not in the exponent part of floats) [9]_
> * Ruby (docs say "anywhere", in reality only between digits) [10]_

This classification is misleading. The difference between groups 2 and 3
is smaller than the differences between languages within group 1. To be
fair, groups 2 and 3 should be merged into one group, and C++ should be
included in it. Perl 5 and Swift should either be listed in both groups
or left out entirely, because they have inconsistencies between the
documentation and the implementation, or between different parts of the
documentation.

With a correct classification, it is obvious which variant is the most popular.