[Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)

Georg Brandl g.brandl at gmx.net
Sat Feb 13 03:48:49 EST 2016


Hi all,

after talking to Guido and Serhiy we present the next revision of this PEP. It is a compromise that we are all happy with, and a relatively restricted rule that makes additions to PEP 8 basically unnecessary.

I think the discussion has shown that supporting underscores in the from-string constructors is valuable, therefore this is now added to the specification section.

The remaining open question is about the reverse direction: do we want a string formatting modifier that adds underscores as thousands separators?

cheers, Georg


PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl, Serhiy Storchaka
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016

Abstract and Rationale

This PEP proposes to extend Python's syntax and number-from-string constructors so that underscores can be used as visual separators for digit grouping purposes in integral, floating-point and complex number literals.

This is a common feature of other modern languages, and can aid readability of long literals, or literals whose value should clearly separate into parts, such as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # same, for string conversions
    flags = int('0b_1111_0000', 2)

Specification

The current proposal is to allow one underscore between digits, and after base specifiers in numeric literals. The underscores have no semantic meaning, and literals are parsed as if the underscores were absent.
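
A minimal sketch of the placement rule, assuming Python 3.6 or later (where this PEP was eventually implemented). It uses the interpreter's own parser through ``ast.literal_eval``, so it reflects the shipped behaviour rather than an independent implementation. The helper name ``parse_numeric_literal`` is hypothetical.

```python
import ast

def parse_numeric_literal(text):
    """Hypothetical helper: return the value of `text` if it is a valid
    numeric literal, else None.  Relies on the running interpreter's parser."""
    try:
        value = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # SyntaxError: misplaced underscores; ValueError: not a literal at all.
        return None
    # Keep only numbers (exclude strings, containers and booleans).
    if isinstance(value, (int, float, complex)) and not isinstance(value, bool):
        return value
    return None

# Underscores between digits, or right after a base specifier, are ignored.
print(parse_numeric_literal('10_000_000.0'))   # 10000000.0
print(parse_numeric_literal('0x_DEAD_BEEF'))   # 3735928559
# Doubled, trailing and leading underscores are rejected.
print(parse_numeric_literal('1__000'))         # None
print(parse_numeric_literal('1000_'))          # None
print(parse_numeric_literal('_1000'))          # None (parses as a name)
```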

Literal Grammar

The production list for integer literals would therefore look like this::

    integer:      decinteger | bininteger | octinteger | hexinteger
    decinteger:   nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
    bininteger:   "0" ("b" | "B") (["_"] bindigit)+
    octinteger:   "0" ("o" | "O") (["_"] octdigit)+
    hexinteger:   "0" ("x" | "X") (["_"] hexdigit)+
    nonzerodigit: "1"..."9"
    digit:        "0"..."9"
    bindigit:     "0" | "1"
    octdigit:     "0"..."7"
    hexdigit:     digit | "a"..."f" | "A"..."F"
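
As a sketch, the integer productions above can be transcribed into a regular expression. In this transcription, ``_?`` stands for each optional ``["_"]``. The names ``INTEGER_RE`` and ``is_integer_literal`` are hypothetical, and this is not how the tokenizer is actually implemented.

```python
import re

# One regex per production; `_?` encodes the optional ["_"] before each digit.
DECINTEGER = r'[1-9](?:_?[0-9])*|0(?:_?0)*'
BININTEGER = r'0[bB](?:_?[01])+'
OCTINTEGER = r'0[oO](?:_?[0-7])+'
HEXINTEGER = r'0[xX](?:_?[0-9a-fA-F])+'
INTEGER_RE = re.compile('|'.join([DECINTEGER, BININTEGER, OCTINTEGER, HEXINTEGER]))

def is_integer_literal(text):
    """True if the whole of `text` matches the `integer` production."""
    return INTEGER_RE.fullmatch(text) is not None

for text in ['1_000_000', '0_0', '0b_0011_1111', '0xDEAD_BEEF',
             '1__000', '0_7', '0x_', '1_']:
    print(text, is_integer_literal(text))
```

Because ``decinteger`` only lets a leading ``0`` be followed by more zeros, ``0_7`` is rejected, as is a base specifier with no digits after it (``0x_``).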

For floating-point and complex literals::

    floatnumber:   pointfloat | exponentfloat
    pointfloat:    [digitpart] fraction | digitpart "."
    exponentfloat: (digitpart | pointfloat) exponent
    digitpart:     digit (["_"] digit)*
    fraction:      "." digitpart
    exponent:      ("e" | "E") ["+" | "-"] digitpart
    imagnumber:    (floatnumber | digitpart) ("j" | "J")
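
The same kind of hypothetical regex transcription works for the floating-point and imaginary productions. It shows that an underscore may never touch the ``.``, the ``e`` or the ``j``, because ``digitpart`` always starts and ends with a digit. ``FLOAT_RE`` and ``IMAG_RE`` are illustrative names only.

```python
import re

# Each pattern mirrors one production from the grammar above.
DIGITPART = r'[0-9](?:_?[0-9])*'
FRACTION = r'\.' + DIGITPART
EXPONENT = r'[eE][+-]?' + DIGITPART
POINTFLOAT = rf'(?:(?:{DIGITPART})?{FRACTION}|{DIGITPART}\.)'
EXPONENTFLOAT = rf'(?:(?:{DIGITPART}|{POINTFLOAT}){EXPONENT})'
FLOATNUMBER = rf'(?:{POINTFLOAT}|{EXPONENTFLOAT})'
IMAGNUMBER = rf'(?:(?:{FLOATNUMBER}|{DIGITPART})[jJ])'

FLOAT_RE = re.compile(FLOATNUMBER)
IMAG_RE = re.compile(IMAGNUMBER)

for text in ['10_000_000.0', '1_000.000_1e1_0', '1_.0', '1._0', '1e_1']:
    print(text, FLOAT_RE.fullmatch(text) is not None)
print('1_0j', IMAG_RE.fullmatch('1_0j') is not None)
```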

Constructors

Following the same rules for placement, underscores will be allowed in the following constructors:
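
A minimal sketch of the from-string behaviour, assuming Python 3.6 or later (the release in which this PEP was eventually implemented). It covers only the built-in ``int()``, ``float()`` and ``complex()`` constructors. The helper ``rejects`` is hypothetical.

```python
def rejects(constructor, text):
    """Hypothetical helper: True if `constructor(text)` raises ValueError."""
    try:
        constructor(text)
    except ValueError:
        return True
    return False

print(int('1_000_000'))            # 1000000
print(int('0b_1111_0000', 2))      # 240 (underscore allowed after the base prefix)
print(int('dead_beef', 16))        # 3735928559
print(float('1_000.000_1'))        # 1000.0001
print(complex('1_000+2_000j'))     # (1000+2000j)

# Misplaced underscores raise ValueError, mirroring SyntaxError for literals.
print(rejects(int, '1__000'), rejects(int, '_1000'), rejects(float, '1_.5'))
```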

Prior Art

Those languages that do allow underscore grouping implement a large variety of rules for allowed placement of underscores. In cases where the language spec contradicts the actual behavior, the actual behavior is listed. ("single" or "multiple" refer to allowing runs of consecutive underscores.)

Alternative Syntax

Underscore Placement Rules

Instead of the relatively strict rule specified above, the use of underscores could be less limited. As seen in other languages, common rules include:

The syntax in this PEP has ultimately been selected because it covers the common use cases, and does not allow for syntax that would have to be discouraged in style guides anyway.

A less common rule would be to allow underscores only every N digits (where N could be 3 for decimal literals, or 4 for hexadecimal ones). This is unnecessarily restrictive, especially considering that separator placement differs across cultures.
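
To illustrate the cultural point, here is a sketch contrasting this PEP's ``decinteger`` rule with a hypothetical "every N digits" rule. Indian grouping writes ten million as 1,00,00,000. The PEP rule accepts that grouping, but a strict groups-of-three rule does not. ``PEP_DECIMAL`` and ``every_n_digits`` are illustrative names, and the helper only handles plain decimal digit strings.

```python
import re

# decinteger from the grammar above: one optional underscore between digits.
PEP_DECIMAL = re.compile(r'[1-9](?:_?[0-9])*|0(?:_?0)*')

def every_n_digits(text, n=3):
    """Hypothetical stricter rule: underscores may only split a decimal
    string into groups of exactly `n` digits, counted from the right."""
    if '_' not in text:
        return re.fullmatch(r'[0-9]+', text) is not None
    return re.fullmatch(rf'[0-9]{{1,{n}}}(?:_[0-9]{{{n}}})+', text) is not None

# Western thousands grouping vs. Indian lakh/crore grouping of ten million.
for text in ('10_000_000', '1_00_00_000'):
    print(text, PEP_DECIMAL.fullmatch(text) is not None, every_n_digits(text))
```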

Different Separators

A proposed alternate syntax was to use whitespace for grouping. Although strings are a precedent for combining adjoining literals, the behavior can lead to unexpected effects which are not possible with underscores. Also, no other language is known to use this rule, except for languages that generally disregard any whitespace.
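
One plausible illustration of such an unexpected effect (an interpretation, not taken from the PEP) is a forgotten comma. With strings, a forgotten comma already merges two items silently. With numbers, it is currently a syntax error, and a whitespace-grouping rule would turn it into a silent merge as well. An underscore, by contrast, has to be typed deliberately. The helper ``compiles`` is hypothetical. The ``2_3`` case assumes Python 3.6 or later.

```python
def compiles(source):
    """Hypothetical helper: True if `source` compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

# Adjacent string literals are joined at compile time, so a missing comma
# merges two list items instead of raising an error.
names = ['alpha' 'beta', 'gamma']
print(names)                  # ['alphabeta', 'gamma']

# Today, whitespace between number tokens is rejected, so the same slip is
# caught; under a whitespace-grouping rule '[1, 2 3]' would mean [1, 23].
print(compiles('[1, 2 3]'))   # False
print(compiles('[1, 2_3]'))   # True
```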

C++14 introduces apostrophes for grouping (because underscores introduce ambiguity with user-defined literals). This is not considered for Python because the apostrophe already delimits string literals. [1]_

Open Proposals

It has been proposed [11]_ to extend the number-to-string formatting language to allow ``_`` as a thousands separator, where currently only ``,`` is supported. This could be used to easily generate code with more readable literals.
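
Until such a modifier exists, a minimal sketch of generating underscore-grouped literals might reuse the existing ``,`` option for decimals and group hexadecimal digits by hand in fours. Both ``group_decimal`` and ``group_hex`` are hypothetical helpers, not proposed APIs.

```python
def group_decimal(n):
    """Hypothetical helper: `_` every three digits, reusing the `,` option."""
    return format(n, ',').replace(',', '_')

def group_hex(n):
    """Hypothetical helper: hex literal with `_` between 4-digit groups."""
    sign = '-' if n < 0 else ''
    digits = format(abs(n), 'x')
    groups = []
    while digits:
        # Peel off groups of four hex digits from the right.
        groups.append(digits[-4:])
        digits = digits[:-4]
    return sign + '0x' + '_'.join(reversed(groups))

print(group_decimal(10000000))   # 10_000_000
print(group_hex(0xDEADBEEF))     # 0xdead_beef
```

The generated text parses back to the original value only on an interpreter that accepts the literal syntax proposed here. (Python 3.6 later added a ``_`` option to the format mini-language directly, e.g. ``format(10000000, '_')``.)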

Implementation

A preliminary patch that implements the specification given above has been posted to the issue tracker. [12]_

References

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html

.. [12] http://bugs.python.org/issue26331

Copyright

This document has been placed in the public domain.
