


On Sat, Feb 13, 2016, 00:49 Georg Brandl <g.brandl@gmx.net> wrote:
Hi all,



after talking to Guido and Serhiy we present the next revision

of this PEP. It is a compromise that we are all happy with,

and a relatively restricted rule that makes additions to PEP 8

basically unnecessary.

+1 from me.




I think the discussion has shown that supporting underscores in

the from-string constructors is valuable, therefore this is now

added to the specification section.



The remaining open question is about the reverse direction: do

we want a string formatting modifier that adds underscores as

thousands separators?

+0

Brett




cheers,

Georg



-----------------------------------------------------------------



PEP: 515

Title: Underscores in Numeric Literals

Version: $Revision$

Last-Modified: $Date$

Author: Georg Brandl, Serhiy Storchaka

Status: Draft

Type: Standards Track

Content-Type: text/x-rst

Created: 10-Feb-2016

Python-Version: 3.6

Post-History: 10-Feb-2016, 11-Feb-2016



Abstract and Rationale

======================



This PEP proposes to extend Python's syntax and number-from-string

constructors so that underscores can be used as visual separators for

digit grouping purposes in integral, floating-point and complex number

literals.



This is a common feature of other modern languages, and can aid

readability of long literals, or literals whose value should clearly

separate into parts, such as bytes or words in hexadecimal notation.



Examples::



    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # same, for string conversions
    flags = int('0b_1111_0000', 2)





Specification

=============



The current proposal is to allow one underscore between digits, and

after base specifiers in numeric literals. The underscores have no

semantic meaning, and literals are parsed as if the underscores were

absent.
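As a small illustration of this rule (a sketch only, assuming an interpreter that implements the proposal, i.e. Python 3.6 or later), the comparisons below hold because the underscores do not change the value. The helper ``is_valid_literal`` is a hypothetical name introduced just for this example.

```python
# Assumes an interpreter implementing this PEP (Python 3.6+).
# Underscores are purely visual: each pair denotes the same value.
assert 10_000_000 == 10000000
assert 0xDEAD_BEEF == 0xDEADBEEF
assert 0b_0011_1111_0100_1110 == 0b0011111101001110
assert 1_000.000_1 == 1000.0001
assert 1_0j == 10j


def is_valid_literal(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as an expression.

    Note that a leading underscore (e.g. "_1") makes the text an
    identifier rather than a number, so it is not tested here.
    """
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# One underscore between digits, or right after a base specifier, is fine.
assert is_valid_literal("1_000")
assert is_valid_literal("0x_FF")

# Doubled, trailing, or dangling underscores are rejected.
assert not is_valid_literal("1__000")
assert not is_valid_literal("1_000_")
assert not is_valid_literal("0x_")
```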



Literal Grammar

---------------



The production list for integer literals would therefore look like

this::



    integer: decinteger | bininteger | octinteger | hexinteger
    decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
    bininteger: "0" ("b" | "B") (["_"] bindigit)+
    octinteger: "0" ("o" | "O") (["_"] octdigit)+
    hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
    bindigit: "0" | "1"
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
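To make the integer productions easier to experiment with, here is one possible transcription into a regular expression. This is only a sketch: the names ``INTEGER`` and ``matches_integer`` are made up for this example, and the code is not how the tokenizer is actually implemented.

```python
import re

# Hypothetical regex transcription of the integer productions above.
# Each alternative mirrors one production; "_?" is the optional single
# underscore that may precede a digit.
INTEGER = re.compile(
    r"""
      [1-9](?:_?[0-9])*          # decinteger, first form
    | 0(?:_?0)*                  # decinteger, zero-only form
    | 0[bB](?:_?[01])+           # bininteger
    | 0[oO](?:_?[0-7])+          # octinteger
    | 0[xX](?:_?[0-9a-fA-F])+    # hexinteger
    """,
    re.VERBOSE,
)


def matches_integer(text: str) -> bool:
    """Return True if the whole of `text` matches the integer grammar."""
    return INTEGER.fullmatch(text) is not None


# Accepted: one underscore between digits or after the base specifier.
assert matches_integer("10_000_000")
assert matches_integer("0xDEAD_BEEF")
assert matches_integer("0b_0011_1111_0100_1110")

# Rejected: doubled, leading, or trailing underscores.
assert not matches_integer("1__000")
assert not matches_integer("_1000")
assert not matches_integer("1000_")
```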



For floating-point and complex literals::



    floatnumber: pointfloat | exponentfloat
    pointfloat: [digitpart] fraction | digitpart "."
    exponentfloat: (digitpart | pointfloat) exponent
    digitpart: digit (["_"] digit)*
    fraction: "." digitpart
    exponent: ("e" | "E") ["+" | "-"] digitpart
    imagnumber: (floatnumber | digitpart) ("j" | "J")
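The floating-point and imaginary productions can be transcribed the same way. Again this is a sketch under assumptions: the pattern names and the two helper functions are hypothetical, chosen only to mirror the production names above.

```python
import re

# Hypothetical regex fragments, one per production above.
DIGITPART = r"[0-9](?:_?[0-9])*"
FRACTION = rf"\.{DIGITPART}"
POINTFLOAT = rf"(?:(?:{DIGITPART})?{FRACTION}|{DIGITPART}\.)"
EXPONENT = rf"[eE][+-]?{DIGITPART}"
EXPONENTFLOAT = rf"(?:{DIGITPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = rf"(?:{POINTFLOAT}|{EXPONENTFLOAT})"
IMAGNUMBER = rf"(?:{FLOATNUMBER}|{DIGITPART})[jJ]"


def matches_float(text: str) -> bool:
    """Return True if the whole of `text` matches `floatnumber`."""
    return re.fullmatch(FLOATNUMBER, text) is not None


def matches_imag(text: str) -> bool:
    """Return True if the whole of `text` matches `imagnumber`."""
    return re.fullmatch(IMAGNUMBER, text) is not None


# Underscores may appear inside any run of digits...
assert matches_float("10_000_000.0")
assert matches_float("1_000.000_1e-1_0")
assert matches_imag("1_0j")

# ...but not next to the decimal point or the exponent marker.
assert not matches_float("1_.5")
assert not matches_float("1._5")
assert not matches_float("1e_5")
```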



Constructors

------------



Following the same rules for placement, underscores will be allowed in

the following constructors:



- ``int()`` (with any base)

- ``float()``

- ``complex()``

- ``Decimal()``
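A short sketch of what this would look like in practice, assuming an interpreter that implements the proposal (Python 3.6 or later). The helper ``parses`` is a hypothetical name used only for this illustration.

```python
from decimal import Decimal

# Assumes an interpreter implementing this PEP (Python 3.6+).
assert int("1_000_000") == 1000000
assert int("DEAD_BEEF", 16) == 0xDEADBEEF
assert int("0b_1111_0000", 2) == 0b11110000
assert float("10_000.5") == 10000.5
assert complex("1_0+2_0j") == complex(10, 20)
assert Decimal("1_000.25") == Decimal("1000.25")


def parses(constructor, text: str) -> bool:
    """Hypothetical helper: True if constructor(text) does not raise ValueError."""
    try:
        constructor(text)
    except ValueError:
        return False
    return True


# The placement rules of the literal grammar apply to int() and float() too.
assert not parses(int, "1__000")
assert not parses(int, "_1000")
assert not parses(int, "1000_")
assert not parses(float, "1_.5")
```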





Prior Art

=========



Those languages that do allow underscore grouping implement a large

variety of rules for allowed placement of underscores. In cases where

the language spec contradicts the actual behavior, the actual behavior

is listed. ("single" or "multiple" refer to allowing runs of

consecutive underscores.)



* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++14: single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts)
  [9]_
* Perl 5: multiple, basically anywhere, although docs say it's
  restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere")
  [10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits
  [4]_
* Swift: multiple, between digits and trailing (although textual
  description says only "between digits") [5]_





Alternative Syntax

==================



Underscore Placement Rules

--------------------------



Instead of the relatively strict rule specified above, the use of

underscores could be less limited. As we have seen from other

languages, common rules include:



* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
  for the start of the literal, or special positions like after a
  decimal point.



The syntax in this PEP has ultimately been selected because it covers

the common use cases, and does not allow for syntax that would have to

be discouraged in style guides anyway.



A less common rule would be to allow underscores only every N digits

(where N could be 3 for decimal literals, or 4 for hexadecimal ones).

This is unnecessarily restrictive, especially considering the

separator placement is different in different cultures.
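As an illustration of why a fixed group size would be limiting (the Indian lakh/crore grouping is chosen here only as one example of a different convention), both spellings below satisfy the rule proposed in this PEP and denote the same value. Assumes Python 3.6 or later.

```python
# Grouping by thousands and lakh/crore-style grouping both follow the
# "one underscore between digits" rule and denote the same number.
ten_million_thousands = 10_000_000
ten_million_crore = 1_00_00_000
assert ten_million_thousands == ten_million_crore == 10**7

# Hexadecimal literals may be grouped by bytes or by 16-bit words.
assert 0xDE_AD_BE_EF == 0xDEAD_BEEF
```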



Different Separators

--------------------



A proposed alternate syntax was to use whitespace for grouping.

Although strings are a precedent for combining adjoining literals, the

behavior can lead to unexpected effects which are not possible with

underscores. Also, no other language is known to use this rule,

except for languages that generally disregard any whitespace.
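The string precedent mentioned above shows the kind of surprise meant here. In this sketch a forgotten comma silently merges two list items; whitespace-separated numbers would presumably invite the same mistake, whereas a stray underscore cannot join two separate literals.

```python
# Adjacent string literals are concatenated, so a missing comma merges
# two items instead of raising an error.
names = ["alpha", "beta" "gamma"]  # comma missing after "beta"
assert names == ["alpha", "betagamma"]
assert len(names) == 2
```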



C++14 introduces apostrophes for grouping (because underscores

introduce ambiguity with user-defined literals), which is not

considered because of the use in Python's string literals. [1]_





Open Proposals

==============



It has been proposed [11]_ to extend the number-to-string formatting

language to allow ``_`` as a thousands separator, where currently only

``,`` is supported. This could be used to easily generate code with

more readable literals.
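Until such a modifier exists, the effect can be approximated with the ``,`` separator that is already supported. The function ``group_thousands`` below is a hypothetical helper written for this example, not part of the proposal; the round trip through ``int()`` assumes the constructor changes specified above.

```python
def group_thousands(value: int) -> str:
    """Hypothetical helper: format `value` with "_" as thousands separator."""
    # The "," modifier is supported today; swap its output for underscores.
    return format(value, ",d").replace(",", "_")


assert group_thousands(10000000) == "10_000_000"
assert group_thousands(-1234) == "-1_234"
assert group_thousands(999) == "999"

# The generated text reads back as the same number (Python 3.6+).
assert int(group_thousands(1234567)) == 1234567
```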





Implementation

==============



A preliminary patch that implements the specification given above has

been posted to the issue tracker. [12]_





References

==========



.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html



.. [2] http://dlang.org/spec/lex.html#integerliteral



.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors



.. [4] http://doc.rust-lang.org/reference.html#number-literals



.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html



.. [6] https://github.com/dotnet/roslyn/issues/216



.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html



.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4



.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/



.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers



.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html



.. [12] http://bugs.python.org/issue26331





Copyright

=========



This document has been placed in the public domain.


