cpython: 8a881dafe335

--- a/Doc/library/decimal.rst +++ b/Doc/library/decimal.rst @@ -345,7 +345,7 @@ Decimal objects value can be an integer, string, tuple, :class:float, or another :class:Decimal object. If no value is given, returns Decimal('0'). If value is a string, it should conform to the decimal numeric string syntax after leading

and trailing whitespace characters are removed::

and trailing whitespace characters, as well as underscores throughout, are removed:: sign ::= '+' | '-' digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' @@ -394,6 +394,10 @@ Decimal objects :class:float arguments raise an exception if the :exc:FloatOperation trap is set. By default the trap is off.
.. versionchanged:: 3.6

 Underscores are allowed for grouping, as with integral and floating-point
 literals in code.

+ Decimal floating point objects share many properties with the other built-in numeric types such as :class:float and :class:int. All of the usual math operations and special methods apply. Likewise, decimal objects can be @@ -1075,8 +1079,8 @@ In addition to the three supplied contex Decimal('4.44') This method implements the to-number operation of the IBM specification.

-  If the argument is a string, no leading or trailing whitespace is
-  permitted.
+  If the argument is a string, no leading or trailing whitespace or
+  underscores are permitted.

.. method:: create_decimal_from_float(f)
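The two documentation changes above describe different behaviour for the constructor and for `Context.create_decimal()`. A minimal sketch of that difference, assuming a Python 3.6 or later interpreter; the input strings are illustrative, and the variable names are not part of the changeset:

```python
from decimal import Context, Decimal, InvalidOperation

# The Decimal constructor strips surrounding whitespace and ignores underscores.
grouped = Decimal(" 1_000.25 ")
print(grouped)  # 1000.25

# Context.create_decimal() implements the stricter to-number operation:
# neither surrounding whitespace nor underscores are accepted.
ctx = Context()
try:
    ctx.create_decimal("12_34")
    strict_accepts = True
except InvalidOperation:
    strict_accepts = False
print(strict_accepts)  # False
```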

--- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -271,6 +271,9 @@ are always available. They are listed h The complex type is described in :ref:typesnumeric.

.. versionchanged:: 3.6

 Grouping digits with underscores as in code literals is allowed.

+ .. function:: delattr(object, name) @@ -531,10 +534,13 @@ are always available. They are listed h The float type is described in :ref:typesnumeric.

-.. index::
-   single: __format__
-   single: string; format() (built-in function)
+   .. versionchanged:: 3.6
+      Grouping digits with underscores as in code literals is allowed.
+
+
+.. index::
+   single: __format__
+   single: string; format() (built-in function)

.. function:: format(value[, format_spec]) @@ -702,6 +708,10 @@ are always available. They are listed h :meth:base.__int__ <object.__int__> instead of :meth:base.__index__[](#l2.31) <object.__index__>.

.. versionchanged:: 3.6

 Grouping digits with underscores as in code literals is allowed.

+ + .. function:: isinstance(object, classinfo) Return true if the object argument is an instance of the classinfo
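The three `versionchanged` notes above apply to `complex()`, `float()` and `int()`. A short sketch of what they allow, assuming Python 3.6 or later; the `rejects` helper is hypothetical and exists only for this illustration, while the explicit-base and bytes cases are taken from the tests in this changeset:

```python
# Underscores may group digits in strings passed to the numeric constructors.
as_int = int("1_000_000")
as_float = float("1_000.5")
as_complex = complex("1_0+2_0j")

# Cases taken from Lib/test/test_int.py in this changeset.
base3 = int("1_00", 3)       # grouping also works with an explicit base
from_bytes = int(b"1_00")    # and with bytes input


def rejects(func, text):
    """Return True if func(text) raises ValueError (illustrative helper)."""
    try:
        func(text)
    except ValueError:
        return True
    return False


print(as_int, as_float, as_complex)  # 1000000 1000.5 (10+20j)
print(rejects(int, "1__00"))         # True: two underscores in a row
print(rejects(int, "100_"))          # True: trailing underscore
```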

--- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -721,20 +721,24 @@ Integer literals Integer literals are described by the following lexical definitions: .. productionlist::

-   integer: decimalinteger | octinteger | hexinteger | bininteger
-   decimalinteger: nonzerodigit digit* | "0"+
+   integer: decinteger | bininteger | octinteger | hexinteger
+   decinteger: nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
+   bininteger: "0" ("b" | "B") (["_"] bindigit)+
+   octinteger: "0" ("o" | "O") (["_"] octdigit)+
+   hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
-   octinteger: "0" ("o" | "O") octdigit+
-   hexinteger: "0" ("x" | "X") hexdigit+
-   bininteger: "0" ("b" | "B") bindigit+
+   bindigit: "0" | "1"
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
-   bindigit: "0" | "1"

There is no limit for the length of integer literals apart from what can be stored in available memory. +Underscores are ignored for determining the numeric value of the literal. They +can be used to group digits for enhanced readability. One underscore can occur +between digits, and after base specifiers like 0x. + Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0. @@ -743,6 +747,10 @@ Some examples of integer literals:: 7 2147483647 0o177 0b100110111 3 79228162514264337593543950336 0o377 0xdeadbeef

    100_000_000_000                   0b_1110_0101

+ +.. versionchanged:: 3.6

Underscores are now allowed for grouping purposes in literals.
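The integer rules above (one underscore between digits, or directly after a base specifier) can be checked from Python itself. This is a small sketch assuming Python 3.6 or later; `is_valid_literal` is a hypothetical helper that simply asks the compiler whether a string parses as an expression:

```python
# Underscores do not change the value of an integer literal.
assert 100_000_000_000 == 100000000000
assert 0b_1110_0101 == 0b11100101 == 229
assert 0x_FF_FF == 0xFFFF == 65535


def is_valid_literal(source):
    """Return True if `source` compiles as an expression (illustrative helper)."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_literal("1_000"))   # True
print(is_valid_literal("1__000"))  # False: two underscores in a row
print(is_valid_literal("1000_"))   # False: trailing underscore
print(is_valid_literal("0_x10"))   # False: underscore inside the base specifier
```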

.. _floating: @@ -754,23 +762,28 @@ Floating point literals are described by .. productionlist:: floatnumber: pointfloat | exponentfloat

-   pointfloat: [intpart] fraction | intpart "."
-   exponentfloat: (intpart | pointfloat) exponent
-   intpart: digit+
-   fraction: "." digit+
-   exponent: ("e" | "E") ["+" | "-"] digit+
+   pointfloat: [digitpart] fraction | digitpart "."
+   exponentfloat: (digitpart | pointfloat) exponent
+   digitpart: digit (["_"] digit)*
+   fraction: "." digitpart
+   exponent: ("e" | "E") ["+" | "-"] digitpart

Note that the integer and exponent parts are always interpreted using radix 10. For example, 077e010 is legal, and denotes the same number as 77e10. The -allowed range of floating point literals is implementation-dependent. Some -examples of floating point literals:: +allowed range of floating point literals is implementation-dependent. As in +integer literals, underscores are supported for digit grouping.

3.14 10. .001 1e100 3.14e-10 0e0 +Some examples of floating point literals:: +

3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93

Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator - and the literal 1. +.. versionchanged:: 3.6

Underscores are now allowed for grouping purposes in literals. +

.. _imaginary: @@ -780,7 +793,7 @@ Imaginary literals Imaginary literals are described by the following lexical definitions: .. productionlist::

-   imagnumber: (floatnumber | intpart) ("j" | "J")
+   imagnumber: (floatnumber | digitpart) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0. Complex numbers are represented as a pair of floating point numbers and have the same @@ -788,7 +801,7 @@ restrictions on their range. To create part, add a floating point number to it, e.g., (3+4j). Some examples of imaginary literals::

-   3.14j   10.j    10j     .001j   1e100j  3.14e-10j
+   3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j

.. _operators:
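For floating point and imaginary literals, the `digitpart` production means an underscore is only legal with a digit on both sides. A brief sketch assuming Python 3.6 or later; `compiles` is a hypothetical helper, and the rejected strings are taken from the invalid-literal list in Lib/test/test_grammar.py further down this diff:

```python
# Grouping leaves the value unchanged.
assert 3.14_15_93 == 3.141593
assert 1_0e1_0 == 10e10
assert 3.14_15_93j == 3.141593j
# Integer and exponent parts are always read in radix 10.
assert 077e010 == 77e10


def compiles(source):
    """Return True if `source` compiles as an expression (illustrative helper)."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# An underscore next to the dot, the exponent marker, or the "j" suffix
# is rejected.
rejected = [s for s in ("1_.4", "1._4", "1e_1", "1_e1", "1.4_j") if not compiles(s)]
print(rejected)  # ['1_.4', '1._4', '1e_1', '1_e1', '1.4_j']
```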

--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -124,6 +124,29 @@ Windows improvements:
 New Features
 ============
 
+.. _pep-515:
+
+PEP 515: Underscores in Numeric Literals
+========================================
+
+Prior to PEP 515, there was no support for writing long numeric
+literals with some form of separator to improve readability. For
+instance, how big is ``1000000000000000``? With :pep:`515`, though,
+you can use underscores to separate digits as desired to make numeric
+literals easier to read: ``1_000_000_000_000_000``. Underscores can be
+used with other numeric literals beyond integers, e.g.
+``0x_FF_FF_FF_FF``.
+
+Single underscores are allowed between digits and after any base
+specifier. More than a single underscore in a row, leading, or
+trailing underscores are not allowed.
+
+.. seealso::
+
+   :pep:`523` - Underscores in Numeric Literals
+      PEP written by Georg Brandl & Serhiy Storchaka.
+
+
 .. _pep-523:
 
 PEP 523: Adding a frame evaluation API to CPython

--- a/Include/pystrtod.h +++ b/Include/pystrtod.h @@ -19,6 +19,10 @@ PyAPI_FUNC(char *) PyOS_double_to_string int *type); #ifndef Py_LIMITED_API +PyAPI_FUNC(PyObject *) _Py_string_to_number_with_underscores(

const char *str, Py_ssize_t len, const char *what, PyObject *obj, void *arg,
PyObject *(*innerfunc)(const char *, Py_ssize_t, void *));

+ PyAPI_FUNC(double) _Py_parse_inf_or_nan(const char *p, char **endptr); #endif

--- a/Lib/_pydecimal.py +++ b/Lib/_pydecimal.py @@ -589,7 +589,7 @@ class Decimal(object): # From a string # REs insist on real strings, so we can too. if isinstance(value, str):

-            m = _parser(value.strip())
+            m = _parser(value.strip().replace("_", ""))
             if m is None:
                 if context is None:
                     context = getcontext()

@@ -4125,7 +4125,7 @@ class Context(object): This will make it round up for that operation. """ rounding = self.rounding

-        self.rounding= type
+        self.rounding = type
         return rounding

def create_decimal(self, num='0'): @@ -4134,10 +4134,10 @@ class Context(object): This method implements the to-number operation of the IBM Decimal specification."""

-        if isinstance(num, str) and num != num.strip():
+        if isinstance(num, str) and (num != num.strip() or '_' in num):
             return self._raise_error(ConversionSyntax,
-                                     "no trailing or leading whitespace is "
-                                     "permitted.")
+                                     "trailing or leading whitespace and "
+                                     "underscores are not permitted.")

d = Decimal(num, context=self) if d._isnan() and len(d._int) > self.prec - self.clamp:

--- a/Lib/test/test_complex.py +++ b/Lib/test/test_complex.py @@ -1,5 +1,7 @@ import unittest from test import support +from test.test_grammar import (VALID_UNDERSCORE_LITERALS,

                          INVALID_UNDERSCORE_LITERALS)[](#l7.7)

from random import random from math import atan2, isnan, copysign @@ -377,6 +379,18 @@ class ComplexTest(unittest.TestCase): self.assertAlmostEqual(complex(complex1(1j)), 2j) self.assertRaises(TypeError, complex, complex2(1j))

+    def test_underscores(self):
+        # check underscores
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertEqual(complex(lit), eval(lit))
+                self.assertEqual(complex(lit), complex(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertRaises(ValueError, complex, lit)

+ def test_hash(self): for x in range(-30, 30): self.assertEqual(hash(x), hash(complex(x, 0)))

--- a/Lib/test/test_decimal.py +++ b/Lib/test/test_decimal.py @@ -554,6 +554,10 @@ class ExplicitConstructionTest(unittest. self.assertEqual(str(Decimal(' -7.89')), '-7.89') self.assertEqual(str(Decimal(" 3.45679 ")), '3.45679')

```
   # underscores[](#l8.7)
```

   self.assertEqual(str(Decimal('1_3.3e4_0')), '1.33E+41')[](#l8.8)

   self.assertEqual(str(Decimal('1_0_0_0')), '1000')[](#l8.9)

+ # unicode whitespace for lead in ["", ' ', '\u00a0', '\u205f']: for trail in ["", ' ', '\u00a0', '\u205f']: @@ -578,6 +582,9 @@ class ExplicitConstructionTest(unittest. # embedded NUL self.assertRaises(InvalidOperation, Decimal, "12\u00003")

       # underscores don't prevent errors[](#l8.18)

       self.assertRaises(InvalidOperation, Decimal, "1_2_\u00003")[](#l8.19)

+ @cpython_only def test_from_legacy_strings(self): import _testcapi @@ -772,6 +779,9 @@ class ExplicitConstructionTest(unittest. self.assertRaises(InvalidOperation, nc.create_decimal, "xyz") self.assertRaises(ValueError, nc.create_decimal, (1, "xyz", -25)) self.assertRaises(TypeError, nc.create_decimal, "1234", "5678")

   # no whitespace and underscore stripping is done with this method[](#l8.28)

   self.assertRaises(InvalidOperation, nc.create_decimal, " 1234")[](#l8.29)

   self.assertRaises(InvalidOperation, nc.create_decimal, "12_34")[](#l8.30)

# too many NaN payload digits nc.prec = 3

--- a/Lib/test/test_float.py +++ b/Lib/test/test_float.py @@ -1,4 +1,3 @@ - import fractions import operator import os @@ -9,6 +8,8 @@ import time import unittest from test import support +from test.test_grammar import (VALID_UNDERSCORE_LITERALS,

                          INVALID_UNDERSCORE_LITERALS)[](#l9.13)

from math import isinf, isnan, copysign, ldexp INF = float("inf") @@ -60,6 +61,27 @@ class GeneralFloatCases(unittest.TestCas float(b'.' + b'1'*1000) float('.' + '1'*1000)

+    def test_underscores(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertEqual(float(lit), eval(lit))
+                self.assertEqual(float(lit), float(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertRaises(ValueError, float, lit)
+        # Additional test cases; nan and inf are never valid as literals,
+        # only in the float() constructor, but we don't allow underscores
+        # in or around them.
+        self.assertRaises(ValueError, float, '_NaN')
+        self.assertRaises(ValueError, float, 'Na_N')
+        self.assertRaises(ValueError, float, 'IN_F')
+        self.assertRaises(ValueError, float, '-_INF')
+        self.assertRaises(ValueError, float, '-INF_')
+        # Check that we handle bytes values correctly.
+        self.assertRaises(ValueError, float, b'0_.\xff9')

+ def test_non_numeric_input_types(self): # Test possible non-numeric types for the argument x, including # subclasses of the explicitly documented accepted types.
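The extra `float()` cases in the test above cover the special strings `nan` and `inf`, which are not literals but are accepted by the constructor. A small sketch of that behaviour, assuming Python 3.6 or later; `float_rejects` is a hypothetical helper for this illustration:

```python
import math


def float_rejects(text):
    """Return True if float(text) raises ValueError (illustrative helper)."""
    try:
        float(text)
    except ValueError:
        return True
    return False


# The special values still parse without underscores...
print(math.isnan(float("NaN")))  # True
print(float("-INF"))             # -inf

# ...but underscores in or around them are not accepted.
print(float_rejects("Na_N"))     # True
print(float_rejects("-INF_"))    # True
print(float_rejects("_NaN"))     # True
```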

--- a/Lib/test/test_grammar.py +++ b/Lib/test/test_grammar.py @@ -16,6 +16,87 @@ from collections import ChainMap from test import ann_module2 import test +# These are shared with test_tokenize and other test modules. +# +# Note: since several test cases filter out floats by looking for "e" and ".", +# don't add hexadecimal literals that contain "e" or "E". +VALID_UNDERSCORE_LITERALS = [

'0_0_0',
'4_2',
'1_0000_0000',
'0b1001_0100',
'0xffff_ffff',
'0o5_7_7',
'1_00_00.5',
'1_00_00.5e5',
'1_00_00e5_1',
'1e1_0',
'.1_4',
'.1_4e1',
'0b_0',
'0x_f',
'0o_5',
'1_00_00j',
'1_00_00.5j',
'1_00_00e5_1j',
'.1_4j',
'(1_2.5+3_3j)',
'(.5_6j)',

+] +INVALID_UNDERSCORE_LITERALS = [

# Trailing underscores:
'0_',
'42_',
'1.4j_',
'0x_',
'0b1_',
'0xf_',
'0o5_',
'0 if 1_Else 1',
# Underscores in the base selector:
'0_b0',
'0_xf',
'0_o5',
# Old-style octal, still disallowed:
'0_7',
'09_99',
# Multiple consecutive underscores:
'4_______2',
'0.1__4',
'0.1__4j',
'0b1001__0100',
'0xffff__ffff',
'0x___',
'0o5__77',
'1e1__0',
'1e1__0j',
# Underscore right before a dot:
'1_.4',
'1_.4j',
# Underscore right after a dot:
'1._4',
'1._4j',
'._5',
'._5j',
# Underscore right after a sign:
'1.0e+_1',
'1.0e+_1j',
# Underscore right before j:
'1.4_j',
'1.4e5_j',
# Underscore right before e:
'1_e1',
'1.4_e1',
'1.4_e1j',
# Underscore right after e:
'1e_1',
'1.4e_1',
'1.4e_1j',
# Complex cases with parens:
'(1+1.5_j_)',
'(1+1.5_j)',

+] + class TokenTests(unittest.TestCase): @@ -95,6 +176,14 @@ class TokenTests(unittest.TestCase): self.assertEqual(1 if 0else 0, 0) self.assertRaises(SyntaxError, eval, "0 if 1Else 0")

+    def test_underscore_literals(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            self.assertEqual(eval(lit), eval(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            self.assertRaises(SyntaxError, eval, lit)
+        # Sanity check: no literal begins with an underscore
+        self.assertRaises(NameError, eval, "_0")

+ def test_string_literals(self): x = ''; y = ""; self.assertTrue(len(x) == 0 and x == y) x = '''; y = "'"; self.assertTrue(len(x) == 1 and x == y and ord(x) == 39)

--- a/Lib/test/test_int.py +++ b/Lib/test/test_int.py @@ -2,6 +2,8 @@ import sys import unittest from test import support +from test.test_grammar import (VALID_UNDERSCORE_LITERALS,

                          INVALID_UNDERSCORE_LITERALS)[](#l11.8)

L = [ ('0', 0), @@ -212,6 +214,25 @@ class IntTestCases(unittest.TestCase): self.assertEqual(int('2br45qc', 35), 4294967297) self.assertEqual(int('1z141z5', 36), 4294967297)

+    def test_underscores(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if any(ch in lit for ch in '.eEjJ'):
+                continue
+            self.assertEqual(int(lit, 0), eval(lit))
+            self.assertEqual(int(lit, 0), int(lit.replace('_', ''), 0))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if any(ch in lit for ch in '.eEjJ'):
+                continue
+            self.assertRaises(ValueError, int, lit, 0)
+        # Additional test cases with bases != 0, only for the constructor:
+        self.assertEqual(int("1_00", 3), 9)
+        self.assertEqual(int("0_100"), 100)  # not valid as a literal!
+        self.assertEqual(int(b"1_00"), 100)  # byte underscore
+        self.assertRaises(ValueError, int, "_100")
+        self.assertRaises(ValueError, int, "+_100")
+        self.assertRaises(ValueError, int, "1__00")
+        self.assertRaises(ValueError, int, "100_")

+ @support.cpython_only def test_small_ints(self): # Bug #3236: Return small longs from PyLong_FromString

--- a/Lib/test/test_tokenize.py +++ b/Lib/test/test_tokenize.py @@ -3,7 +3,9 @@ from tokenize import (tokenize, _tokeniz STRING, ENDMARKER, ENCODING, tok_name, detect_encoding, open as tokenize_open, Untokenizer) from io import BytesIO -from unittest import TestCase, mock, main +from unittest import TestCase, mock +from test.test_grammar import (VALID_UNDERSCORE_LITERALS,

                          INVALID_UNDERSCORE_LITERALS)[](#l12.10)

import os import token @@ -185,6 +187,21 @@ def k(x): NUMBER '3.14e159' (1, 4) (1, 12) """)

+    def test_underscore_literals(self):
+        def number_token(s):
+            f = BytesIO(s.encode('utf-8'))
+            for toktype, token, start, end, line in tokenize(f.readline):
+                if toktype == NUMBER:
+                    return token
+            return 'invalid token'
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if '(' in lit:
+                # this won't work with compound complex inputs
+                continue
+            self.assertEqual(number_token(lit), lit)
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            self.assertNotEqual(number_token(lit), lit)

+ def test_string(self): # String literals self.check_tokenize("x = ''; y = """, """[](#l12.35) @@ -1529,11 +1546,10 @@ class TestRoundtrip(TestCase): tempdir = os.path.dirname(fn) or os.curdir testfiles = glob.glob(os.path.join(tempdir, "test*.py"))

-        # Tokenize is broken on test_unicode_identifiers.py because regular
-        # expressions are broken on the obscure unicode identifiers in it.
-        # *sigh* With roundtrip extended to test the 5-tuple mode of
-        # untokenize, 7 more testfiles fail.  Remove them also until the
-        # failure is diagnosed.
+        # Tokenize is broken on test_pep3131.py because regular expressions are
+        # broken on the obscure unicode identifiers in it. *sigh*
+        # With roundtrip extended to test the 5-tuple mode of untokenize,
+        # 7 more testfiles fail.  Remove them also until the failure is diagnosed.

testfiles.remove(os.path.join(tempdir, "test_unicode_identifiers.py")) for f in ('buffer', 'builtin', 'fileio', 'inspect', 'os', 'platform', 'sys'): @@ -1565,4 +1581,4 @@ class TestRoundtrip(TestCase): if __name__ == "__main__":

-    main()
+    unittest.main()

--- a/Lib/test/test_types.py +++ b/Lib/test/test_types.py @@ -48,6 +48,7 @@ class TypesTests(unittest.TestCase): def test_float_constructor(self): self.assertRaises(ValueError, float, '') self.assertRaises(ValueError, float, '5\0')

   self.assertRaises(ValueError, float, '5_5\0')[](#l13.7)

def test_zero_division(self): try: 5.0 / 0.0

--- a/Lib/tokenize.py
+++ b/Lib/tokenize.py
@@ -120,16 +120,17 @@ Comment = r'#[^\r\n]*'
 Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
 Name = r'\w+'
 
-Hexnumber = r'0[xX][0-9a-fA-F]+'
-Binnumber = r'0[bB][01]+'
-Octnumber = r'0[oO][0-7]+'
-Decnumber = r'(?:0+|[1-9][0-9]*)'
+Hexnumber = r'0[xX](?:_?[0-9a-fA-F])+'
+Binnumber = r'0[bB](?:_?[01])+'
+Octnumber = r'0[oO](?:_?[0-7])+'
+Decnumber = r'(?:0(?:_?0)*|[1-9](?:_?[0-9])*)'
 Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
-Exponent = r'[eE][-+]?[0-9]+'
-Pointfloat = group(r'[0-9]+\.[0-9]*', r'\.[0-9]+') + maybe(Exponent)
-Expfloat = r'[0-9]+' + Exponent
+Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
+Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
+                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
+Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
 Floatnumber = group(Pointfloat, Expfloat)
-Imagnumber = group(r'[0-9]+[jJ]', Floatnumber + r'[jJ]')
+Imagnumber = group(r'[0-9](?:_?[0-9])*[jJ]', Floatnumber + r'[jJ]')
 Number = group(Imagnumber, Floatnumber, Intnumber)
 
 # Return the empty string, plus all of the valid string prefixes.
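The integer patterns in this hunk encode the rule "at most one underscore before each digit after the first". The sketch below writes them out following the grammar in Doc/reference/lexical_analysis.rst and applies them with `re.fullmatch`. The uppercase names and the `is_int_literal` helper are illustrative, not part of `Lib/tokenize.py`:

```python
import re

# Each digit may be preceded by at most one underscore; for the prefixed
# forms this also permits a single underscore right after the base specifier.
HEXNUMBER = r'0[xX](?:_?[0-9a-fA-F])+'
BINNUMBER = r'0[bB](?:_?[01])+'
OCTNUMBER = r'0[oO](?:_?[0-7])+'
DECNUMBER = r'(?:0(?:_?0)*|[1-9](?:_?[0-9])*)'
INTNUMBER = re.compile('|'.join([HEXNUMBER, BINNUMBER, OCTNUMBER, DECNUMBER]))


def is_int_literal(text):
    """Return True if the whole string matches one of the integer patterns."""
    return INTNUMBER.fullmatch(text) is not None


valid = ['0_0_0', '4_2', '1_0000_0000', '0b1001_0100', '0xffff_ffff', '0x_f']
invalid = ['0_', '42_', '0x_', '0_b0', '0_7', '09_99', '4_______2']

print(all(is_int_literal(s) for s in valid))    # True
print(any(is_int_literal(s) for s in invalid))  # False
```

Because the underscore is optional and always followed by a digit inside the repeated group, doubled and trailing underscores fail to match without any separate check.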

--- a/Misc/NEWS +++ b/Misc/NEWS @@ -17,6 +17,8 @@ Core and Builtins efficient bytecode. Patch by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka and Victor Stinner. +- Issue #26331: Implement tokenizing support for PEP 515. Patch by Georg Brandl. +

Issue #27999: Make "global after use" a SyntaxError, and ditto for nonlocal. Patch by Ivan Levkivskyi. @@ -2678,7 +2680,7 @@ Library
Issue #24774: Fix docstring in http.server.test. Patch from Chiu-Hsiang Hsu.
Issue #21159: Improve message in configparser.InterpolationMissingOptionError.
-  Patch from Łukasz Langa.
+  Patch from Łukasz Langa.

Issue #20362: Honour TestCase.longMessage correctly in assertRegex. Patch from Ilia Kurenkov. @@ -4606,7 +4608,7 @@ Library Based on patch by Martin Panter.
Issue #17293: uuid.getnode() now determines MAC address on AIX using netstat.
-  Based on patch by Aivars Kalvāns.
+  Based on patch by Aivars Kalvāns.

Issue #22769: Fixed ttk.Treeview.tag_has() when called without arguments.

--- a/Modules/_decimal/_decimal.c +++ b/Modules/_decimal/_decimal.c @@ -1889,12 +1889,13 @@ is_space(enum PyUnicode_Kind kind, void /* Return the ASCII representation of a numeric Unicode string. The numeric string may contain ascii characters in the range [1, 127], any Unicode space and any unicode digit. If strip_ws is true, leading and trailing

whitespace is stripped.

whitespace is stripped. If ignore_underscores is true, underscores are
ignored. Return NULL if malloc fails and an empty string if invalid characters are found. */ static char * -numeric_as_ascii(const PyObject *u, int strip_ws) +numeric_as_ascii(const PyObject *u, int strip_ws, int ignore_underscores) { enum PyUnicode_Kind kind; void *data; @@ -1929,6 +1930,9 @@ numeric_as_ascii(const PyObject *u, int for (; j < len; j++) { ch = PyUnicode_READ(kind, data, j);

+        if (ignore_underscores && ch == '_') {
+            continue;
+        }
         if (0 < ch && ch <= 127) {
             *cp++ = ch;
             continue;

@@ -2011,7 +2015,7 @@ PyDecType_FromUnicode(PyTypeObject *type PyObject *dec; char *s;

s = numeric_as_ascii(u, 0);

s = numeric_as_ascii(u, 0, 0); if (s == NULL) { return NULL; } @@ -2031,7 +2035,7 @@ PyDecType_FromUnicodeExactWS(PyTypeObjec PyObject *dec; char *s;

s = numeric_as_ascii(u, 1);

s = numeric_as_ascii(u, 1, 1); if (s == NULL) { return NULL; }
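The behaviour described in the `numeric_as_ascii` comment can be approximated in Python. This is a rough sketch modelled on that comment, not a translation of the C function: the use of `str.strip()`, `str.isspace()` and `unicodedata.decimal()` are assumptions standing in for the C helpers, and only the empty-string-on-invalid-input convention is taken from the source.

```python
import unicodedata


def numeric_as_ascii(u, strip_ws, ignore_underscores):
    """Return an ASCII form of a numeric string, or '' if a character is invalid."""
    if strip_ws:
        u = u.strip()
    out = []
    for ch in u:
        if ignore_underscores and ch == '_':
            continue                      # the behaviour added by this change
        if 0 < ord(ch) <= 127:
            out.append(ch)                # ASCII (except NUL) passes through
        elif ch.isspace():
            out.append(' ')               # any Unicode space becomes a blank
        else:
            digit = unicodedata.decimal(ch, None)
            if digit is None:
                return ''                 # invalid character
            out.append(str(digit))        # any Unicode decimal digit
    return ''.join(out)


print(numeric_as_ascii(' 1_000 ', True, True))   # 1000
print(numeric_as_ascii('12_34', False, False))   # 12_34
```

With both flags off, as in the `PyDecType_FromUnicode` call, the underscore survives into the ASCII string, which is why the stricter path rejects such input during parsing.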

--- a/Objects/complexobject.c +++ b/Objects/complexobject.c @@ -759,29 +759,12 @@ static PyMemberDef complex_members[] = { }; static PyObject * -complex_subtype_from_string(PyTypeObject *type, PyObject *v) +complex_from_string_inner(const char *s, Py_ssize_t len, void *type) {

const char *s, *start;
char *end; double x=0.0, y=0.0, z; int got_bracket=0;
PyObject *s_buffer = NULL;
Py_ssize_t len;

if (PyUnicode_Check(v)) {

   s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);[](#l17.18)

```
   if (s_buffer == NULL)[](#l17.19)
```
```
       return NULL;[](#l17.20)
```

   s = PyUnicode_AsUTF8AndSize(s_buffer, &len);[](#l17.21)

```
   if (s == NULL)[](#l17.22)
```
```
       goto error;[](#l17.23)
```
}
else {

   PyErr_Format(PyExc_TypeError,[](#l17.26)

       "complex() argument must be a string or a number, not '%.200s'",[](#l17.27)

```
       Py_TYPE(v)->tp_name);[](#l17.28)
```
```
   return NULL;[](#l17.29)
```
}

const char *start;
char *end;

/* position on first nonblank */ start = s; @@ -822,7 +805,7 @@ complex_subtype_from_string(PyTypeObject if (PyErr_ExceptionMatches(PyExc_ValueError)) PyErr_Clear(); else

```
       goto error;[](#l17.40)
```

```
       return NULL;[](#l17.41)
```
} if (end != s) { /* all 4 forms starting with land here */

@@ -835,7 +818,7 @@ complex_subtype_from_string(PyTypeObject if (PyErr_ExceptionMatches(PyExc_ValueError)) PyErr_Clear(); else

```
               goto error;[](#l17.49)
```

               return NULL;[](#l17.50)
       }[](#l17.51)
       if (end != s)[](#l17.52)
           /* <float><signed-float>j */[](#l17.53)

@@ -890,18 +873,46 @@ complex_subtype_from_string(PyTypeObject if (s-start != len) goto parse_error;

Py_XDECREF(s_buffer);
return complex_subtype_from_doubles(type, x, y);

return complex_subtype_from_doubles((PyTypeObject *)type, x, y);

parse_error: PyErr_SetString(PyExc_ValueError, "complex() arg is a malformed string");

error:
Py_XDECREF(s_buffer); return NULL; }

static PyObject * +complex_subtype_from_string(PyTypeObject *type, PyObject *v) +{

const char *s;
PyObject *s_buffer = NULL, *result = NULL;
Py_ssize_t len;

if (PyUnicode_Check(v)) {

   s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);[](#l17.78)

```
   if (s_buffer == NULL) {[](#l17.79)
```
```
       return NULL;[](#l17.80)
```
```
   }[](#l17.81)
```

   s = PyUnicode_AsUTF8AndSize(s_buffer, &len);[](#l17.82)

```
   if (s == NULL) {[](#l17.83)
```
```
       goto exit;[](#l17.84)
```
```
   }[](#l17.85)
```
}
else {

   PyErr_Format(PyExc_TypeError,[](#l17.88)

       "complex() argument must be a string or a number, not '%.200s'",[](#l17.89)

```
       Py_TYPE(v)->tp_name);[](#l17.90)
```
```
   return NULL;[](#l17.91)
```
}

result = _Py_string_to_number_with_underscores(s, len, "complex", v, type,

                                              complex_from_string_inner);[](#l17.95)

exit:
Py_DECREF(s_buffer);
return result;

+} + +static PyObject * complex_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { PyObject *r, *i, *tmp;
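Both `complex()` and `float()` now hand their input to `_Py_string_to_number_with_underscores`, whose body is not part of this excerpt. The sketch below is an assumption about its behaviour based on the tests in this changeset: every underscore must sit between two ASCII digits, and the string with underscores removed is then passed to the inner parser. All names are hypothetical Python stand-ins.

```python
def _is_digit(ch):
    """True for a single ASCII digit; False for '' or anything else."""
    return len(ch) == 1 and '0' <= ch <= '9'


def string_to_number_with_underscores(text, what, inner):
    """Validate underscore placement, strip underscores, then call inner()."""
    if '_' not in text:
        return inner(text)                # fast path: nothing to validate
    error = ValueError(f"could not convert string to {what}: {text!r}")
    cleaned = []
    prev = ''
    for ch in text:
        if ch == '_':
            if not _is_digit(prev):       # must directly follow a digit
                raise error
        else:
            if prev == '_' and not _is_digit(ch):
                raise error               # must directly precede a digit
            cleaned.append(ch)
        prev = ch
    if prev == '_':                       # trailing underscore
        raise error
    return inner(''.join(cleaned))


print(string_to_number_with_underscores('1_000.5', 'float', float))  # 1000.5
print(string_to_number_with_underscores('1_0j', 'complex', complex))  # 10j
```

Keeping the validation in one wrapper lets the existing inner parsers stay unaware of underscores, which matches how `complex_from_string_inner` and `float_from_string_inner` are split out in this diff.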

--- a/Objects/floatobject.c +++ b/Objects/floatobject.c @@ -124,11 +124,43 @@ PyFloat_FromDouble(double fval) return (PyObject *) op; } +static PyObject * +float_from_string_inner(const char *s, Py_ssize_t len, void *obj) +{

+    double x;
+    const char *end;
+    const char *last = s + len;
+    /* strip space */
+    while (s < last && Py_ISSPACE(*s)) {
+        s++;
+    }
+
+    while (s < last - 1 && Py_ISSPACE(last[-1])) {
+        last--;
+    }
+
+    /* We don't care about overflow or underflow.  If the platform
+     * supports them, infinities and signed zeroes (on underflow) are
+     * fine. */
+    x = PyOS_string_to_double(s, (char **)&end, NULL);
+    if (end != last) {
+        PyErr_Format(PyExc_ValueError,
+                     "could not convert string to float: "
+                     "%R", obj);
+        return NULL;
+    }
+    else if (x == -1.0 && PyErr_Occurred()) {
+        return NULL;
+    }
+    else {
+        return PyFloat_FromDouble(x);
+    }

+} + PyObject * PyFloat_FromString(PyObject *v) {

const char *s, *last, *end;
double x;

const char *s; PyObject *s_buffer = NULL; Py_ssize_t len; Py_buffer view = {NULL, NULL}; @@ -169,27 +201,8 @@ PyFloat_FromString(PyObject *v) Py_TYPE(v)->tp_name); return NULL; }

last = s + len;
/* strip space */
while (s < last && Py_ISSPACE(*s))
```
   s++;[](#l18.56)
```
while (s < last - 1 && Py_ISSPACE(last[-1]))
```
   last--;[](#l18.58)
```
/* We don't care about overflow or underflow. If the platform

* supports them, infinities and signed zeroes (on underflow) are[](#l18.60)

```
* fine. */[](#l18.61)
```
x = PyOS_string_to_double(s, (char **)&end, NULL);
if (end != last) {

   PyErr_Format(PyExc_ValueError,[](#l18.64)

                "could not convert string to float: "[](#l18.65)

```
                "%R", v);[](#l18.66)
```
```
   result = NULL;[](#l18.67)
```
}
else if (x == -1.0 && PyErr_Occurred())
```
   result = NULL;[](#l18.70)
```
else

   result = PyFloat_FromDouble(x);[](#l18.72)

result = _Py_string_to_number_with_underscores(s, len, "float", v, v,

                                              float_from_string_inner);[](#l18.75)

PyBuffer_Release(&view); Py_XDECREF(s_buffer); return result;
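The Objects/longobject.c hunks that follow make the digit scan in `long_from_binary_base` underscore-aware: a forward pass counts digits and rejects doubled or trailing underscores, and the right-to-left accumulation pass skips underscores. A simplified Python sketch of that two-pass idea, assuming an unsigned input with the base prefix already removed; the function name, the digit table and the error messages are illustrative, and the C code reports errors through return codes rather than exceptions:

```python
import string

# Digit values for bases up to 32: 0-9, then a-v (case-insensitive).
_DIGITS = {ch: i for i, ch in enumerate(string.digits + string.ascii_lowercase[:22])}


def parse_power_of_two_base(text, base):
    """Parse unsigned, prefix-free `text` in base 2, 4, 8, 16 or 32."""
    if base not in (2, 4, 8, 16, 32):
        raise ValueError("base must be a power of two between 2 and 32")
    if not text or text[0] == "_":
        raise ValueError("missing digits or leading underscore")
    bits_per_char = base.bit_length() - 1

    # Forward pass: validate digits and underscore placement.
    prev = ""
    for ch in text:
        if ch == "_":
            if prev == "_":
                raise ValueError("consecutive underscores")
        elif _DIGITS.get(ch.lower(), base) >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        prev = ch
    if prev == "_":
        raise ValueError("trailing underscore")

    # Backward pass: least significant digit first, skipping underscores.
    accum = 0
    shift = 0
    for ch in reversed(text):
        if ch == "_":
            continue
        accum |= _DIGITS[ch.lower()] << shift
        shift += bits_per_char
    return accum


print(parse_power_of_two_base("1001_0100", 2) == 0b1001_0100)  # True
print(parse_power_of_two_base("ffff_ffff", 16) == 0xffffffff)  # True
```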

--- a/Objects/longobject.c +++ b/Objects/longobject.c @@ -2004,12 +2004,18 @@ unsigned char _PyLong_DigitValue[256] =

  * non-digit (which may be *str!).  A normalized int is returned.
  * The point to this routine is that it takes time linear in the number of
  * string characters.
+ *
+ * Return values:
+ *   -1 on syntax error (exception needs to be set, *res is untouched)
+ *   0 else (exception may be set, in that case *res is set to NULL)
  */
-static PyLongObject *
-long_from_binary_base(const char **str, int base)
+static int
+long_from_binary_base(const char **str, int base, PyLongObject **res)
 {
     const char *p = *str;
     const char *start = p;
+    char prev = 0;
+    int digits = 0;
     int bits_per_char;
     Py_ssize_t n;
     PyLongObject *z;
@@ -2019,23 +2025,43 @@ long_from_binary_base(const char **str,
     assert(base >= 2 && base <= 32 && (base & (base - 1)) == 0);
     n = base;
-    for (bits_per_char = -1; n; ++bits_per_char)
+    for (bits_per_char = -1; n; ++bits_per_char) {
         n >>= 1;
-    /* n <- total # of bits needed, while setting p to end-of-string */
-    while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
+    }
+    /* count digits and set p to end-of-string */
+    while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base || *p == '_') {
+        if (*p == '_') {
+            if (prev == '_') {
+                *str = p - 1;
+                return -1;
+            }
+        } else {
+            ++digits;
+        }
+        prev = *p;
         ++p;
+    }
+    if (prev == '_') {
+        /* Trailing underscore not allowed. */
+        *str = p - 1;
+        return -1;
+    }
+
     *str = p;
     /* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
-    n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
+    n = digits * bits_per_char + PyLong_SHIFT - 1;
     if (n / bits_per_char < p - start) {
         PyErr_SetString(PyExc_ValueError,
                         "int string too large to convert");
-        return NULL;
+        *res = NULL;
+        return 0;
     }
     n = n / PyLong_SHIFT;
     z = _PyLong_New(n);
-    if (z == NULL)
-        return NULL;
+    if (z == NULL) {
+        *res = NULL;
+        return 0;
} /* Read string from right, and fill in int from left; i.e.,
- from least to most significant in both. */ @@ -2043,7 +2069,11 @@ long_from_binary_base(const char **str, bits_in_accum = 0; pdigit = z->ob_digit; while (--p >= start) {

   int k = (int)_PyLong_DigitValue[Py_CHARMASK(*p)];[](#l19.79)

```
   int k;[](#l19.80)
```
```
   if (*p == '_') {[](#l19.81)
```
```
       continue;[](#l19.82)
```
```
   }[](#l19.83)
```

   k = (int)_PyLong_DigitValue[Py_CHARMASK(*p)];[](#l19.84)
   assert(k >= 0 && k < base);[](#l19.85)
   accum |= (twodigits)k << bits_in_accum;[](#l19.86)
   bits_in_accum += bits_per_char;[](#l19.87)

@@ -2062,7 +2092,8 @@ long_from_binary_base(const char **str, } while (pdigit - z->ob_digit < n) *pdigit++ = 0;

return long_normalize(z);

*res = long_normalize(z);
return 0;

} /* Parses an int from a bytestring. Leading and trailing whitespace will be @@ -2087,23 +2118,29 @@ PyLong_FromString(const char *str, char "int() arg 2 must be >= 2 and <= 36"); return NULL; }

while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str)))

while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) { str++;

if (*str == '+')

}
if (*str == '+') { ++str;
} else if (*str == '-') { ++str; sign = -1; } if (base == 0) {

```
   if (str[0] != '0')[](#l19.115)
```

   if (str[0] != '0') {[](#l19.116)
       base = 10;[](#l19.117)

   else if (str[1] == 'x' || str[1] == 'X')[](#l19.118)

```
   }[](#l19.119)
```

   else if (str[1] == 'x' || str[1] == 'X') {[](#l19.120)
       base = 16;[](#l19.121)

   else if (str[1] == 'o' || str[1] == 'O')[](#l19.122)

```
   }[](#l19.123)
```

   else if (str[1] == 'o' || str[1] == 'O') {[](#l19.124)
       base = 8;[](#l19.125)

   else if (str[1] == 'b' || str[1] == 'B')[](#l19.126)

```
   }[](#l19.127)
```

   else if (str[1] == 'b' || str[1] == 'B') {[](#l19.128)
       base = 2;[](#l19.129)

   }[](#l19.130)
   else {[](#l19.131)
       /* "old" (C-style) octal literal, now invalid.[](#l19.132)
          it might still be zero though */[](#l19.133)

@@ -2114,12 +2151,26 @@ PyLong_FromString(const char *str, char if (str[0] == '0' && ((base == 16 && (str[1] == 'x' || str[1] == 'X')) || (base == 8 && (str[1] == 'o' || str[1] == 'O')) ||

    (base == 2  && (str[1] == 'b' || str[1] == 'B'))))[](#l19.138)

    (base == 2  && (str[1] == 'b' || str[1] == 'B')))) {[](#l19.139)
   str += 2;[](#l19.140)

   /* One underscore allowed here. */[](#l19.141)

```
   if (*str == '_') {[](#l19.142)
```
```
       ++str;[](#l19.143)
```
```
   }[](#l19.144)
```
}
if (str[0] == '_') {

   /* May not start with underscores. */[](#l19.147)

```
   goto onError;[](#l19.148)
```
}

start = str;

if ((base & (base - 1)) == 0)

   z = long_from_binary_base(&str, base);[](#l19.153)

if ((base & (base - 1)) == 0) {

   int res = long_from_binary_base(&str, base, &z);[](#l19.155)

```
   if (res < 0) {[](#l19.156)
```
```
       /* Syntax error. */[](#l19.157)
```
```
       goto onError;[](#l19.158)
```
```
   }[](#l19.159)
```
} else { /*** Binary bases can be converted in time linear in the number of digits, because @@ -2208,11 +2259,13 @@ digit beyond the first. ***/ twodigits c; /* current input character */ Py_ssize_t size_z;

   int digits = 0;[](#l19.168)
   int i;[](#l19.169)
   int convwidth;[](#l19.170)
   twodigits convmultmax, convmult;[](#l19.171)
   digit *pz, *pzstop;[](#l19.172)

```
   const char* scan;[](#l19.173)
```

   const char *scan, *lastdigit;[](#l19.174)

```
   char prev = 0;[](#l19.175)
```

static double log_base_BASE[37] = {0.0e0,}; static int convwidth_base[37] = {0,}; @@ -2226,8 +2279,9 @@ digit beyond the first. log((double)PyLong_BASE)); for (;;) { twodigits next = convmax * base;

           if (next > PyLong_BASE)[](#l19.183)

           if (next > PyLong_BASE) {[](#l19.184)
               break;[](#l19.185)

           }[](#l19.186)
           convmax = next;[](#l19.187)
           ++i;[](#l19.188)
       }[](#l19.189)

@@ -2238,21 +2292,43 @@ digit beyond the first. /* Find length of the string of numeric characters. */ scan = str;

   while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base)[](#l19.194)

```
   lastdigit = str;[](#l19.195)
```

   while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {[](#l19.197)

```
       if (*scan == '_') {[](#l19.198)
```

           if (prev == '_') {[](#l19.199)

               /* Only one underscore allowed. */[](#l19.200)

               str = lastdigit + 1;[](#l19.201)

               goto onError;[](#l19.202)

```
           }[](#l19.203)
```
```
       }[](#l19.204)
```
```
       else {[](#l19.205)
```
```
           ++digits;[](#l19.206)
```

           lastdigit = scan;[](#l19.207)

```
       }[](#l19.208)
```

       prev = *scan;[](#l19.209)
       ++scan;[](#l19.210)

```
   }[](#l19.211)
```
```
   if (prev == '_') {[](#l19.212)
```

       /* Trailing underscore not allowed. */[](#l19.213)

       /* Set error pointer to first underscore. */[](#l19.214)

```
       str = lastdigit + 1;[](#l19.215)
```
```
       goto onError;[](#l19.216)
```
```
   }[](#l19.217)
```

/* Create an int object that can contain the largest possible * integer with this base and length. Note that there's no * need to initialize z->ob_digit -- no slot is read up before * being stored into. */

   size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;[](#l19.224)

   size_z = (Py_ssize_t)(digits * log_base_BASE[base]) + 1;[](#l19.225)
   /* Uncomment next line to test exceedingly rare copy code */[](#l19.226)
   /* size_z = 1; */[](#l19.227)
   assert(size_z > 0);[](#l19.228)
   z = _PyLong_New(size_z);[](#l19.229)

```
   if (z == NULL)[](#l19.230)
```

   if (z == NULL) {[](#l19.231)
       return NULL;[](#l19.232)

   }[](#l19.233)
   Py_SIZE(z) = 0;[](#l19.234)

/* convwidth consecutive input digits are treated as a single @@ -2263,9 +2339,17 @@ digit beyond the first. /* Work ;-) */ while (str < scan) {

```
       if (*str == '_') {[](#l19.241)
```
```
           str++;[](#l19.242)
```
```
           continue;[](#l19.243)
```

       }[](#l19.244)
       /* grab up to convwidth digits from the input string */[](#l19.245)
       c = (digit)_PyLong_DigitValue[Py_CHARMASK(*str++)];[](#l19.246)

       for (i = 1; i < convwidth && str != scan; ++i, ++str) {[](#l19.247)

       for (i = 1; i < convwidth && str != scan; ++str) {[](#l19.248)

           if (*str == '_') {[](#l19.249)

```
               continue;[](#l19.250)
```
```
           }[](#l19.251)
```

           i++;[](#l19.252)
           c = (twodigits)(c *  base +[](#l19.253)
                           (int)_PyLong_DigitValue[Py_CHARMASK(*str)]);[](#l19.254)
           assert(c < PyLong_BASE);[](#l19.255)

@@ -2277,8 +2361,9 @@ digit beyond the first. */ if (i != convwidth) { convmult = base;

           for ( ; i > 1; --i)[](#l19.260)

           for ( ; i > 1; --i) {[](#l19.261)
               convmult *= base;[](#l19.262)

           }[](#l19.263)
       }[](#l19.264)

/* Multiply z by convmult, and add c. */ @@ -2316,41 +2401,51 @@ digit beyond the first. } } }

if (z == NULL)

if (z == NULL) { return NULL;
} if (error_if_nonzero) { /* reset the base to 0, else the exception message doesn't make too much sense */ base = 0;

```
   if (Py_SIZE(z) != 0)[](#l19.279)
```

   if (Py_SIZE(z) != 0) {[](#l19.280)
       goto onError;[](#l19.281)

   }[](#l19.282)
   /* there might still be other problems, therefore base[](#l19.283)
      remains zero here for the same reason */[](#l19.284)

}

if (str == start)

if (str == start) { goto onError;

if (sign < 0)

}
if (sign < 0) { Py_SIZE(z) = -(Py_SIZE(z));

while (*str && Py_ISSPACE(Py_CHARMASK(*str)))

}
while (*str && Py_ISSPACE(Py_CHARMASK(*str))) { str++;

if (*str != '\0')

}
if (*str != '\0') { goto onError;
} long_normalize(z); z = maybe_small_long(z);

if (z == NULL)

if (z == NULL) { return NULL;

if (pend != NULL)

}
if (pend != NULL) { *pend = (char *)str;
} return (PyObject *) z; onError:

if (pend != NULL)

if (pend != NULL) { *pend = (char *)str;
} Py_XDECREF(z); slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200; strobj = PyUnicode_FromStringAndSize(orig_str, slen);

if (strobj == NULL)

if (strobj == NULL) { return NULL;
} PyErr_Format(PyExc_ValueError, "invalid literal for int() with base %d: %.200R", base, strobj);
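The underscore checks added to the `int()` string parser above can be paraphrased in Python. This is only a sketch of the scanning rule: `count_digits`, `is_valid`, and `_DIGIT_VALUE` are hypothetical names, not part of CPython. Unlike the C loop, which stops at the first non-digit, the sketch rejects any invalid character outright. It returns the digit count because the patch sizes the result from `digits` rather than from the raw string length.

```python
import string

# Digit values for bases up to 36 (hypothetical stand-in for the
# _PyLong_DigitValue lookup table used by the C code).
_DIGIT_VALUE = {ch: i for i, ch in enumerate(string.digits + string.ascii_lowercase)}


def count_digits(text, base=10):
    """Return the number of digits in `text`, ignoring single underscores.

    Raises ValueError for a leading, trailing, or doubled underscore,
    or for a character that is not a digit in `base`.
    """
    if text.startswith("_"):
        raise ValueError("may not start with underscores")
    digits = 0
    prev = ""
    for ch in text:
        if ch == "_":
            if prev == "_":
                raise ValueError("only one underscore allowed")
        elif _DIGIT_VALUE.get(ch.lower(), 37) < base:
            digits += 1
        else:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        prev = ch
    if prev == "_":
        raise ValueError("trailing underscore not allowed")
    return digits


def is_valid(text, base=10):
    """Convenience wrapper: True if count_digits() accepts the text."""
    try:
        count_digits(text, base)
    except ValueError:
        return False
    return True
```

Counting only real digits matters for allocation: a string such as `1_000_000` has nine characters but needs room for seven digits.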

--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -1333,6 +1333,28 @@ verify_identifier(struct tok_state *tok)
 }
 #endif

+static int
+tok_decimal_tail(struct tok_state *tok)
+{
+    int c;
+
+    while (1) {
+        do {
+            c = tok_nextc(tok);
+        } while (isdigit(c));
+        if (c != '_') {
+            break;
+        }
+        c = tok_nextc(tok);
+        if (!isdigit(c)) {
+            tok->done = E_TOKEN;
+            tok_backup(tok, c);
+            return 0;
+        }
+    }
+    return c;
+}
+
 /* Get next token, after space stripping etc. */

 static int
@@ -1353,17 +1375,20 @@ tok_get(struct tok_state *tok, char **p_
         tok->atbol = 0;
         for (;;) {
             c = tok_nextc(tok);

```
       if (c == ' ')[](#l20.36)
```

       if (c == ' ') {[](#l20.37)
           col++, altcol++;[](#l20.38)

       }[](#l20.39)
       else if (c == '\t') {[](#l20.40)
           col = (col/tok->tabsize + 1) * tok->tabsize;[](#l20.41)
           altcol = (altcol/tok->alttabsize + 1)[](#l20.42)
               * tok->alttabsize;[](#l20.43)
       }[](#l20.44)

       else if (c == '\014') /* Control-L (formfeed) */[](#l20.45)

       else if (c == '\014')  {/* Control-L (formfeed) */[](#l20.46)
           col = altcol = 0; /* For Emacs users */[](#l20.47)

```
       else[](#l20.48)
```

```
       }[](#l20.49)
```

       else {[](#l20.50)
           break;[](#l20.51)

       }[](#l20.52)
   }[](#l20.53)
   tok_backup(tok, c);[](#l20.54)
   if (c == '#' || c == '\n') {[](#l20.55)

@@ -1372,10 +1397,12 @@ tok_get(struct tok_state *tok, char **p_ not passed to the parser as NEWLINE tokens, except totally empty lines in interactive mode, which signal the end of a command group. */

       if (col == 0 && c == '\n' && tok->prompt != NULL)[](#l20.60)

       if (col == 0 && c == '\n' && tok->prompt != NULL) {[](#l20.61)
           blankline = 0; /* Let it through */[](#l20.62)

```
       else[](#l20.63)
```

```
       }[](#l20.64)
```

       else {[](#l20.65)
           blankline = 1; /* Ignore completely */[](#l20.66)

       }[](#l20.67)
       /* We can't jump back right here since we still[](#l20.68)
          may need to skip to the end of a comment */[](#l20.69)
   }[](#l20.70)

@@ -1383,8 +1410,9 @@ tok_get(struct tok_state *tok, char **p_
if (col == tok->indstack[tok->indent]) { /* No change */ if (altcol != tok->altindstack[tok->indent]) {

               if (indenterror(tok))[](#l20.75)

               if (indenterror(tok)) {[](#l20.76)
                   return ERRORTOKEN;[](#l20.77)

               }[](#l20.78)
           }[](#l20.79)
       }[](#l20.80)
       else if (col > tok->indstack[tok->indent]) {[](#l20.81)

@@ -1395,8 +1423,9 @@ tok_get(struct tok_state *tok, char **p_ return ERRORTOKEN; } if (altcol <= tok->altindstack[tok->indent]) {

               if (indenterror(tok))[](#l20.86)

               if (indenterror(tok)) {[](#l20.87)
                   return ERRORTOKEN;[](#l20.88)

               }[](#l20.89)
           }[](#l20.90)
           tok->pendin++;[](#l20.91)
           tok->indstack[++tok->indent] = col;[](#l20.92)

@@ -1415,8 +1444,9 @@ tok_get(struct tok_state *tok, char **p_ return ERRORTOKEN; } if (altcol != tok->altindstack[tok->indent]) {

               if (indenterror(tok))[](#l20.97)

               if (indenterror(tok)) {[](#l20.98)
                   return ERRORTOKEN;[](#l20.99)

               }[](#l20.100)
           }[](#l20.101)
       }[](#l20.102)
   }[](#l20.103)

@@ -1462,9 +1492,11 @@ tok_get(struct tok_state *tok, char **p_
tok->start = tok->cur - 1; /* Skip comment */

if (c == '#')

   while (c != EOF && c != '\n')[](#l20.109)

if (c == '#') {

   while (c != EOF && c != '\n') {[](#l20.111)
       c = tok_nextc(tok);[](#l20.112)

```
   }[](#l20.113)
```
}

/* Check for EOF and errors now */ if (c == EOF) {
@@ -1481,27 +1513,35 @@ tok_get(struct tok_state *tok, char **p_
saw_b = 1; /* Since this is a backwards compatibility support literal we don't want to support it in arbitrary order like byte literals. */

       else if (!(saw_b || saw_u || saw_r || saw_f) && (c == 'u' || c == 'U'))[](#l20.122)

       else if (!(saw_b || saw_u || saw_r || saw_f)[](#l20.123)

                && (c == 'u'|| c == 'U')) {[](#l20.124)
           saw_u = 1;[](#l20.125)

       }[](#l20.126)
       /* ur"" and ru"" are not supported */[](#l20.127)

       else if (!(saw_r || saw_u) && (c == 'r' || c == 'R'))[](#l20.128)

       else if (!(saw_r || saw_u) && (c == 'r' || c == 'R')) {[](#l20.129)
           saw_r = 1;[](#l20.130)

       else if (!(saw_f || saw_b || saw_u) && (c == 'f' || c == 'F'))[](#l20.131)

```
       }[](#l20.132)
```

       else if (!(saw_f || saw_b || saw_u) && (c == 'f' || c == 'F')) {[](#l20.133)
           saw_f = 1;[](#l20.134)

```
       else[](#l20.135)
```

```
       }[](#l20.136)
```

       else {[](#l20.137)
           break;[](#l20.138)

       }[](#l20.139)
       c = tok_nextc(tok);[](#l20.140)

       if (c == '"' || c == '\'')[](#l20.141)

       if (c == '"' || c == '\'') {[](#l20.142)
           goto letter_quote;[](#l20.143)

       }[](#l20.144)
   }[](#l20.145)
   while (is_potential_identifier_char(c)) {[](#l20.146)

```
       if (c >= 128)[](#l20.147)
```

       if (c >= 128) {[](#l20.148)
           nonascii = 1;[](#l20.149)

       }[](#l20.150)
       c = tok_nextc(tok);[](#l20.151)
   }[](#l20.152)
   tok_backup(tok, c);[](#l20.153)

   if (nonascii && !verify_identifier(tok))[](#l20.154)

   if (nonascii && !verify_identifier(tok)) {[](#l20.155)
       return ERRORTOKEN;[](#l20.156)

   }[](#l20.157)
   *p_start = tok->start;[](#l20.158)
   *p_end = tok->cur;[](#l20.159)

@@ -1510,10 +1550,12 @@ tok_get(struct tok_state *tok, char **p_
/* Current token length is 5. */ if (tok->async_def) { /* We're inside an 'async def' function. */

           if (memcmp(tok->start, "async", 5) == 0)[](#l20.165)

           if (memcmp(tok->start, "async", 5) == 0) {[](#l20.166)
               return ASYNC;[](#l20.167)

           if (memcmp(tok->start, "await", 5) == 0)[](#l20.168)

```
           }[](#l20.169)
```

           if (memcmp(tok->start, "await", 5) == 0) {[](#l20.170)
               return AWAIT;[](#l20.171)

           }[](#l20.172)
       }[](#l20.173)
       else if (memcmp(tok->start, "async", 5) == 0) {[](#l20.174)
           /* The current token is 'async'.[](#l20.175)

@@ -1546,8 +1588,9 @@ tok_get(struct tok_state *tok, char **p_
/* Newline */ if (c == '\n') { tok->atbol = 1;

   if (blankline || tok->level > 0)[](#l20.180)

   if (blankline || tok->level > 0) {[](#l20.181)
       goto nextline;[](#l20.182)

   }[](#l20.183)
   *p_start = tok->start;[](#l20.184)
   *p_end = tok->cur - 1; /* Leave '\n' out of the string */[](#l20.185)
   tok->cont_line = 0;[](#l20.186)

@@ -1570,11 +1613,13 @@ tok_get(struct tok_state *tok, char **p_ *p_start = tok->start; *p_end = tok->cur; return ELLIPSIS;

```
       } else {[](#l20.191)
```

```
       }[](#l20.192)
```

       else {[](#l20.193)
           tok_backup(tok, c);[](#l20.194)
       }[](#l20.195)
       tok_backup(tok, '.');[](#l20.196)

```
   } else {[](#l20.197)
```

```
   }[](#l20.198)
```

   else {[](#l20.199)
       tok_backup(tok, c);[](#l20.200)
   }[](#l20.201)
   *p_start = tok->start;[](#l20.202)

@@ -1588,59 +1633,93 @@ tok_get(struct tok_state *tok, char **p_
/* Hex, octal or binary -- maybe. */ c = tok_nextc(tok); if (c == 'x' || c == 'X') {
/* Hex */ c = tok_nextc(tok);

           if (!isxdigit(c)) {[](#l20.210)

               tok->done = E_TOKEN;[](#l20.211)

               tok_backup(tok, c);[](#l20.212)

               return ERRORTOKEN;[](#l20.213)

           }[](#l20.214)
           do {[](#l20.215)

               c = tok_nextc(tok);[](#l20.216)

           } while (isxdigit(c));[](#l20.217)

               if (c == '_') {[](#l20.218)

                   c = tok_nextc(tok);[](#l20.219)

```
               }[](#l20.220)
```

               if (!isxdigit(c)) {[](#l20.221)

                   tok->done = E_TOKEN;[](#l20.222)

                   tok_backup(tok, c);[](#l20.223)

                   return ERRORTOKEN;[](#l20.224)

```
               }[](#l20.225)
```
```
               do {[](#l20.226)
```

                   c = tok_nextc(tok);[](#l20.227)

               } while (isxdigit(c));[](#l20.228)

           } while (c == '_');[](#l20.229)
       }[](#l20.230)
       else if (c == 'o' || c == 'O') {[](#l20.231)
           /* Octal */[](#l20.232)
           c = tok_nextc(tok);[](#l20.233)

           if (c < '0' || c >= '8') {[](#l20.234)

               tok->done = E_TOKEN;[](#l20.235)

               tok_backup(tok, c);[](#l20.236)

               return ERRORTOKEN;[](#l20.237)

           }[](#l20.238)
           do {[](#l20.239)

               c = tok_nextc(tok);[](#l20.240)

           } while ('0' <= c && c < '8');[](#l20.241)

               if (c == '_') {[](#l20.242)

                   c = tok_nextc(tok);[](#l20.243)

```
               }[](#l20.244)
```

               if (c < '0' || c >= '8') {[](#l20.245)

                   tok->done = E_TOKEN;[](#l20.246)

                   tok_backup(tok, c);[](#l20.247)

                   return ERRORTOKEN;[](#l20.248)

```
               }[](#l20.249)
```
```
               do {[](#l20.250)
```

                   c = tok_nextc(tok);[](#l20.251)

               } while ('0' <= c && c < '8');[](#l20.252)

           } while (c == '_');[](#l20.253)
       }[](#l20.254)
       else if (c == 'b' || c == 'B') {[](#l20.255)
           /* Binary */[](#l20.256)
           c = tok_nextc(tok);[](#l20.257)

           if (c != '0' && c != '1') {[](#l20.258)

               tok->done = E_TOKEN;[](#l20.259)

               tok_backup(tok, c);[](#l20.260)

               return ERRORTOKEN;[](#l20.261)

           }[](#l20.262)
           do {[](#l20.263)

               c = tok_nextc(tok);[](#l20.264)

           } while (c == '0' || c == '1');[](#l20.265)

               if (c == '_') {[](#l20.266)

                   c = tok_nextc(tok);[](#l20.267)

```
               }[](#l20.268)
```

               if (c != '0' && c != '1') {[](#l20.269)

                   tok->done = E_TOKEN;[](#l20.270)

                   tok_backup(tok, c);[](#l20.271)

                   return ERRORTOKEN;[](#l20.272)

```
               }[](#l20.273)
```
```
               do {[](#l20.274)
```

                   c = tok_nextc(tok);[](#l20.275)

               } while (c == '0' || c == '1');[](#l20.276)

           } while (c == '_');[](#l20.277)
       }[](#l20.278)
       else {[](#l20.279)
           int nonzero = 0;[](#l20.280)
           /* maybe old-style octal; c is first char of it */[](#l20.281)
           /* in any case, allow '0' as a literal */[](#l20.282)

```
           while (c == '0')[](#l20.283)
```

               c = tok_nextc(tok);[](#l20.284)

           while (isdigit(c)) {[](#l20.285)

```
               nonzero = 1;[](#l20.286)
```

```
           while (1) {[](#l20.287)
```

               if (c == '_') {[](#l20.288)

                   c = tok_nextc(tok);[](#l20.289)

                   if (!isdigit(c)) {[](#l20.290)

                       tok->done = E_TOKEN;[](#l20.291)

                       tok_backup(tok, c);[](#l20.292)

                       return ERRORTOKEN;[](#l20.293)

```
                   }[](#l20.294)
```
```
               }[](#l20.295)
```

               if (c != '0') {[](#l20.296)

```
                   break;[](#l20.297)
```

               }[](#l20.298)
               c = tok_nextc(tok);[](#l20.299)
           }[](#l20.300)

```
           if (c == '.')[](#l20.301)
```

           if (isdigit(c)) {[](#l20.302)

```
               nonzero = 1;[](#l20.303)
```

               c = tok_decimal_tail(tok);[](#l20.304)

               if (c == 0) {[](#l20.305)

                   return ERRORTOKEN;[](#l20.306)

```
               }[](#l20.307)
```
```
           }[](#l20.308)
```
```
           if (c == '.') {[](#l20.309)
```

               c = tok_nextc(tok);[](#l20.310)
               goto fraction;[](#l20.311)

           else if (c == 'e' || c == 'E')[](#l20.312)

```
           }[](#l20.313)
```

           else if (c == 'e' || c == 'E') {[](#l20.314)
               goto exponent;[](#l20.315)

           else if (c == 'j' || c == 'J')[](#l20.316)

```
           }[](#l20.317)
```

           else if (c == 'j' || c == 'J') {[](#l20.318)
               goto imaginary;[](#l20.319)

           }[](#l20.320)
           else if (nonzero) {[](#l20.321)

               /* Old-style octal: now disallowed. */[](#l20.322)
               tok->done = E_TOKEN;[](#l20.323)
               tok_backup(tok, c);[](#l20.324)
               return ERRORTOKEN;[](#l20.325)

@@ -1649,17 +1728,22 @@ tok_get(struct tok_state *tok, char **p_
} else { /* Decimal */

```
       do {[](#l20.330)
```

           c = tok_nextc(tok);[](#l20.331)

       } while (isdigit(c));[](#l20.332)

       c = tok_decimal_tail(tok);[](#l20.333)

```
       if (c == 0) {[](#l20.334)
```

           return ERRORTOKEN;[](#l20.335)

       }[](#l20.336)
       {[](#l20.337)
           /* Accept floating point numbers. */[](#l20.338)
           if (c == '.') {[](#l20.339)

               c = tok_nextc(tok);[](#l20.340)
   fraction:[](#l20.341)
               /* Fraction */[](#l20.342)

```
               do {[](#l20.343)
```

                   c = tok_nextc(tok);[](#l20.344)

               } while (isdigit(c));[](#l20.345)

               if (isdigit(c)) {[](#l20.346)

                   c = tok_decimal_tail(tok);[](#l20.347)

                   if (c == 0) {[](#l20.348)

                       return ERRORTOKEN;[](#l20.349)

```
                   }[](#l20.350)
```

               }[](#l20.351)
           }[](#l20.352)
           if (c == 'e' || c == 'E') {[](#l20.353)
               int e;[](#l20.354)

@@ -1681,14 +1765,16 @@ tok_get(struct tok_state *tok, char **p_ *p_end = tok->cur; return NUMBER; }

```
               do {[](#l20.359)
```

                   c = tok_nextc(tok);[](#l20.360)

               } while (isdigit(c));[](#l20.361)

               c = tok_decimal_tail(tok);[](#l20.362)

               if (c == 0) {[](#l20.363)

                   return ERRORTOKEN;[](#l20.364)

               }[](#l20.365)
           }[](#l20.366)

           if (c == 'j' || c == 'J')[](#l20.367)

           if (c == 'j' || c == 'J') {[](#l20.368)
               /* Imaginary part */[](#l20.369)
   imaginary:[](#l20.370)
               c = tok_nextc(tok);[](#l20.371)

           }[](#l20.372)
       }[](#l20.373)
   }[](#l20.374)
   tok_backup(tok, c);[](#l20.375)

@@ -1708,22 +1794,27 @@ tok_get(struct tok_state *tok, char **p_ c = tok_nextc(tok); if (c == quote) { c = tok_nextc(tok);

```
       if (c == quote)[](#l20.380)
```

       if (c == quote) {[](#l20.381)
           quote_size = 3;[](#l20.382)

```
       else[](#l20.383)
```

```
       }[](#l20.384)
```

       else {[](#l20.385)
           end_quote_size = 1;     /* empty string found */[](#l20.386)

```
       }[](#l20.387)
   }[](#l20.388)
```

```
   if (c != quote)[](#l20.389)
```

   if (c != quote) {[](#l20.390)
       tok_backup(tok, c);[](#l20.391)

```
   }[](#l20.392)
```

/* Get rest of string */ while (end_quote_size != quote_size) { c = tok_nextc(tok); if (c == EOF) {

           if (quote_size == 3)[](#l20.398)

           if (quote_size == 3) {[](#l20.399)
               tok->done = E_EOFS;[](#l20.400)

```
           else[](#l20.401)
```

```
           }[](#l20.402)
```

           else {[](#l20.403)
               tok->done = E_EOLS;[](#l20.404)

           }[](#l20.405)
           tok->cur = tok->inp;[](#l20.406)
           return ERRORTOKEN;[](#l20.407)
       }[](#l20.408)

@@ -1732,12 +1823,14 @@ tok_get(struct tok_state *tok, char **p_ tok->cur = tok->inp; return ERRORTOKEN; }

```
       if (c == quote)[](#l20.413)
```

       if (c == quote) {[](#l20.414)
           end_quote_size += 1;[](#l20.415)

       }[](#l20.416)
       else {[](#l20.417)
           end_quote_size = 0;[](#l20.418)

```
           if (c == '\\')[](#l20.419)
```

           if (c == '\\') {[](#l20.420)
               tok_nextc(tok);  /* skip escaped char */[](#l20.421)

           }[](#l20.422)
       }[](#l20.423)
   }[](#l20.424)

@@ -1767,7 +1860,8 @@ tok_get(struct tok_state *tok, char **p_ int token3 = PyToken_ThreeChars(c, c2, c3); if (token3 != OP) { token = token3;

```
       } else {[](#l20.430)
```

```
       }[](#l20.431)
```

       else {[](#l20.432)
           tok_backup(tok, c3);[](#l20.433)
       }[](#l20.434)
       *p_start = tok->start;[](#l20.435)
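The effect of the tokenizer changes can be observed from Python. The sketch below uses the standard-library `tokenize` module and `compile()` rather than `Parser/tokenizer.c` directly, and assumes Python 3.6 or later. The helper names `number_tokens` and `compiles` are illustrative only.

```python
import io
import tokenize


def number_tokens(source):
    """Return the text of every NUMBER token found in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NUMBER]


def compiles(source):
    """Return True if `source` is accepted as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True
```

A literal keeps its underscores in the token text. Misplaced underscores, such as a trailing or doubled one, are rejected before the parser ever sees a number.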

--- a/Python/ast.c
+++ b/Python/ast.c
@@ -4018,7 +4018,7 @@ ast_for_stmt(struct compiling *c, const
 }

 static PyObject *
-parsenumber(struct compiling *c, const char *s)
+parsenumber_raw(struct compiling *c, const char *s)
 {
     const char *end;
     long x;
@@ -4061,6 +4061,31 @@ parsenumber(struct compiling *c, const c
 }

 static PyObject *
+parsenumber(struct compiling *c, const char *s)
+{
+    char *dup, *end;
+    PyObject *res = NULL;
+
+    assert(s != NULL);
+
+    if (strchr(s, '_') == NULL) {
+        return parsenumber_raw(c, s);
+    }
+    /* Create a duplicate without underscores. */
+    dup = PyMem_Malloc(strlen(s) + 1);
+    end = dup;
+    for (; *s; s++) {
+        if (*s != '_') {
+            *end++ = *s;
+        }
+    }
+    *end = '\0';
+    res = parsenumber_raw(c, dup);
+    PyMem_Free(dup);
+    return res;
+}
+
+static PyObject *
 decode_utf8(struct compiling *c, const char **sPtr, const char *end)
 {
     const char *s, *t;
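The new `parsenumber()` wrapper strips every underscore and delegates to the old parser. A rough Python analogue, with the hypothetical name `parse_number`, might look like the following. It leans on `ast.literal_eval` for the underscore-free parse, which is an assumption of the sketch and not how `ast.c` works internally.

```python
import ast


def parse_number(token):
    """Parse a numeric literal token by first removing all underscores.

    The blind removal is acceptable only because the tokenizer has
    already rejected tokens with misplaced underscores.
    """
    if "_" not in token:
        return ast.literal_eval(token)
    # Create a duplicate without underscores, then parse as before.
    return ast.literal_eval(token.replace("_", ""))
```

The design keeps validation in one place, the tokenizer, so the AST stage needs no placement rules of its own.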

--- a/Python/pystrtod.c
+++ b/Python/pystrtod.c
@@ -370,6 +370,72 @@ PyOS_string_to_double(const char *s,
     return result;
 }

+/* Remove underscores that follow the underscore placement rule from
+   the string and then call the `innerfunc` function on the result.
+   It should return a new object or NULL on exception.
+
+   `what` is used for the error message emitted when underscores are detected
+   that don't follow the rule. `arg` is an opaque pointer passed to the inner
+   function.
+
+   This is used to implement underscore-agnostic conversion for floats
+   and complex numbers.
+*/
+PyObject *
+_Py_string_to_number_with_underscores(
+    const char *s, Py_ssize_t orig_len, const char *what, PyObject *obj, void *arg,
+    PyObject *(*innerfunc)(const char *, Py_ssize_t, void *))
+{
+    char prev;
+    const char *p, *last;
+    char *dup, *end;
+    PyObject *result;
+
+    if (strchr(s, '_') == NULL) {
+        return innerfunc(s, orig_len, arg);
+    }
+
+    dup = PyMem_Malloc(orig_len + 1);
+    end = dup;
+    prev = '\0';
+    last = s + orig_len;
+    for (p = s; *p; p++) {
+        if (*p == '_') {
+            /* Underscores are only allowed after digits. */
+            if (!(prev >= '0' && prev <= '9')) {
+                goto error;
+            }
+        }
+        else {
+            *end++ = *p;
+            /* Underscores are only allowed before digits. */
+            if (prev == '_' && !(*p >= '0' && *p <= '9')) {
+                goto error;
+            }
+        }
+        prev = *p;
+    }
+    /* Underscores are not allowed at the end. */
+    if (prev == '_') {
+        goto error;
+    }
+    /* No embedded NULs allowed. */
+    if (p != last) {
+        goto error;
+    }
+    *end = '\0';
+    result = innerfunc(dup, end - dup, arg);
+    PyMem_Free(dup);
+    return result;
+
+  error:
+    PyMem_Free(dup);
+    PyErr_Format(PyExc_ValueError,
+                 "could not convert string to %s: "
+                 "%R", what, obj);
+    return NULL;
+}
+
 #ifdef PY_NO_SHORT_FLOAT_REPR

 /* Given a string that may have a decimal point in the current
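The placement rule used for floats and complex numbers, where an underscore must sit directly between two ASCII digits, can be sketched in Python as follows. `strip_underscores` and `accepts` are hypothetical names; the embedded-NUL check from the C version is omitted because it has no counterpart for Python strings.

```python
def strip_underscores(text):
    """Return `text` without underscores, or raise ValueError if an
    underscore is not directly between two ASCII digits."""
    if "_" not in text:
        return text
    out = []
    prev = ""
    for ch in text:
        if ch == "_":
            # Underscores are only allowed after digits.
            if not ("0" <= prev <= "9"):
                raise ValueError(f"could not convert string: {text!r}")
        else:
            out.append(ch)
            # Underscores are only allowed before digits.
            if prev == "_" and not ("0" <= ch <= "9"):
                raise ValueError(f"could not convert string: {text!r}")
        prev = ch
    # Underscores are not allowed at the end.
    if prev == "_":
        raise ValueError(f"could not convert string: {text!r}")
    return "".join(out)


def accepts(text):
    """Return True if strip_underscores() accepts the text."""
    try:
        strip_underscores(text)
    except ValueError:
        return False
    return True
```

For example, `float(strip_underscores("1_000.000_1"))` gives the same value as `float("1000.0001")`, while `"1_.4"` and `"1e_1"` are rejected before any conversion is attempted.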


Trailing underscores:

Underscores in the base selector:

Old-style octal, still disallowed:

Multiple consecutive underscores:

Underscore right before a dot:

Underscore right after a dot:

Underscore right after a sign:

Underscore right before j:

Underscore right before e:

Underscore right after e:

Complex cases with parens:
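The categories listed above can be checked against the built-in constructors. The strings below are illustrative choices for each label, not the patch's own test data, and the sketch assumes Python 3.6 or later; `CASES` and `rejected` are hypothetical names.

```python
# One illustrative invalid string per category, converted at run time.
CASES = {
    "trailing underscore": lambda: int("1_"),
    "underscore in the base selector": lambda: int("0_x1f", 0),
    "old-style octal": lambda: int("0_7", 0),
    "consecutive underscores": lambda: int("1__0"),
    "underscore before a dot": lambda: float("1_.5"),
    "underscore after a dot": lambda: float("1._5"),
    "underscore after a sign": lambda: float("+_1"),
    "underscore before j": lambda: complex("1_j"),
    "underscore before e": lambda: float("1_e2"),
    "underscore after e": lambda: float("1e_2"),
    "complex with parens": lambda: complex("(1+2_j)"),
}


def rejected(convert):
    """Return True if calling `convert` raises ValueError."""
    try:
        convert()
    except ValueError:
        return True
    return False
```

Well-placed underscores are accepted by the same constructors, for instance `int("1_000")`, `int("0x_ff", 0)`, `float("1_000.5")`, and `complex("(1_0+2_0j)")`.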

Return the empty string, plus all of the valid string prefixes.