cpython: 8a881dafe335
--- a/Doc/library/decimal.rst
+++ b/Doc/library/decimal.rst
@@ -345,7 +345,7 @@ Decimal objects
    value can be an integer, string, tuple, :class:`float`, or another :class:`Decimal`
    object. If no value is given, returns ``Decimal('0')``. If value is a
    string, it should conform to the decimal numeric string syntax after leading
+   and trailing whitespace characters, as well as underscores throughout, are removed::
 
       sign ::= '+' | '-'
       digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
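The documented constructor behaviour can be checked quickly, assuming Python 3.6 or later. Surrounding whitespace is stripped and underscores are ignored before parsing. The sample strings and results are taken from the tests this changeset adds to Lib/test/test_decimal.py.

```python
from decimal import Decimal

# Surrounding whitespace is stripped and underscores are ignored before the
# string is parsed as a decimal numeric string (Python 3.6+).
for text in ("1_3.3e4_0", "1_0_0_0", " 3.45679 "):
    print(repr(text), "->", Decimal(text))

# '1_3.3e4_0' -> 1.33E+41
# '1_0_0_0' -> 1000
# ' 3.45679 ' -> 3.45679
```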
@@ -394,6 +394,10 @@ Decimal objects
       :class:`float` arguments raise an exception if the :exc:`FloatOperation`
       trap is set. By default the trap is off.
 
+   .. versionchanged:: 3.6
+      Underscores are allowed for grouping, as with integral and floating-point
+      literals in code.
+
    Decimal floating point objects share many properties with the other built-in
    numeric types such as :class:`float` and :class:`int`. All of the usual math
    operations and special methods apply. Likewise, decimal objects can be
@@ -1075,8 +1079,8 @@ In addition to the three supplied contex
          Decimal('4.44')
 
       This method implements the to-number operation of the IBM specification.
-      If the argument is a string, no leading or trailing whitespace is
-      permitted.
+      If the argument is a string, no leading or trailing whitespace or
+      underscores are permitted.
 
    .. method:: create_decimal_from_float(f)
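The public API shows the contrast between `Context.create_decimal()` and the lenient constructor. This sketch assumes a fresh `Context()`, which copies `DefaultContext` and therefore traps `InvalidOperation`, so a rejected string raises instead of returning NaN. The helper name `try_create` is illustrative only.

```python
from decimal import Context, Decimal, InvalidOperation

ctx = Context()  # copies DefaultContext, so InvalidOperation is trapped

def try_create(text):
    """Return ctx.create_decimal(text), or None if the string is rejected."""
    try:
        return ctx.create_decimal(text)
    except InvalidOperation:
        return None

print(try_create("1234"))    # 1234
print(try_create("12_34"))   # None: underscores are not permitted here
print(try_create(" 1234"))   # None: neither is surrounding whitespace
print(Decimal("12_34"))      # 1234: the constructor is more lenient
```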
--- a/Doc/library/functions.rst
+++ b/Doc/library/functions.rst
@@ -271,6 +271,9 @@ are always available. They are listed h
    The complex type is described in :ref:`typesnumeric`.
 
+
 .. function:: delattr(object, name)
@@ -531,10 +534,13 @@ are always available. They are listed h
    The float type is described in :ref:`typesnumeric`.
 
 .. function:: format(value[, format_spec])
@@ -702,6 +708,10 @@ are always available. They are listed h
       :meth:`base.__int__ <object.__int__>` instead of :meth:`base.__index__
       <object.__index__>`.
 
+
+
 .. function:: isinstance(object, classinfo)
 
    Return true if the object argument is an instance of the classinfo
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -721,20 +721,24 @@ Integer literals
 Integer literals are described by the following lexical definitions:
 
 .. productionlist::
-   integer: `decimalinteger` | `octinteger` | `hexinteger` | `bininteger`
-   decimalinteger: `nonzerodigit` `digit`* | "0"+
+   integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger`
+   decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")*
+   bininteger: "0" ("b" | "B") (["_"] `bindigit`)+
+   octinteger: "0" ("o" | "O") (["_"] `octdigit`)+
+   hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
-   octinteger: "0" ("o" | "O") `octdigit`+
-   hexinteger: "0" ("x" | "X") `hexdigit`+
-   bininteger: "0" ("b" | "B") `bindigit`+
 
 There is no limit for the length of integer literals apart from what can be
 stored in available memory.
 
+Underscores are ignored for determining the numeric value of the literal. They
+can be used to group digits for enhanced readability. One underscore can occur
+between digits, and after base specifiers like ``0x``.
+
 Note that leading zeros in a non-zero decimal number are not allowed. This is
 for disambiguation with C-style octal literals, which Python used before version
 3.0.
@@ -743,6 +747,10 @@ Some examples of integer literals::
      7 2147483647 0o177 0b100110111
      3 79228162514264337593543950336 0o377 0xdeadbeef
+     100_000_000_000 0b_1110_0101
 
 .. _floating:
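The integer rules above can be tried out directly, assuming Python 3.6 or later. `ast.literal_eval` parses its argument with the real tokenizer, so it accepts exactly the literals the grammar allows and raises `SyntaxError` for the others. The helper names are illustrative. The last lines show that `int()` with its default base 10 accepts a leading zero such as `"0_100"`, which the literal grammar forbids. It still rejects a doubled underscore.

```python
import ast

def literal_value(src):
    """Parse src as a Python literal; SyntaxError if the tokenizer rejects it."""
    return ast.literal_eval(src)

def is_valid_literal(src):
    try:
        literal_value(src)
    except SyntaxError:
        return False
    return True

# One underscore between digits, or directly after the base prefix.
assert literal_value("100_000_000_000") == 100000000000
assert literal_value("0b_1110_0101") == 0b11100101
assert literal_value("0x_ff_ff") == 0xFFFF

# Doubled or trailing underscores, an underscore before the base letter,
# and leading zeros (old-style octal) are all rejected.
for bad in ("1__000", "1000_", "0_b1", "0_7"):
    assert not is_valid_literal(bad), bad

# int() is looser about leading zeros than the literal grammar ...
assert int("0_100") == 100
assert not is_valid_literal("0_100")

# ... but it still refuses doubled underscores.
try:
    int("1__00")
except ValueError:
    pass
else:
    raise AssertionError("int() accepted a doubled underscore")
```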
@@ -754,23 +762,28 @@ Floating point literals are described by
 .. productionlist::
    floatnumber: `pointfloat` | `exponentfloat`
-   pointfloat: [`intpart`] `fraction` | `intpart` "."
-   exponentfloat: (`intpart` | `pointfloat`) `exponent`
-   intpart: `digit`+
-   fraction: "." `digit`+
-   exponent: ("e" | "E") ["+" | "-"] `digit`+
+   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
+   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
+   digitpart: `digit` (["_"] `digit`)*
+   fraction: "." `digitpart`
+   exponent: ("e" | "E") ["+" | "-"] `digitpart`
 
 Note that the integer and exponent parts are always interpreted using radix 10.
 For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
-allowed range of floating point literals is implementation-dependent. Some
-examples of floating point literals::
+allowed range of floating point literals is implementation-dependent. As in
+integer literals, underscores are supported for digit grouping.
 
 Note that numeric literals do not include a sign; a phrase like ``-1`` is
 actually an expression composed of the unary operator ``-`` and the literal
 ``1``.
 
+.. versionchanged:: 3.6
 
 .. _imaginary:
@@ -780,7 +793,7 @@ Imaginary literals
 Imaginary literals are described by the following lexical definitions:
 
 .. productionlist::
 
 An imaginary literal yields a complex number with a real part of 0.0. Complex
 numbers are represented as a pair of floating point numbers and have the same
@@ -788,7 +801,7 @@ restrictions on their range. To create
 part, add a floating point number to it, e.g., ``(3+4j)``. Some examples of
 imaginary literals::
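The floating point productions can be read almost line by line as a small hand-written scanner. The sketch below is an illustration, not CPython's tokenizer. Each hypothetical function matches one production starting at index `i` and returns the end index, or -1 if the production does not match there. `is_floatnumber` then requires the whole string to be consumed.

```python
DIGITS = "0123456789"

def digitpart(s, i):
    """digitpart: digit (["_"] digit)*  -- return the end index, or -1."""
    if i >= len(s) or s[i] not in DIGITS:
        return -1
    i += 1
    while i < len(s):
        if s[i] in DIGITS:
            i += 1
        elif s[i] == "_" and i + 1 < len(s) and s[i + 1] in DIGITS:
            i += 2  # an underscore counts only if a digit follows it
        else:
            break
    return i

def exponent(s, i):
    """exponent: ("e" | "E") ["+" | "-"] digitpart."""
    if i >= len(s) or s[i] not in "eE":
        return -1
    i += 1
    if i < len(s) and s[i] in "+-":
        i += 1
    return digitpart(s, i)

def pointfloat(s, i):
    """pointfloat: [digitpart] fraction | digitpart ".",
    where fraction: "." digitpart."""
    head = digitpart(s, i)
    dot = head if head != -1 else i
    if dot >= len(s) or s[dot] != ".":
        return -1
    tail = digitpart(s, dot + 1)
    if tail != -1:
        return tail                        # [digitpart] "." digitpart
    return dot + 1 if head != -1 else -1   # digitpart "."

def is_floatnumber(s):
    """floatnumber: pointfloat | exponentfloat,
    where exponentfloat: (digitpart | pointfloat) exponent."""
    pf = pointfloat(s, 0)
    if pf == len(s):
        return True
    return any(head != -1 and exponent(s, head) == len(s)
               for head in (pf, digitpart(s, 0)))

print(is_floatnumber("1_00_00e5_1"))  # True
print(is_floatnumber("1._4"))         # False: no underscore right after a dot
```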
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -124,6 +124,29 @@ Windows improvements:
 New Features
 ============
 
+.. _pep-515:
+
+PEP 515: Underscores in Numeric Literals
+========================================
+
+Prior to PEP 515, there was no support for writing long numeric
+literals with some form of separator to improve readability. For
+instance, how big is ``1000000000000000``? With :pep:`515`, though,
+you can use underscores to separate digits as desired to make numeric
+literals easier to read: ``1_000_000_000_000_000``. Underscores can be
+used with other numeric literals beyond integers, e.g.
+``0x_FF_FF_FF_FF``.
+
+Single underscores are allowed between digits and after any base
+specifier. More than a single underscore in a row, leading, or
+trailing underscores are not allowed.
+
+.. seealso::
+
 .. _pep-523:
 
 PEP 523: Adding a frame evaluation API to CPython
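The PEP 515 examples in the "What's New" text can be verified directly on Python 3.6 or later. The sketch also shows one subtlety of the "no leading underscore" rule. `_1000` is not rejected as a malformed number. It is simply an identifier, so it compiles and fails only at run time with `NameError`. The helper name `compiles` is illustrative.

```python
# The examples from the text, checked directly (requires Python 3.6+).
assert 1_000_000_000_000_000 == 10**15
assert 0x_FF_FF_FF_FF == 0xFFFFFFFF

def compiles(src):
    """True if src compiles as a single Python expression."""
    try:
        compile(src, "<pep515>", "eval")
    except SyntaxError:
        return False
    return True

# Two underscores in a row, a trailing underscore, and an underscore
# between "0" and the base letter are syntax errors ...
assert not compiles("1__000")
assert not compiles("1000_")
assert not compiles("0_xFF")
# ... but a leading underscore just produces an identifier.
assert compiles("_1000")
```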
--- a/Include/pystrtod.h
+++ b/Include/pystrtod.h
@@ -19,6 +19,10 @@ PyAPI_FUNC(char *) PyOS_double_to_string
                                          int *type);
 
 #ifndef Py_LIMITED_API
+PyAPI_FUNC(PyObject *) _Py_string_to_number_with_underscores(
+    const char *str, Py_ssize_t len, const char *what, PyObject *obj, void *arg,
+    PyObject *(*innerfunc)(const char *, Py_ssize_t, void *));
+
 PyAPI_FUNC(double) _Py_parse_inf_or_nan(const char *p, char **endptr);
 #endif
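The header only declares the new helper. Its definition is not part of this excerpt. The Python approximation below assumes the helper enforces the rule that the new `float()` and `complex()` tests rely on: every underscore must sit between two ASCII digits. The underscores are then removed, and the rest of the string goes to the type-specific inner parser (for example `float_from_string_inner`). The name `to_number_with_underscores` is hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def _is_ascii_digit(ch):
    return len(ch) == 1 and "0" <= ch <= "9"

def to_number_with_underscores(s: str, what: str,
                               inner: Callable[[str], T]) -> T:
    """Hypothetical Python analogue of _Py_string_to_number_with_underscores.

    Validates underscore placement, strips the underscores, then calls inner.
    """
    if "_" not in s:
        return inner(s)                      # nothing to validate or strip
    error = ValueError("could not convert string to %s: %r" % (what, s))
    kept = []
    prev = ""
    for ch in s:
        if ch == "_":
            if not _is_ascii_digit(prev):    # must follow a digit
                raise error
        else:
            if prev == "_" and not _is_ascii_digit(ch):  # must precede one
                raise error
            kept.append(ch)
        prev = ch
    if prev == "_":                          # no trailing underscore
        raise error
    return inner("".join(kept))

print(to_number_with_underscores("1_000.5", "float", float))       # 1000.5
print(to_number_with_underscores("1_2.5+3_3j", "complex", complex))  # (12.5+33j)
```

Passing the inner parser as a callback keeps the underscore rules in one place. `float()` and `complex()` then differ only in the function they hand over, which mirrors the `innerfunc` parameter in the declaration.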
--- a/Lib/_pydecimal.py
+++ b/Lib/_pydecimal.py
@@ -589,7 +589,7 @@ class Decimal(object):
         # From a string
         # REs insist on real strings, so we can too.
         if isinstance(value, str):
-            m = _parser(value.strip())
+            m = _parser(value.strip().replace("_", ""))
             if m is None:
                 if context is None:
                     context = getcontext()
@@ -4125,7 +4125,7 @@ class Context(object):
         This will make it round up for that operation.
         """
         rounding = self.rounding
-        self.rounding= type
+        self.rounding = type
         return rounding
 
     def create_decimal(self, num='0'):
@@ -4134,10 +4134,10 @@ class Context(object):
         This method implements the to-number operation of the IBM Decimal specification."""
 
-        if isinstance(num, str) and num != num.strip():
+        if isinstance(num, str) and (num != num.strip() or '_' in num):
             return self._raise_error(ConversionSyntax,
-                                     "no trailing or leading whitespace is "
-                                     "permitted.")
+                                     "trailing or leading whitespace and "
+                                     "underscores are not permitted.")
 
         d = Decimal(num, context=self)
         if d._isnan() and len(d._int) > self.prec - self.clamp:
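The two changed conditions in the pure-Python module can be isolated as plain functions. The hypothetical helpers below reproduce only the lines touched by this diff, not the rest of `_pydecimal`. The constructor path strips surrounding whitespace and then deletes every underscore before the regular expression sees the text. `create_decimal` now rejects a string with either padding or any underscore.

```python
def pydecimal_preprocess(value):
    """Decimal(str) in _pydecimal: strip surrounding whitespace, then delete
    every underscore before the regular-expression parser sees the text."""
    return value.strip().replace("_", "")

def create_decimal_accepts(num):
    """Context.create_decimal guard after this change: reject surrounding
    whitespace and any underscore."""
    return not (num != num.strip() or "_" in num)

print(pydecimal_preprocess(" 1_000.5 "))   # 1000.5
print(create_decimal_accepts("1000.5"))     # True
print(create_decimal_accepts("1_000.5"))    # False
print(create_decimal_accepts(" 1000.5"))    # False
```

The `replace()` call by itself does not check where the underscores appeared. For example, `pydecimal_preprocess("1__0")` returns `"10"`.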
--- a/Lib/test/test_complex.py
+++ b/Lib/test/test_complex.py
@@ -1,5 +1,7 @@
 import unittest
 from test import support
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 
 from random import random
 from math import atan2, isnan, copysign
@@ -377,6 +379,18 @@ class ComplexTest(unittest.TestCase):
         self.assertAlmostEqual(complex(complex1(1j)), 2j)
         self.assertRaises(TypeError, complex, complex2(1j))
 
+    def test_underscores(self):
+        # check underscores
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertEqual(complex(lit), eval(lit))
+                self.assertEqual(complex(lit), complex(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertRaises(ValueError, complex, lit)
+
     def test_hash(self):
         for x in range(-30, 30):
             self.assertEqual(hash(x), hash(complex(x, 0)))
--- a/Lib/test/test_decimal.py
+++ b/Lib/test/test_decimal.py
@@ -554,6 +554,10 @@ class ExplicitConstructionTest(unittest.
         self.assertEqual(str(Decimal(' -7.89')), '-7.89')
         self.assertEqual(str(Decimal(" 3.45679 ")), '3.45679')
 
+        # underscores
+        self.assertEqual(str(Decimal('1_3.3e4_0')), '1.33E+41')
+        self.assertEqual(str(Decimal('1_0_0_0')), '1000')
+
         # unicode whitespace
         for lead in ["", ' ', '\u00a0', '\u205f']:
             for trail in ["", ' ', '\u00a0', '\u205f']:
@@ -578,6 +582,9 @@ class ExplicitConstructionTest(unittest.
         # embedded NUL
         self.assertRaises(InvalidOperation, Decimal, "12\u00003")
 
+        # underscores don't prevent errors
+        self.assertRaises(InvalidOperation, Decimal, "1_2_\u00003")
+
     @cpython_only
     def test_from_legacy_strings(self):
         import _testcapi
@@ -772,6 +779,9 @@ class ExplicitConstructionTest(unittest.
         self.assertRaises(InvalidOperation, nc.create_decimal, "xyz")
         self.assertRaises(ValueError, nc.create_decimal, (1, "xyz", -25))
         self.assertRaises(TypeError, nc.create_decimal, "1234", "5678")
+        # no whitespace and underscore stripping is done with this method
+        self.assertRaises(InvalidOperation, nc.create_decimal, " 1234")
+        self.assertRaises(InvalidOperation, nc.create_decimal, "12_34")
 
         # too many NaN payload digits
         nc.prec = 3
--- a/Lib/test/test_float.py
+++ b/Lib/test/test_float.py
@@ -1,4 +1,3 @@
-
 import fractions
 import operator
 import os
@@ -9,6 +8,8 @@ import time
 import unittest
 from test import support
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 from math import isinf, isnan, copysign, ldexp
 
 INF = float("inf")
@@ -60,6 +61,27 @@ class GeneralFloatCases(unittest.TestCas
         float(b'.' + b'1'*1000)
         float('.' + '1'*1000)
 
+    def test_underscores(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertEqual(float(lit), eval(lit))
+                self.assertEqual(float(lit), float(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertRaises(ValueError, float, lit)
+        # Additional test cases; nan and inf are never valid as literals,
+        # only in the float() constructor, but we don't allow underscores
+        # in or around them.
+        self.assertRaises(ValueError, float, '_NaN')
+        self.assertRaises(ValueError, float, 'Na_N')
+        self.assertRaises(ValueError, float, 'IN_F')
+        self.assertRaises(ValueError, float, '-_INF')
+        self.assertRaises(ValueError, float, '-INF_')
+        # Check that we handle bytes values correctly.
+        self.assertRaises(ValueError, float, b'0_.\xff9')
+
     def test_non_numeric_input_types(self):
         # Test possible non-numeric types for the argument x, including
         # subclasses of the explicitly documented accepted types.
--- a/Lib/test/test_grammar.py
+++ b/Lib/test/test_grammar.py
@@ -16,6 +16,87 @@ from collections import ChainMap
 from test import ann_module2
 import test
 
+# These are shared with test_tokenize and other test modules.
+#
+# Note: since several test cases filter out floats by looking for "e" and ".",
+# don't add hexadecimal literals that contain "e" or "E".
+VALID_UNDERSCORE_LITERALS = [
+    '0_0_0',
+    '4_2',
+    '1_0000_0000',
+    '0b1001_0100',
+    '0xffff_ffff',
+    '0o5_7_7',
+    '1_00_00.5',
+    '1_00_00.5e5',
+    '1_00_00e5_1',
+    '1e1_0',
+    '.1_4',
+    '.1_4e1',
+    '0b_0',
+    '0x_f',
+    '0o_5',
+    '1_00_00j',
+    '1_00_00.5j',
+    '1_00_00e5_1j',
+    '.1_4j',
+    '(1_2.5+3_3j)',
+    '(.5_6j)',
+]
+INVALID_UNDERSCORE_LITERALS = [
+    # Trailing underscores:
+    '0_',
+    '42_',
+    '1.4j_',
+    '0x_',
+    '0b1_',
+    '0xf_',
+    '0o5_',
+    '0 if 1_Else 1',
+    # Underscores in the base selector:
+    '0_b0',
+    '0_xf',
+    '0_o5',
+    # Old-style octal, still disallowed:
+    '0_7',
+    '09_99',
+    # Multiple consecutive underscores:
+    '4_______2',
+    '0.1__4',
+    '0.1__4j',
+    '0b1001__0100',
+    '0xffff__ffff',
+    '0x___',
+    '0o5__77',
+    '1e1__0',
+    '1e1__0j',
+    # Underscore right before a dot:
+    '1_.4',
+    '1_.4j',
+    # Underscore right after a dot:
+    '1._4',
+    '1._4j',
+    '._5',
+    '._5j',
+    # Underscore right after a sign:
+    '1.0e+_1',
+    '1.0e+_1j',
+    # Underscore right before j:
+    '1.4_j',
+    '1.4e5_j',
+    # Underscore right before e:
+    '1_e1',
+    '1.4_e1',
+    '1.4_e1j',
+    # Underscore right after e:
+    '1e_1',
+    '1.4e_1',
+    '1.4e_1j',
+    # Complex cases with parens:
+    '(1+1.5_j_)',
+    '(1+1.5_j)',
+]
+
 
 class TokenTests(unittest.TestCase):
@@ -95,6 +176,14 @@ class TokenTests(unittest.TestCase):
         self.assertEqual(1 if 0else 0, 0)
         self.assertRaises(SyntaxError, eval, "0 if 1Else 0")
 
+    def test_underscore_literals(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            self.assertEqual(eval(lit), eval(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            self.assertRaises(SyntaxError, eval, lit)
+        # Sanity check: no literal begins with an underscore
+        self.assertRaises(NameError, eval, "_0")
+
     def test_string_literals(self):
         x = ''; y = ""; self.assertTrue(len(x) == 0 and x == y)
         x = '\''; y = "'"; self.assertTrue(len(x) == 1 and x == y and ord(x) == 39)
--- a/Lib/test/test_int.py
+++ b/Lib/test/test_int.py
@@ -2,6 +2,8 @@ import sys
 import unittest
 from test import support
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 
 L = [
         ('0', 0),
@@ -212,6 +214,25 @@ class IntTestCases(unittest.TestCase):
         self.assertEqual(int('2br45qc', 35), 4294967297)
         self.assertEqual(int('1z141z5', 36), 4294967297)
 
+    def test_underscores(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if any(ch in lit for ch in '.eEjJ'):
+                continue
+            self.assertEqual(int(lit, 0), eval(lit))
+            self.assertEqual(int(lit, 0), int(lit.replace('_', ''), 0))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if any(ch in lit for ch in '.eEjJ'):
+                continue
+            self.assertRaises(ValueError, int, lit, 0)
+        # Additional test cases with bases != 0, only for the constructor:
+        self.assertEqual(int("1_00", 3), 9)
+        self.assertEqual(int("0_100"), 100)  # not valid as a literal
+        self.assertEqual(int(b"1_00"), 100)  # byte underscore
+        self.assertRaises(ValueError, int, "_100")
+        self.assertRaises(ValueError, int, "+_100")
+        self.assertRaises(ValueError, int, "1__00")
+        self.assertRaises(ValueError, int, "100_")
+
     @support.cpython_only
     def test_small_ints(self):
         # Bug #3236: Return small longs from PyLong_FromString
--- a/Lib/test/test_tokenize.py
+++ b/Lib/test/test_tokenize.py
@@ -3,7 +3,9 @@ from tokenize import (tokenize, _tokeniz
                      STRING, ENDMARKER, ENCODING, tok_name, detect_encoding,
                      open as tokenize_open, Untokenizer)
 from io import BytesIO
-from unittest import TestCase, mock, main
+from unittest import TestCase, mock
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 import os
 import token
 
@@ -185,6 +187,21 @@ def k(x):
     NUMBER '3.14e159' (1, 4) (1, 12)
     """)
 
+    def test_underscore_literals(self):
+        def number_token(s):
+            f = BytesIO(s.encode('utf-8'))
+            for toktype, token, start, end, line in tokenize(f.readline):
+                if toktype == NUMBER:
+                    return token
+            return 'invalid token'
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if '(' in lit:
+                # this won't work with compound complex inputs
+                continue
+            self.assertEqual(number_token(lit), lit)
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            self.assertNotEqual(number_token(lit), lit)
+
     def test_string(self):
         # String literals
         self.check_tokenize("x = ''; y = \"\"", """\
@@ -1529,11 +1546,10 @@ class TestRoundtrip(TestCase):
         tempdir = os.path.dirname(fn) or os.curdir
         testfiles = glob.glob(os.path.join(tempdir, "test*.py"))
 
-        # Tokenize is broken on test_unicode_identifiers.py because regular
-        # expressions are broken on the obscure unicode identifiers in it.
-        # *sigh* With roundtrip extended to test the 5-tuple mode of
-        # untokenize, 7 more testfiles fail. Remove them also until the
-        # failure is diagnosed.
+        # Tokenize is broken on test_pep3131.py because regular expressions are
+        # broken on the obscure unicode identifiers in it. *sigh*
+        # With roundtrip extended to test the 5-tuple mode of untokenize,
+        # 7 more testfiles fail. Remove them also until the failure is diagnosed.
 
         testfiles.remove(os.path.join(tempdir, "test_unicode_identifiers.py"))
         for f in ('buffer', 'builtin', 'fileio', 'inspect', 'os', 'platform', 'sys'):
@@ -1565,4 +1581,4 @@ class TestRoundtrip(TestCase):
 
 if __name__ == "__main__":
--- a/Lib/test/test_types.py
+++ b/Lib/test/test_types.py
@@ -48,6 +48,7 @@ class TypesTests(unittest.TestCase):
     def test_float_constructor(self):
         self.assertRaises(ValueError, float, '')
         self.assertRaises(ValueError, float, '5\0')
+        self.assertRaises(ValueError, float, '5_5\0')
 
     def test_zero_division(self):
         try: 5.0 / 0.0
--- a/Lib/tokenize.py
+++ b/Lib/tokenize.py
@@ -120,16 +120,17 @@ Comment = r'#[^\r\n]*'
 Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
 Name = r'\w+'
 
-Hexnumber = r'0[xX][0-9a-fA-F]+'
-Binnumber = r'0[bB][01]+'
-Octnumber = r'0[oO][0-7]+'
-Decnumber = r'(?:0+|[1-9][0-9]*)'
+Hexnumber = r'0[xX](?:_?[0-9a-fA-F])+'
+Binnumber = r'0[bB](?:_?[01])+'
+Octnumber = r'0[oO](?:_?[0-7])+'
+Decnumber = r'(?:0(?:_?0)*|[1-9](?:_?[0-9])*)'
 Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
-Exponent = r'[eE][-+]?[0-9]+'
-Pointfloat = group(r'[0-9]+\.[0-9]*', r'\.[0-9]+') + maybe(Exponent)
-Expfloat = r'[0-9]+' + Exponent
+Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
+Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
+                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
+Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
 Floatnumber = group(Pointfloat, Expfloat)
-Imagnumber = group(r'[0-9]+[jJ]', Floatnumber + r'[jJ]')
+Imagnumber = group(r'[0-9](?:_?[0-9])*[jJ]', Floatnumber + r'[jJ]')
 Number = group(Imagnumber, Floatnumber, Intnumber)
 
 # Return the empty string, plus all of the valid string prefixes.
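The new patterns can be exercised outside the tokenizer. The self-contained sketch below restates them as they appear in Lib/tokenize.py after this change. It adds minimal versions of tokenize's `group()` and `maybe()` helpers and checks whole strings with `re.fullmatch`. The sample literals come from the lists this changeset adds to test_grammar.py. `is_number_token` is a hypothetical name.

```python
import re

def group(*choices):
    return "(" + "|".join(choices) + ")"

def maybe(*choices):
    return group(*choices) + "?"

# Each "(?:_?X)" step allows at most one underscore, and only before a digit.
Hexnumber = r'0[xX](?:_?[0-9a-fA-F])+'
Binnumber = r'0[bB](?:_?[01])+'
Octnumber = r'0[oO](?:_?[0-7])+'
Decnumber = r'(?:0(?:_?0)*|[1-9](?:_?[0-9])*)'
Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'[0-9](?:_?[0-9])*[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)

def is_number_token(text):
    """True if the whole of text matches the Number pattern."""
    return re.fullmatch(Number, text) is not None

print(is_number_token("1_00_00e5_1j"))  # True
print(is_number_token("1.4_j"))         # False
```

Writing each repetition as `(?:_?digit)` rather than allowing `_` anywhere makes leading, doubled and trailing underscores impossible by construction. No separate check is needed.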
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -17,6 +17,8 @@ Core and Builtins
   efficient bytecode. Patch by Demur Rumed, design by Serhiy Storchaka,
   reviewed by Serhiy Storchaka and Victor Stinner.
 
+- Issue #26331: Implement tokenizing support for PEP 515. Patch by Georg Brandl.
+
 - Issue #27999: Make "global after use" a SyntaxError, and ditto for nonlocal.
   Patch by Ivan Levkivskyi.
@@ -2678,7 +2680,7 @@ Library
 - Issue #24774: Fix docstring in http.server.test. Patch from Chiu-Hsiang Hsu.
 
 - Issue #21159: Improve message in configparser.InterpolationMissingOptionError.
- Patch from Łukasz Langa.
 
 - Issue #20362: Honour TestCase.longMessage correctly in assertRegex.
   Patch from Ilia Kurenkov.
@@ -4606,7 +4608,7 @@ Library
   Based on patch by Martin Panter.
 
 - Issue #17293: uuid.getnode() now determines MAC address on AIX using netstat.
- Based on patch by Aivars Kalvāns.
--- a/Modules/_decimal/_decimal.c
+++ b/Modules/_decimal/_decimal.c
@@ -1889,12 +1889,13 @@ is_space(enum PyUnicode_Kind kind, void
 /* Return the ASCII representation of a numeric Unicode string. The numeric
    string may contain ascii characters in the range [1, 127], any Unicode
    space and any unicode digit. If strip_ws is true, leading and trailing
+   whitespace is stripped. If ignore_underscores is true, underscores are
+   ignored.
 
    Return NULL if malloc fails and an empty string if invalid characters
    are found. */
 static char *
-numeric_as_ascii(const PyObject *u, int strip_ws)
+numeric_as_ascii(const PyObject *u, int strip_ws, int ignore_underscores)
 {
     enum PyUnicode_Kind kind;
     void *data;
@@ -1929,6 +1930,9 @@ numeric_as_ascii(const PyObject *u, int
     for (; j < len; j++) {
         ch = PyUnicode_READ(kind, data, j);
+        if (ignore_underscores && ch == '_') {
+            continue;
+        }
         if (0 < ch && ch <= 127) {
             *cp++ = ch;
             continue;
@@ -2011,7 +2015,7 @@ PyDecType_FromUnicode(PyTypeObject *type
     PyObject *dec;
     char *s;
 
+    s = numeric_as_ascii(u, 0, 0);
     if (s == NULL) {
         return NULL;
     }
@@ -2031,7 +2035,7 @@ PyDecType_FromUnicodeExactWS(PyTypeObjec
     PyObject *dec;
     char *s;
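The rules in the updated comment for `numeric_as_ascii()` can be restated in Python. This is a sketch under stated assumptions. It treats "any unicode digit" as a Unicode decimal digit (`unicodedata.decimal`) and "any Unicode space" as `str.isspace()`. It is not a line-by-line port of the C loop.

```python
import unicodedata

def numeric_as_ascii(u, strip_ws, ignore_underscores):
    """Hypothetical Python re-expression of the C helper's documented rules.

    Returns "" when a character is neither ASCII (1..127), a Unicode space,
    nor a Unicode decimal digit; the caller then reports a conversion error.
    """
    if strip_ws:
        u = u.strip()
    out = []
    for ch in u:
        if ignore_underscores and ch == "_":
            continue                      # the new switch added by this diff
        if 0 < ord(ch) <= 127:
            out.append(ch)                # ASCII passes through unchanged
        elif ch.isspace():
            out.append(" ")               # any Unicode space becomes ' '
        else:
            d = unicodedata.decimal(ch, None)
            if d is None:
                return ""                 # invalid character
            out.append(str(d))            # e.g. ARABIC-INDIC DIGIT ONE -> '1'
    return "".join(out)

print(numeric_as_ascii(" 1_0 ", True, True))    # 10
print(numeric_as_ascii(" 1_0 ", True, False))   # 1_0
```

With both flags off, as in `PyDecType_FromUnicode` above, an underscore is an ordinary ASCII character and reaches the decimal string parser unchanged. That parser then rejects it.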
--- a/Objects/complexobject.c
+++ b/Objects/complexobject.c
@@ -759,29 +759,12 @@ static PyMemberDef complex_members[] = {
 };
 
 static PyObject *
-complex_subtype_from_string(PyTypeObject *type, PyObject *v)
+complex_from_string_inner(const char *s, Py_ssize_t len, void *type)
 {
-    const char *s, *start;
-    char *end;
     double x=0.0, y=0.0, z;
     int got_bracket=0;
-    PyObject *s_buffer = NULL;
-    Py_ssize_t len;
-    if (PyUnicode_Check(v)) {
-        s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);
-        if (s_buffer == NULL)
-            return NULL;
-        s = PyUnicode_AsUTF8AndSize(s_buffer, &len);
-        if (s == NULL)
-            goto error;
-    }
-    else {
-        PyErr_Format(PyExc_TypeError,
-            "complex() argument must be a string or a number, not '%.200s'",
-            Py_TYPE(v)->tp_name);
-        return NULL;
-    }
 
     /* position on first nonblank */
     start = s;
@@ -822,7 +805,7 @@ complex_subtype_from_string(PyTypeObject
             if (PyErr_ExceptionMatches(PyExc_ValueError))
                 PyErr_Clear();
             else
-                goto error;
@@ -835,7 +818,7 @@ complex_subtype_from_string(PyTypeObject
             if (PyErr_ExceptionMatches(PyExc_ValueError))
                 PyErr_Clear();
             else
-                goto error;
+                return NULL;
         }
         if (end != s)
             /* <float><signed-float>j */
@@ -890,18 +873,46 @@ complex_subtype_from_string(PyTypeObject
     if (s-start != len)
         goto parse_error;
 
   parse_error:
     PyErr_SetString(PyExc_ValueError,
                     "complex() arg is a malformed string");
 
 static PyObject *
+complex_subtype_from_string(PyTypeObject *type, PyObject *v)
+{
+    if (PyUnicode_Check(v)) {
+        s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);
+        if (s_buffer == NULL) {
+            return NULL;
+        }
+        s = PyUnicode_AsUTF8AndSize(s_buffer, &len);
+        if (s == NULL) {
+            goto exit;
+        }
+    }
+    else {
+        PyErr_Format(PyExc_TypeError,
+            "complex() argument must be a string or a number, not '%.200s'",
+            Py_TYPE(v)->tp_name);
+        return NULL;
+    }
+
+    result = _Py_string_to_number_with_underscores(s, len, "complex", v, type,
+                                                   complex_from_string_inner);
+  exit:
+    Py_DECREF(s_buffer);
+    return result;
+}
+
+static PyObject *
 complex_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
 {
     PyObject *r, *i, *tmp;
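After this refactor, `complex()` checks underscore placement before the unchanged inner parser runs. The effect is visible from Python 3.6 or later. The accepted and rejected strings below are taken from the shared literal lists in test_grammar.py. `complex_or_none` is an illustrative helper.

```python
def complex_or_none(text):
    """Return complex(text), or None if the constructor rejects it."""
    try:
        return complex(text)
    except ValueError:
        return None

# Underscores are accepted wherever the literal grammar allows them ...
print(complex_or_none("1_2.5+3_3j"))   # (12.5+33j)
print(complex_or_none("(.5_6j)"))      # 0.56j
# ... and rejected when they touch '.', 'e', 'j' or another underscore.
print(complex_or_none("1.4_j"))        # None
print(complex_or_none("1_.4"))         # None
```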
--- a/Objects/floatobject.c
+++ b/Objects/floatobject.c
@@ -124,11 +124,43 @@ PyFloat_FromDouble(double fval)
     return (PyObject *) op;
 }
 
+static PyObject *
+float_from_string_inner(const char *s, Py_ssize_t len, void *obj)
+{
+    double x;
+    const char *end;
+    const char *last = s + len;
+    /* strip space */
+    while (s < last && Py_ISSPACE(*s)) {
+        s++;
+    }
+    /* We don't care about overflow or underflow. If the platform
+     * supports them, infinities and signed zeroes (on underflow) are
+     * fine. */
+    x = PyOS_string_to_double(s, (char **)&end, NULL);
+    if (end != last) {
+        PyErr_Format(PyExc_ValueError,
+                     "could not convert string to float: "
+                     "%R", obj);
+        return NULL;
+    }
+    else if (x == -1.0 && PyErr_Occurred()) {
+        return NULL;
+    }
+    else {
+        return PyFloat_FromDouble(x);
+    }
+}
+
 PyObject *
 PyFloat_FromString(PyObject *v)
 {
+    const char *s;
     PyObject *s_buffer = NULL;
     Py_ssize_t len;
     Py_buffer view = {NULL, NULL};
@@ -169,27 +201,8 @@ PyFloat_FromString(PyObject *v)
                      Py_TYPE(v)->tp_name);
         return NULL;
     }
-    last = s + len;
-    /* strip space */
-    while (s < last && Py_ISSPACE(*s))
-        s++;
-    while (s < last - 1 && Py_ISSPACE(last[-1]))
-        last--;
-    /* We don't care about overflow or underflow. If the platform
-     * supports them, infinities and signed zeroes (on underflow) are
-     * fine. */
-    x = PyOS_string_to_double(s, (char **)&end, NULL);
-    if (end != last) {
-        PyErr_Format(PyExc_ValueError,
-                     "could not convert string to float: "
-                     "%R", v);
-        result = NULL;
-    }
-    else if (x == -1.0 && PyErr_Occurred())
-        result = NULL;
-    else
-        result = PyFloat_FromDouble(x);
+    result = _Py_string_to_number_with_underscores(s, len, "float", v, v,
+                                                   float_from_string_inner);
     PyBuffer_Release(&view);
     Py_XDECREF(s_buffer);
     return result;
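The new `float()` behaviour can be observed from Python 3.6 or later. The rejected strings below are the NaN/infinity cases added to test_float.py in this changeset. Surrounding whitespace is still accepted, because the inner function strips it after the underscore check. `float_or_none` is an illustrative helper.

```python
def float_or_none(text):
    """Return float(text), or None if the constructor rejects it."""
    try:
        return float(text)
    except ValueError:
        return None

print(float_or_none("1_000.000_1"))   # 1000.0001
print(float_or_none("  1_0  "))       # 10.0
print(float_or_none("-inf"))          # -inf
# Doubled, trailing, or misplaced underscores (including in nan/inf) fail:
for text in ("1__0", "1_", "Na_N", "-_INF", "-INF_"):
    print(repr(text), float_or_none(text))   # the second column is None
```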
--- a/Objects/longobject.c +++ b/Objects/longobject.c @@ -2004,12 +2004,18 @@ unsigned char _PyLong_DigitValue[256] =
- non-digit (which may be *str!). A normalized int is returned.
- The point to this routine is that it takes time linear in the number of
- string characters.
- *
- char prev = 0;
- int digits = 0; int bits_per_char; Py_ssize_t n; PyLongObject *z; @@ -2019,23 +2025,43 @@ long_from_binary_base(const char **str, assert(base >= 2 && base <= 32 && (base & (base - 1)) == 0); n = base;
- /* n <- total # of bits needed, while setting p to end-of-string */
- while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
- }
- /* count digits and set p to end-of-string */
- while (PyLong_DigitValue[Py_CHARMASK(*p)] < base || *p == '') {
if (*p == '_') {[](#l19.36)
if (prev == '_') {[](#l19.37)
*str = p - 1;[](#l19.38)
return -1;[](#l19.39)
}[](#l19.40)
} else {[](#l19.41)
++digits;[](#l19.42)
}[](#l19.43)
prev = *p;[](#l19.44) ++p;[](#l19.45)
- }
- if (prev == '_') {
/* Trailing underscore not allowed. */[](#l19.48)
*str = p - 1;[](#l19.49)
return -1;[](#l19.50)
- }
+ str = p; / n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
- n = digits * bits_per_char + PyLong_SHIFT - 1; if (n / bits_per_char < p - start) { PyErr_SetString(PyExc_ValueError, "int string too large to convert");
return NULL;[](#l19.60)
- if (z == NULL) {
*res = NULL;[](#l19.69)
return 0;[](#l19.70)
- } /* Read string from right, and fill in int from left; i.e.,
int k = (int)_PyLong_DigitValue[Py_CHARMASK(*p)];[](#l19.79)
int k;[](#l19.80)
if (*p == '_') {[](#l19.81)
continue;[](#l19.82)
}[](#l19.83)
k = (int)_PyLong_DigitValue[Py_CHARMASK(*p)];[](#l19.84) assert(k >= 0 && k < base);[](#l19.85) accum |= (twodigits)k << bits_in_accum;[](#l19.86) bits_in_accum += bits_per_char;[](#l19.87)
@@ -2062,7 +2092,8 @@ long_from_binary_base(const char **str, } while (pdigit - z->ob_digit < n) *pdigit++ = 0;
} /* Parses an int from a bytestring. Leading and trailing whitespace will be @@ -2087,23 +2118,29 @@ PyLong_FromString(const char *str, char "int() arg 2 must be >= 2 and <= 36"); return NULL; }
if (str[0] != '0')[](#l19.115)
if (str[0] != '0') {[](#l19.116) base = 10;[](#l19.117)
else if (str[1] == 'x' || str[1] == 'X')[](#l19.118)
}[](#l19.119)
else if (str[1] == 'x' || str[1] == 'X') {[](#l19.120) base = 16;[](#l19.121)
else if (str[1] == 'o' || str[1] == 'O')[](#l19.122)
}[](#l19.123)
else if (str[1] == 'o' || str[1] == 'O') {[](#l19.124) base = 8;[](#l19.125)
else if (str[1] == 'b' || str[1] == 'B')[](#l19.126)
}[](#l19.127)
else if (str[1] == 'b' || str[1] == 'B') {[](#l19.128) base = 2;[](#l19.129)
}[](#l19.130) else {[](#l19.131) /* "old" (C-style) octal literal, now invalid.[](#l19.132) it might still be zero though */[](#l19.133)
@@ -2114,12 +2151,26 @@ PyLong_FromString(const char *str, char if (str[0] == '0' && ((base == 16 && (str[1] == 'x' || str[1] == 'X')) || (base == 8 && (str[1] == 'o' || str[1] == 'O')) ||
(base == 2 && (str[1] == 'b' || str[1] == 'B'))))[](#l19.138)
(base == 2 && (str[1] == 'b' || str[1] == 'B')))) {[](#l19.139) str += 2;[](#l19.140)
/* One underscore allowed here. */[](#l19.141)
if (*str == '_') {[](#l19.142)
++str;[](#l19.143)
}[](#l19.144)
- }
- if (str[0] == '_') {
/* May not start with underscores. */[](#l19.147)
goto onError;[](#l19.148)
- }
- if ((base & (base - 1)) == 0) {
int res = long_from_binary_base(&str, base, &z);[](#l19.155)
if (res < 0) {[](#l19.156)
/* Syntax error. */[](#l19.157)
goto onError;[](#l19.158)
}[](#l19.159)
- } else { /*** Binary bases can be converted in time linear in the number of digits, because @@ -2208,11 +2259,13 @@ digit beyond the first. ***/ twodigits c; /* current input character */ Py_ssize_t size_z;
int digits = 0;[](#l19.168) int i;[](#l19.169) int convwidth;[](#l19.170) twodigits convmultmax, convmult;[](#l19.171) digit *pz, *pzstop;[](#l19.172)
const char* scan;[](#l19.173)
const char *scan, *lastdigit;[](#l19.174)
char prev = 0;[](#l19.175)
static double log_base_BASE[37] = {0.0e0,}; static int convwidth_base[37] = {0,}; @@ -2226,8 +2279,9 @@ digit beyond the first. log((double)PyLong_BASE)); for (;;) { twodigits next = convmax * base;
if (next > PyLong_BASE)[](#l19.183)
if (next > PyLong_BASE) {[](#l19.184) break;[](#l19.185)
}[](#l19.186) convmax = next;[](#l19.187) ++i;[](#l19.188) }[](#l19.189)
@@ -2238,21 +2292,43 @@ digit beyond the first. /* Find length of the string of numeric characters. */ scan = str;
while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base)[](#l19.194)
lastdigit = str;[](#l19.195)
while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {[](#l19.197)
if (*scan == '_') {[](#l19.198)
if (prev == '_') {[](#l19.199)
/* Only one underscore allowed. */[](#l19.200)
str = lastdigit + 1;[](#l19.201)
goto onError;[](#l19.202)
}[](#l19.203)
}[](#l19.204)
else {[](#l19.205)
++digits;[](#l19.206)
lastdigit = scan;[](#l19.207)
}[](#l19.208)
prev = *scan;[](#l19.209) ++scan;[](#l19.210)
}[](#l19.211)
if (prev == '_') {[](#l19.212)
/* Trailing underscore not allowed. */[](#l19.213)
/* Set error pointer to first underscore. */[](#l19.214)
str = lastdigit + 1;[](#l19.215)
goto onError;[](#l19.216)
}[](#l19.217)
/* Create an int object that can contain the largest possible * integer with this base and length. Note that there's no * need to initialize z->ob_digit -- no slot is read up before * being stored into. */
size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;[](#l19.224)
size_z = (Py_ssize_t)(digits * log_base_BASE[base]) + 1;[](#l19.225) /* Uncomment next line to test exceedingly rare copy code */[](#l19.226) /* size_z = 1; */[](#l19.227) assert(size_z > 0);[](#l19.228) z = _PyLong_New(size_z);[](#l19.229)
if (z == NULL)[](#l19.230)
if (z == NULL) {[](#l19.231) return NULL;[](#l19.232)
}[](#l19.233) Py_SIZE(z) = 0;[](#l19.234)
/* convwidth
consecutive input digits are treated as a single
@@ -2263,9 +2339,17 @@ digit beyond the first.
/* Work ;-) */
while (str < scan) {
if (*str == '_') {[](#l19.241)
str++;[](#l19.242)
continue;[](#l19.243)
}[](#l19.244) /* grab up to convwidth digits from the input string */[](#l19.245) c = (digit)_PyLong_DigitValue[Py_CHARMASK(*str++)];[](#l19.246)
for (i = 1; i < convwidth && str != scan; ++i, ++str) {[](#l19.247)
for (i = 1; i < convwidth && str != scan; ++str) {[](#l19.248)
if (*str == '_') {[](#l19.249)
continue;[](#l19.250)
}[](#l19.251)
i++;[](#l19.252) c = (twodigits)(c * base +[](#l19.253) (int)_PyLong_DigitValue[Py_CHARMASK(*str)]);[](#l19.254) assert(c < PyLong_BASE);[](#l19.255)
@@ -2277,8 +2361,9 @@ digit beyond the first. */ if (i != convwidth) { convmult = base;
for ( ; i > 1; --i)[](#l19.260)
for ( ; i > 1; --i) {[](#l19.261) convmult *= base;[](#l19.262)
}[](#l19.263) }[](#l19.264)
/* Multiply z by convmult, and add c. */ @@ -2316,41 +2401,51 @@ digit beyond the first. } } }
+    if (z == NULL) {
         return NULL;
+    }
     if (error_if_nonzero) {
         /* reset the base to 0, else the exception message
            doesn't make too much sense */
         base = 0;
-        if (Py_SIZE(z) != 0)
+        if (Py_SIZE(z) != 0) {
             goto onError;
+        }
         /* there might still be other problems, therefore base
            remains zero here for the same reason */
+    if (pend != NULL) {
         *pend = (char *)str;
+    }
     Py_XDECREF(z);
     slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;
     strobj = PyUnicode_FromStringAndSize(orig_str, slen);
+    if (strobj == NULL) {
         return NULL;
+    }
     PyErr_Format(PyExc_ValueError,
                  "invalid literal for int() with base %d: %.200R",
                  base, strobj);
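In the PyLong_FromString() hunks above, the digit scan now steps over single underscores and counts only real digits in `digits`. The allocation estimate `size_z` is then derived from that count rather than from `scan - str`, because that span would also include the underscores. The standalone sketch below restates the scan over a plain C string. It is not CPython code. `digit_value()` is a hypothetical stand-in for the `_PyLong_DigitValue` table, `scan_digits()` is a hypothetical name, and signs, whitespace and base prefixes are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPython's _PyLong_DigitValue table: the value
   of an ASCII digit or letter (bases up to 36), or 37 for anything else,
   so that "digit_value(ch) < base" fails for non-digits. */
static int digit_value(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    if (ch >= 'a' && ch <= 'z') {
        return ch - 'a' + 10;
    }
    if (ch >= 'A' && ch <= 'Z') {
        return ch - 'A' + 10;
    }
    return 37;
}

/* Sketch of the underscore-aware scan: walk a run of base-`base` digits,
   accepting a single underscore between two digits. Returns the number of
   real digits (underscores not counted), or -1 for a leading, doubled or
   trailing underscore. On success, *end points just past the run. */
static long scan_digits(const char *s, int base, const char **end)
{
    const char *scan = s;
    char prev = '\0';
    long digits = 0;

    assert(base >= 2 && base <= 36);
    while (digit_value(*scan) < base || *scan == '_') {
        if (*scan == '_') {
            if (prev == '_' || prev == '\0') {
                return -1;               /* doubled or leading underscore */
            }
        }
        else {
            ++digits;
        }
        prev = *scan;
        ++scan;
    }
    if (prev == '_') {
        return -1;                       /* trailing underscore */
    }
    if (end != NULL) {
        *end = scan;
    }
    return digits;
}
```

For example, scan_digits("1_000_000", 10, &end) returns 7 and leaves end at the terminating NUL. The inputs "1__0", "10_" and "_1" all return -1. The count, not the length of the scanned span, is what should feed a size estimate like `size_z`.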
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -1333,6 +1333,28 @@ verify_identifier(struct tok_state *tok)
 }
 #endif

+static int
+tok_decimal_tail(struct tok_state *tok)
+{
+    while (1) {
+        do {
+            c = tok_nextc(tok);
+        } while (isdigit(c));
+        if (c != '_') {
+            break;
+        }
+        c = tok_nextc(tok);
+        if (!isdigit(c)) {
+            tok->done = E_TOKEN;
+            tok_backup(tok, c);
+            return 0;
+        }
+    }
+    return c;
+}
+
 /* Get next token, after space stripping etc. */

 static int
@@ -1353,17 +1375,20 @@ tok_get(struct tok_state *tok, char **p_
         tok->atbol = 0;
         for (;;) {
             c = tok_nextc(tok);
-            if (c == ' ')
+            if (c == ' ') {
                 col++, altcol++;
+            }
             else if (c == '\t') {
                 col = (col/tok->tabsize + 1) * tok->tabsize;
                 altcol = (altcol/tok->alttabsize + 1)
                     * tok->alttabsize;
             }
-            else if (c == '\014') /* Control-L (formfeed) */
+            else if (c == '\014') {/* Control-L (formfeed) */
                 col = altcol = 0; /* For Emacs users */
-            else
+            }
+            else {
                 break;
+            }
         }
         tok_backup(tok, c);
         if (c == '#' || c == '\n') {
@@ -1372,10 +1397,12 @@ tok_get(struct tok_state *tok, char **p_
                not passed to the parser as NEWLINE tokens,
                except totally empty lines in interactive
                mode, which signal the end of a command group. */
-            if (col == 0 && c == '\n' && tok->prompt != NULL)
+            if (col == 0 && c == '\n' && tok->prompt != NULL) {
                 blankline = 0; /* Let it through */
-            else
+            }
+            else {
                 blankline = 1; /* Ignore completely */
+            }
             /* We can't jump back right here since we still
                may need to skip to the end of a comment */
         }
@@ -1383,8 +1410,9 @@ tok_get(struct tok_state *tok, char **p_
             if (col == tok->indstack[tok->indent]) {
                 /* No change */
                 if (altcol != tok->altindstack[tok->indent]) {
-                    if (indenterror(tok))
+                    if (indenterror(tok)) {
                         return ERRORTOKEN;
+                    }
                 }
             }
             else if (col > tok->indstack[tok->indent]) {
@@ -1395,8 +1423,9 @@ tok_get(struct tok_state *tok, char **p_
                     return ERRORTOKEN;
                 }
                 if (altcol <= tok->altindstack[tok->indent]) {
-                    if (indenterror(tok))
+                    if (indenterror(tok)) {
                         return ERRORTOKEN;
+                    }
                 }
                 tok->pendin++;
                 tok->indstack[++tok->indent] = col;
@@ -1415,8 +1444,9 @@ tok_get(struct tok_state *tok, char **p_
                     return ERRORTOKEN;
                 }
                 if (altcol != tok->altindstack[tok->indent]) {
-                    if (indenterror(tok))
+                    if (indenterror(tok)) {
                         return ERRORTOKEN;
+                    }
                 }
             }
         }
@@ -1462,9 +1492,11 @@ tok_get(struct tok_state *tok, char **p_
     tok->start = tok->cur - 1;

     /* Skip comment */
+    if (c == '#') {
+        while (c != EOF && c != '\n') {
+            c = tok_nextc(tok);
+        }
+    }

     /* Check for EOF and errors now */
     if (c == EOF) {
@@ -1481,27 +1513,35 @@ tok_get(struct tok_state *tok, char **p_
                 saw_b = 1;
             /* Since this is a backwards compatibility support literal we don't
                want to support it in arbitrary order like byte literals. */
-            else if (!(saw_b || saw_u || saw_r || saw_f) && (c == 'u' || c == 'U'))
+            else if (!(saw_b || saw_u || saw_r || saw_f)
+                     && (c == 'u'|| c == 'U')) {
                 saw_u = 1;
+            }
             /* ur"" and ru"" are not supported */
-            else if (!(saw_r || saw_u) && (c == 'r' || c == 'R'))
+            else if (!(saw_r || saw_u) && (c == 'r' || c == 'R')) {
                 saw_r = 1;
-            else if (!(saw_f || saw_b || saw_u) && (c == 'f' || c == 'F'))
+            }
+            else if (!(saw_f || saw_b || saw_u) && (c == 'f' || c == 'F')) {
                 saw_f = 1;
-            else
+            }
+            else {
                 break;
+            }
             c = tok_nextc(tok);
-            if (c == '"' || c == '\'')
+            if (c == '"' || c == '\'') {
                 goto letter_quote;
+            }
         }
         while (is_potential_identifier_char(c)) {
-            if (c >= 128)
+            if (c >= 128) {
                 nonascii = 1;
+            }
             c = tok_nextc(tok);
         }
         tok_backup(tok, c);
-        if (nonascii && !verify_identifier(tok))
+        if (nonascii && !verify_identifier(tok)) {
             return ERRORTOKEN;
+        }
         *p_start = tok->start;
         *p_end = tok->cur;
@@ -1510,10 +1550,12 @@ tok_get(struct tok_state *tok, char **p_
             /* Current token length is 5. */
             if (tok->async_def) {
                 /* We're inside an 'async def' function. */
-                if (memcmp(tok->start, "async", 5) == 0)
+                if (memcmp(tok->start, "async", 5) == 0) {
                     return ASYNC;
-                if (memcmp(tok->start, "await", 5) == 0)
+                }
+                if (memcmp(tok->start, "await", 5) == 0) {
                     return AWAIT;
+                }
             }
             else if (memcmp(tok->start, "async", 5) == 0) {
                 /* The current token is 'async'.
@@ -1546,8 +1588,9 @@ tok_get(struct tok_state *tok, char **p_
     /* Newline */
     if (c == '\n') {
         tok->atbol = 1;
-        if (blankline || tok->level > 0)
+        if (blankline || tok->level > 0) {
             goto nextline;
+        }
         *p_start = tok->start;
         *p_end = tok->cur - 1; /* Leave '\n' out of the string */
         tok->cont_line = 0;
@@ -1570,11 +1613,13 @@ tok_get(struct tok_state *tok, char **p_
                 *p_start = tok->start;
                 *p_end = tok->cur;
                 return ELLIPSIS;
-            } else {
+            }
+            else {
                 tok_backup(tok, c);
             }
             tok_backup(tok, '.');
-        } else {
+        }
+        else {
             tok_backup(tok, c);
         }
         *p_start = tok->start;
@@ -1588,59 +1633,93 @@ tok_get(struct tok_state *tok, char **p_
             /* Hex, octal or binary -- maybe. */
             c = tok_nextc(tok);
             if (c == 'x' || c == 'X') {
-
                 /* Hex */
                 c = tok_nextc(tok);
-                if (!isxdigit(c)) {
-                    tok->done = E_TOKEN;
-                    tok_backup(tok, c);
-                    return ERRORTOKEN;
-                }
                 do {
-                    c = tok_nextc(tok);
-                } while (isxdigit(c));
+                    if (c == '_') {
+                        c = tok_nextc(tok);
+                    }
+                    if (!isxdigit(c)) {
+                        tok->done = E_TOKEN;
+                        tok_backup(tok, c);
+                        return ERRORTOKEN;
+                    }
+                    do {
+                        c = tok_nextc(tok);
+                    } while (isxdigit(c));
+                } while (c == '_');
             }
             else if (c == 'o' || c == 'O') {
                 /* Octal */
                 c = tok_nextc(tok);
-                if (c < '0' || c >= '8') {
-                    tok->done = E_TOKEN;
-                    tok_backup(tok, c);
-                    return ERRORTOKEN;
-                }
                 do {
-                    c = tok_nextc(tok);
-                } while ('0' <= c && c < '8');
+                    if (c == '_') {
+                        c = tok_nextc(tok);
+                    }
+                    if (c < '0' || c >= '8') {
+                        tok->done = E_TOKEN;
+                        tok_backup(tok, c);
+                        return ERRORTOKEN;
+                    }
+                    do {
+                        c = tok_nextc(tok);
+                    } while ('0' <= c && c < '8');
+                } while (c == '_');
             }
             else if (c == 'b' || c == 'B') {
                 /* Binary */
                 c = tok_nextc(tok);
-                if (c != '0' && c != '1') {
-                    tok->done = E_TOKEN;
-                    tok_backup(tok, c);
-                    return ERRORTOKEN;
-                }
                 do {
-                    c = tok_nextc(tok);
-                } while (c == '0' || c == '1');
+                    if (c == '_') {
+                        c = tok_nextc(tok);
+                    }
+                    if (c != '0' && c != '1') {
+                        tok->done = E_TOKEN;
+                        tok_backup(tok, c);
+                        return ERRORTOKEN;
+                    }
+                    do {
+                        c = tok_nextc(tok);
+                    } while (c == '0' || c == '1');
+                } while (c == '_');
             }
             else {
                 int nonzero = 0;
                 /* maybe old-style octal; c is first char of it */
                 /* in any case, allow '0' as a literal */
-                while (c == '0')
-                    c = tok_nextc(tok);
-                while (isdigit(c)) {
-                    nonzero = 1;
+                while (1) {
+                    if (c == '_') {
+                        c = tok_nextc(tok);
+                        if (!isdigit(c)) {
+                            tok->done = E_TOKEN;
+                            tok_backup(tok, c);
+                            return ERRORTOKEN;
+                        }
+                    }
+                    if (c != '0') {
+                        break;
+                    }
                     c = tok_nextc(tok);
                 }
-                if (c == '.')
+                if (isdigit(c)) {
+                    nonzero = 1;
+                    c = tok_decimal_tail(tok);
+                    if (c == 0) {
+                        return ERRORTOKEN;
+                    }
+                }
+                if (c == '.') {
+                    c = tok_nextc(tok);
                     goto fraction;
-                else if (c == 'e' || c == 'E')
+                }
+                else if (c == 'e' || c == 'E') {
                     goto exponent;
-                else if (c == 'j' || c == 'J')
+                }
+                else if (c == 'j' || c == 'J') {
                     goto imaginary;
+                }
                 else if (nonzero) {
+                    /* Old-style octal: now disallowed. */
                     tok->done = E_TOKEN;
                     tok_backup(tok, c);
                     return ERRORTOKEN;
@@ -1649,17 +1728,22 @@ tok_get(struct tok_state *tok, char **p_
         }
         else {
             /* Decimal */
-            do {
-                c = tok_nextc(tok);
-            } while (isdigit(c));
+            c = tok_decimal_tail(tok);
+            if (c == 0) {
+                return ERRORTOKEN;
+            }
             {
                 /* Accept floating point numbers. */
                 if (c == '.') {
+                    c = tok_nextc(tok);
         fraction:
                     /* Fraction */
-                    do {
-                        c = tok_nextc(tok);
-                    } while (isdigit(c));
+                    if (isdigit(c)) {
+                        c = tok_decimal_tail(tok);
+                        if (c == 0) {
+                            return ERRORTOKEN;
+                        }
+                    }
                 }
                 if (c == 'e' || c == 'E') {
                     int e;
@@ -1681,14 +1765,16 @@ tok_get(struct tok_state *tok, char **p_
                         *p_end = tok->cur;
                         return NUMBER;
                     }
-                    do {
-                        c = tok_nextc(tok);
-                    } while (isdigit(c));
+                    c = tok_decimal_tail(tok);
+                    if (c == 0) {
+                        return ERRORTOKEN;
+                    }
                 }
-                if (c == 'j' || c == 'J')
+                if (c == 'j' || c == 'J') {
                     /* Imaginary part */
         imaginary:
                     c = tok_nextc(tok);
+                }
             }
         }
         tok_backup(tok, c);
@@ -1708,22 +1794,27 @@ tok_get(struct tok_state *tok, char **p_
         c = tok_nextc(tok);
         if (c == quote) {
             c = tok_nextc(tok);
-            if (c == quote)
+            if (c == quote) {
                 quote_size = 3;
-            else
+            }
+            else {
                 end_quote_size = 1; /* empty string found */
+            }
         }
-        if (c != quote)
+        if (c != quote) {
             tok_backup(tok, c);
+        }

         /* Get rest of string */
         while (end_quote_size != quote_size) {
             c = tok_nextc(tok);
             if (c == EOF) {
-                if (quote_size == 3)
+                if (quote_size == 3) {
                     tok->done = E_EOFS;
-                else
+                }
+                else {
                     tok->done = E_EOLS;
+                }
                 tok->cur = tok->inp;
                 return ERRORTOKEN;
             }
@@ -1732,12 +1823,14 @@ tok_get(struct tok_state *tok, char **p_
                 tok->cur = tok->inp;
                 return ERRORTOKEN;
             }
-            if (c == quote)
+            if (c == quote) {
                 end_quote_size += 1;
+            }
             else {
                 end_quote_size = 0;
-                if (c == '\\')
+                if (c == '\\') {
                     tok_nextc(tok); /* skip escaped char */
+                }
             }
         }
@@ -1767,7 +1860,8 @@ tok_get(struct tok_state *tok, char **p_
             int token3 = PyToken_ThreeChars(c, c2, c3);
             if (token3 != OP) {
                 token = token3;
-            } else {
+            }
+            else {
                 tok_backup(tok, c3);
             }
             *p_start = tok->start;
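The tokenizer changes above enforce one rule for integer literals. An underscore may sit between two digits, and in the 0x/0o/0b forms it may also appear directly after the prefix, but it may never be doubled or trailing. tok_decimal_tail() applies this rule to decimal runs, and the new do/while loops apply it to prefixed literals.

The sketch below replays that logic over a NUL-terminated string instead of the tok_nextc()/tok_backup() stream. It is illustrative only. The names decimal_tail(), prefixed_tail(), is_octdigit(), is_bindigit() and valid_int_literal() are hypothetical, and floats, exponents and imaginary suffixes are not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical string-based analogue of tok_decimal_tail(): p points at a
   digit that has already been accepted. Consume the rest of the run,
   allowing single underscores that are each followed by a digit. Returns a
   pointer just past the run, or NULL where the tokenizer would report
   E_TOKEN ("1__0", "1_"). */
static const char *decimal_tail(const char *p)
{
    assert(isdigit((unsigned char)*p));
    for (;;) {
        do {
            p++;
        } while (isdigit((unsigned char)*p));
        if (*p != '_') {
            return p;
        }
        p++;                                 /* skip the underscore */
        if (!isdigit((unsigned char)*p)) {
            return NULL;
        }
    }
}

static int is_octdigit(int c) { return c >= '0' && c <= '7'; }
static int is_bindigit(int c) { return c == '0' || c == '1'; }

/* Hypothetical analogue of the new 0x/0o/0b loops: p points just past the
   prefix. One underscore may follow the prefix or separate two digits. */
static const char *prefixed_tail(const char *p, int (*is_digit)(int))
{
    do {
        if (*p == '_') {
            p++;
        }
        if (!is_digit((unsigned char)*p)) {
            return NULL;
        }
        do {
            p++;
        } while (is_digit((unsigned char)*p));
    } while (*p == '_');
    return p;
}

/* Returns 1 if the whole string is an integer literal that satisfies the
   underscore rules above, 0 otherwise. */
static int valid_int_literal(const char *s)
{
    const char *end;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        end = prefixed_tail(s + 2, isxdigit);
    }
    else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O')) {
        end = prefixed_tail(s + 2, is_octdigit);
    }
    else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        end = prefixed_tail(s + 2, is_bindigit);
    }
    else if (s[0] >= '1' && s[0] <= '9') {
        end = decimal_tail(s);
    }
    else if (s[0] == '0') {
        /* zeros only, optionally separated by single underscores ("0_0");
           a nonzero digit afterwards (old-style octal) is left unconsumed
           and therefore rejected by the final check */
        end = s + 1;
        for (;;) {
            if (*end == '_') {
                end++;
                if (!isdigit((unsigned char)*end)) {
                    return 0;
                }
            }
            if (*end != '0') {
                break;
            }
            end++;
        }
    }
    else {
        return 0;
    }
    return end != NULL && *end == '\0';
}
```

With this sketch, valid_int_literal("0x_ff_ff") and valid_int_literal("0_0") return 1. The strings "1__000", "1000_", "0x__ff" and "0_7" return 0. The last one fails because a nonzero digit after leading zeros is the old-style octal form that the tokenizer rejects for integers.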
--- a/Python/ast.c
+++ b/Python/ast.c
@@ -4018,7 +4018,7 @@ ast_for_stmt(struct compiling *c, const
 }

 static PyObject *
-parsenumber(struct compiling *c, const char *s)
+parsenumber_raw(struct compiling *c, const char *s)
 {
     const char *end;
     long x;
@@ -4061,6 +4061,31 @@ parsenumber(struct compiling *c, const c
 }

 static PyObject *
+parsenumber(struct compiling *c, const char *s)
+{
+    if (strchr(s, '_') == NULL) {
+        return parsenumber_raw(c, s);
+    }
+    /* Create a duplicate without underscores. */
+    dup = PyMem_Malloc(strlen(s) + 1);
+    end = dup;
+    for (; *s; s++) {
+        if (*s != '_') {
+            *end++ = *s;
+        }
+    }
+    *end = '\0';
+    res = parsenumber_raw(c, dup);
+    PyMem_Free(dup);
+    return res;
+}
+
+static PyObject *
 decode_utf8(struct compiling *c, const char **sPtr, const char *end)
 {
     const char *s, *t;
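In ast.c, parsenumber() strips underscores without checking them, because the tokenizer has already validated the literal. The helper added to pystrtod.c, by contrast, serves underscore-agnostic float and complex conversion (per its comment). There the input has presumably not passed through the tokenizer, so the helper checks the placement rule while copying.

The sketch below isolates that check in plain C. It makes several departures from the real helper:
- strip_underscores() is a hypothetical name.
- It returns a malloc()'d copy, or NULL, instead of calling an inner conversion function.
- It reports failure by return value rather than by raising ValueError.
- It takes a C string, so there is no embedded-NUL check.
- Unlike the PyMem_Malloc() calls visible in the diff, it checks for allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the placement check performed by
   _Py_string_to_number_with_underscores(): copy `s` without its
   underscores, failing (NULL) if an underscore is not both preceded and
   followed by an ASCII digit. NULL is also returned if malloc() fails.
   The caller frees the returned copy. */
static char *strip_underscores(const char *s)
{
    const char *p;
    char *dup, *end;
    char prev = '\0';

    assert(s != NULL);
    dup = malloc(strlen(s) + 1);
    if (dup == NULL) {
        return NULL;             /* allocation failure */
    }
    end = dup;
    for (p = s; *p; p++) {
        if (*p == '_') {
            /* underscores are only allowed after digits */
            if (!(prev >= '0' && prev <= '9')) {
                goto error;
            }
        }
        else {
            *end++ = *p;
            /* underscores are only allowed before digits */
            if (prev == '_' && !(*p >= '0' && *p <= '9')) {
                goto error;
            }
        }
        prev = *p;
    }
    if (prev == '_') {
        goto error;              /* trailing underscore */
    }
    *end = '\0';
    return dup;

error:
    free(dup);
    return NULL;
}
```

For example, strip_underscores("1_000.000_1") returns "1000.0001", which could then be handed to strtod(). The inputs "1__0", "_1", "1_" and "1_.5" all return NULL.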
--- a/Python/pystrtod.c
+++ b/Python/pystrtod.c
@@ -370,6 +370,72 @@ PyOS_string_to_double(const char *s,
     return result;
 }

+/* Remove underscores that follow the underscore placement rule from
+   the string and then call the `innerfunc` function on the result.
+   It should return a new object or NULL on exception.
+
+   `what` is used for the error message emitted when underscores are detected
+   that don't follow the rule. `arg` is an opaque pointer passed to the inner
+   function.
+
+   This is used to implement underscore-agnostic conversion for floats
+   and complex numbers.
+*/
+PyObject *
+_Py_string_to_number_with_underscores(
+    const char *s, Py_ssize_t orig_len, const char *what, PyObject *obj, void *arg,
+    PyObject *(*innerfunc)(const char *, Py_ssize_t, void *))
+    dup = PyMem_Malloc(orig_len + 1);
+    end = dup;
+    prev = '\0';
+    last = s + orig_len;
+    for (p = s; *p; p++) {
+        if (*p == '_') {
+            /* Underscores are only allowed after digits. */
+            if (!(prev >= '0' && prev <= '9')) {
+                goto error;
+            }
+        }
+        else {
+            *end++ = *p;
+            /* Underscores are only allowed before digits. */
+            if (prev == '_' && !(*p >= '0' && *p <= '9')) {
+                goto error;
+            }
+        }
+        prev = *p;
+    }
+    /* Underscores are not allowed at the end. */
+    if (prev == '_') {
+        goto error;
+    }
+    /* No embedded NULs allowed. */
+    if (p != last) {
+        goto error;
+    }
+    *end = '\0';
+    result = innerfunc(dup, end - dup, arg);
+    PyMem_Free(dup);
+    return result;
+  error:
+    PyMem_Free(dup);
+    PyErr_Format(PyExc_ValueError,
+                 "could not convert string to %s: "
+                 "%R", what, obj);
+    return NULL;
+}
+
 #ifdef PY_NO_SHORT_FLOAT_REPR

 /* Given a string that may have a decimal point in the current