Allow integer suffixes starting with `e`. by nnethercote · Pull Request #111628 · rust-lang/rust
Integers with arbitrary suffixes are allowed as inputs to proc macros. A number of real-world crates use this capability in interesting ways, as seen in #103872. For example:
- Suffixes representing units, such as `8bits`, `100px`, `20ns`, `30GB`
- CSS hex colours such as `#7CFC00` (LawnGreen)
- UUIDs, e.g. `785ada2c-f2d0-11fd-3839-b3104db0cb68`
The hex cases may be surprising.
- `#7CFC00` is tokenized as a `#` followed by a `7` integer with a `CFC00` suffix.
- `785ada2c` is tokenized as a `785` integer with an `ada2c` suffix.
- `f2d0` is tokenized as an identifier.
- `3839` is tokenized as an integer literal.
A proc macro will immediately stringify such tokens and reparse them itself, and so won't care that the token types vary. All suffixes must be consumed by the proc macro, of course; the only suffixes allowed after macro expansion are the numeric ones like `u8`, `i32`, and `f64`.
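As a rough illustration of the split between digits and suffix described above, here is a minimal sketch of what a proc macro might do after stringifying a token. The helper name `split_digits_and_suffix` is hypothetical and is not part of rustc or the `proc_macro` API; it only handles plain decimal digits and ignores details such as `_` separators and `0x`/`0b` prefixes.

```rust
/// Splits a stringified literal into its leading decimal digits and the
/// remaining suffix. Hypothetical helper for illustration only; it is not
/// how rustc's lexer is implemented.
fn split_digits_and_suffix(text: &str) -> (&str, &str) {
    // Byte index of the first character that is not an ASCII digit,
    // or the end of the string if every character is a digit.
    let end = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    text.split_at(end)
}

fn main() {
    for text in ["100px", "20ns", "785ada2c", "3839"] {
        let (digits, suffix) = split_digits_and_suffix(text);
        println!("{text}: digits = {digits:?}, suffix = {suffix:?}");
    }
}
```

Under this sketch, `785ada2c` splits into `785` and `ada2c`, mirroring the integer-plus-suffix tokenization, while `3839` has an empty suffix.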
Currently there is an annoying inconsistency in how integer literal suffixes are handled, which is that no suffix starting with `e` is allowed, because that is interpreted as a float literal with an exponent. For example:
- Units: `1eV` and `1em`
- CSS colours: `#90EE90` (LightGreen)
- UUIDs: `785ada2c-f2d0-11ed-3839-b3104db0cb68`
In each case, a sequence of digits followed by an 'e' or 'E' followed by a letter results in an "expected at least one digit in exponent" error. This is an annoying inconsistency in general, and a problem in practice. It's likely that some users haven't noticed this inconsistency because they've gotten lucky and never used a token with a problematic 'e'. Other users have noticed: it breaks DSLs embedded in proc macros, as seen in #111615, where the CSS colours case affects two different UI frameworks (Slint and Makepad).
We can do better. This commit changes the lexer so that, when it hits a possible exponent, it looks ahead and only produces an exponent if a valid one is present. Otherwise, it produces a non-exponent form, which may be a single token (e.g. `1eV`) or multiple tokens (e.g. `1e+a`).
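The lookahead can be sketched roughly as follows. This is an illustrative standalone function, not the code in this commit: the name `exponent_len` and its shape are assumptions. It assumes an exponent is `e` or `E`, an optional sign, then digits and underscores containing at least one digit, and reports how many bytes a valid exponent would occupy.

```rust
/// Returns the byte length of a valid exponent at the start of `rest`,
/// or `None` if no valid exponent is present. Hypothetical helper; the
/// real lexer is structured differently.
fn exponent_len(rest: &str) -> Option<usize> {
    let bytes = rest.as_bytes();
    let peek = |i: usize| bytes.get(i).copied();
    let mut i = 0;

    // An exponent must start with `e` or `E`.
    if !matches!(peek(i), Some(b'e' | b'E')) {
        return None;
    }
    i += 1;

    // Optional sign.
    if matches!(peek(i), Some(b'+' | b'-')) {
        i += 1;
    }

    // Skip any number of underscores; this is where the lookahead
    // becomes unbounded.
    while peek(i) == Some(b'_') {
        i += 1;
    }

    // Only commit to an exponent if a digit actually follows.
    if !matches!(peek(i), Some(b'0'..=b'9')) {
        return None;
    }

    // Consume the remaining digits and underscores.
    while matches!(peek(i), Some(b'0'..=b'9' | b'_')) {
        i += 1;
    }
    Some(i)
}

fn main() {
    for tail in ["e10", "E-5", "eV", "em", "e+a", "e+_______3", "e+_______a"] {
        println!("{tail:?} -> {:?}", exponent_len(tail));
    }
}
```

When the function returns `None`, a caller in this sketch would not consume a sign or anything after it as part of the number, and would instead treat the `e` as the start of an ordinary suffix.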
Consequences of this:
- All the proc macro problem cases mentioned above are fixed.
- The "expected at least one digit in exponent" error is no longer possible. A few tests that only worked in the presence of that error have been removed.
- The lexer requires unbounded lookahead due to the presence of '_' chars in exponents. E.g. to distinguish `1e+_______3` (a float literal with exponent) from `1e+_______a` (previously invalid, but now tokenised as `1e`, `+`, `_______a`).
This is a backwards compatible language change: all existing valid programs will be treated in the same way, and some previously invalid programs will become valid. The tokens chapter of the language reference (https://doc.rust-lang.org/reference/tokens.html) will need changing to account for this. In particular, the "Reserved forms similar to number literals" section will need updating, and grammar rules involving the SUFFIX_NO_E nonterminal will need adjusting.
Fixes #111615.
r? @ghost