Allow integer suffixes starting with `e`. by nnethercote · Pull Request #111628 · rust-lang/rust

Integers with arbitrary suffixes are allowed as inputs to proc macros. A number of real-world crates use this capability in interesting ways, as seen in #103872. For example:

Suffixes representing units, such as 8bits, 100px, 20ns, 30GB
CSS hex colours such as #7CFC00 (LawnGreen)
UUIDs, e.g. 785ada2c-f2d0-11fd-3839-b3104db0cb68

The hex cases may be surprising.

#7CFC00 is tokenized as a # followed by a 7 integer with a CFC00 suffix.
785ada2c is tokenized as a 785 integer with an ada2c suffix.
f2d0 is tokenized as an identifier.
3839 is tokenized as an integer literal.
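
The splitting described above can be modelled with a minimal sketch. This is not the rustc lexer: `split_int_suffix` is a hypothetical helper that assumes a plain decimal token, and it ignores prefixes such as `0x`, float forms, and exponent handling.

```rust
/// Splits a decimal literal-like token into (digits, suffix).
/// Simplified model: the leading run of ASCII digits and underscores is the
/// integer part, and everything after it is the suffix.
/// Returns `None` when the token does not start with a digit, in which case
/// a real lexer would treat it as something else (e.g. an identifier).
fn split_int_suffix(token: &str) -> Option<(&str, &str)> {
    if !token.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Index of the first character that cannot belong to the integer part.
    let end = token
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(token.len());
    Some(token.split_at(end))
}
```

Under this model, `split_int_suffix("785ada2c")` yields the digits `785` and the suffix `ada2c`, `3839` has an empty suffix, and `f2d0` is rejected because it starts with a letter.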

A proc macro will immediately stringify such tokens and reparse them itself, and so won't care that the token types vary. All suffixes must be consumed by the proc macro, of course; the only suffixes allowed after macro expansion are the numeric ones like u8, i32, and f64.
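
As a rough illustration of the "stringify and reparse" step, the sketch below joins token texts back together and parses the result as a CSS hex colour. The names `parse_hex_colour` and `reparse_colour` are hypothetical, and the token texts are passed in as plain strings; in a real proc macro they would come from calling `to_string()` on each token.

```rust
/// Parses text such as "#7CFC00" into (red, green, blue) channels.
/// Returns `None` unless the text is `#` followed by exactly six hex digits.
fn parse_hex_colour(text: &str) -> Option<(u8, u8, u8)> {
    let hex = text.strip_prefix('#')?;
    if hex.len() != 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Each channel is two hex digits; slicing is safe because `hex` is ASCII.
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    Some((channel(0)?, channel(2)?, channel(4)?))
}

/// Joins the stringified tokens and reparses them as one colour, so it does
/// not matter how the lexer happened to split the input into tokens.
fn reparse_colour(token_texts: &[&str]) -> Option<(u8, u8, u8)> {
    parse_hex_colour(&token_texts.concat())
}
```

For instance, the two token texts `#` and `7CFC00` reparse to the channels `0x7C`, `0xFC`, `0x00`, regardless of the fact that the second token was lexed as an integer with a suffix.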

Currently there is an annoying inconsistency in how integer literal suffixes are handled: no suffix starting with e is allowed, because the e is interpreted as the start of a float literal's exponent. For example:

Units: 1eV and 1em
CSS colours: #90EE90 (LightGreen)
UUIDs: 785ada2c-f2d0-11ed-3839-b3104db0cb68

In each case, a sequence of digits followed by an 'e' or 'E' followed by a letter results in an "expected at least one digit in exponent" error. This is an annoying inconsistency in general, and a problem in practice. It's likely that some users haven't realized this inconsistency because they've gotten lucky and never used a token with an 'e' that causes problems. Other users have noticed; it's causing problems when embedding DSLs into proc macros, as seen in #111615, where the CSS colours case is causing problems for two different UI frameworks (Slint and Makepad).

We can do better. This commit changes the lexer so that, when it hits a possible exponent, it looks ahead and only produces an exponent if a valid one is present. Otherwise, it produces a non-exponent form, which may be a single token (e.g. 1eV) or multiple tokens (e.g. 1e+a).
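
The lookahead decision can be sketched as follows. This is an illustrative model, not the code in this commit: `has_valid_exponent` is a hypothetical name, and it assumes the exponent shape from the language reference (an `e` or `E`, an optional sign, any number of `_` chars, then at least one decimal digit).

```rust
/// Given the text immediately after the leading digits of a number, reports
/// whether a valid exponent starts there. The caller would lex an exponent
/// only when this returns true, and fall back to a suffix otherwise.
fn has_valid_exponent(rest: &str) -> bool {
    let mut chars = rest.chars().peekable();
    // An exponent must begin with `e` or `E`.
    if !matches!(chars.next(), Some('e' | 'E')) {
        return false;
    }
    // Optional sign.
    if matches!(chars.peek(), Some('+' | '-')) {
        chars.next();
    }
    // Skip any number of underscores; this is where the lookahead
    // becomes unbounded.
    while chars.peek() == Some(&'_') {
        chars.next();
    }
    // At least one decimal digit is required.
    matches!(chars.next(), Some(c) if c.is_ascii_digit())
}
```

With this check, the `e10` in `1e10` is accepted as an exponent, while the `eV` in `1eV` and the `e+a` in `1e+a` are not, so the lexer falls back to the non-exponent form.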

Consequences of this:

All the proc macro problem cases mentioned above are fixed.
The "expected at least one digit in exponent" error is no longer possible. A few tests that only worked in the presence of that error have been removed.
The lexer requires unbounded lookahead due to the presence of '_' chars in exponents. E.g. to distinguish 1e+_______3 (a float literal with exponent) from 1e+_______a (previously invalid, but now tokenised as 1e, +, _______a).

This is a backwards compatible language change: all existing valid programs will be treated in the same way, and some previously invalid programs will become valid. The tokens chapter of the language reference (https://doc.rust-lang.org/reference/tokens.html) will need changing to account for this. In particular, the "Reserved forms similar to number literals" section will need updating, and grammar rules involving the SUFFIX_NO_E nonterminal will need adjusting.

Fixes #111615.

r? @ghost
