parser/lexer: bump to Unicode 17, use faster unicode-ident by Marcondiro · Pull Request #148321 · rust-lang/rust
Gonna preface this with: my main reason for having this discussion at all is that bumping the Unicode version is a routine change, so it should have some sort of playbook and we shouldn't need to have these discussions again. I'm having the discussion here because I think it's relevant to forming that playbook, but if you'd rather move it somewhere else, I'm fine with that too.
Note that this is mostly because unicode-rs crates get bumped when someone wants them to be. If Rust wants them bumped, they can be bumped (best way: make a PR with the bump).
Not that this is the place to put this, but the exact reason I submitted unicode-rs/unicode-script#25 (which you merged) is that Clippy uses unicode-script, which doesn't have a published version for Unicode 17, so an attempt to enforce that all the Unicode versions match fails for Clippy specifically. I imagine, then, that if we were to enforce all of this, the procedure would be to ping the authors of all these crates, ensure that they're all updated to the latest version of Unicode, and then perform the bump for all R-L crates? While this is also a request to do so for this particular crate, I'm mostly using it as an example, since this would be part of what people should do going forward.
Right now, tidy only allows checking which crates appear in Cargo.lock, not which versions of them, so we can't really enforce that only versions supporting the right Unicode version are used. This is a side concern of verifying that the Unicode version is consistent: even though we can check the UNICODE_VERSION const for everything we use, we can't verify that dependencies also use the same version, which, even if it doesn't matter that much, is still probably better for the sake of reducing table size in binaries. So, I guess, that's also a side note here. I'm not sure exactly what the policy is for allowing dependencies in the compiler, but I assume that at least part of it is some assurance that the people who maintain them are active enough to merge changes if the compiler needs it.
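To make the version-checking idea concrete, here is a rough sketch of what a per-version check over Cargo.lock could look like. This is not how tidy works today; `parse_lock`, `outdated`, and the table of minimum versions are all hypothetical names for illustration, and which crate release ships which Unicode data would have to be filled in by hand.

```rust
/// One `[[package]]` entry from a Cargo.lock file.
#[derive(Debug, PartialEq)]
pub struct LockedPackage {
    pub name: String,
    pub version: String,
}

/// Returns the value of a `key = "value"` line, if the line has exactly that key.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.trim().strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}

/// Line-based scan of Cargo.lock text. A real check would likely use a TOML
/// parser; this only relies on `name` preceding `version` in each entry.
pub fn parse_lock(lock: &str) -> Vec<LockedPackage> {
    let mut packages = Vec::new();
    let mut name: Option<String> = None;
    for line in lock.lines() {
        if line.trim() == "[[package]]" {
            name = None;
        } else if let Some(n) = quoted_value(line, "name") {
            name = Some(n.to_string());
        } else if let Some(v) = quoted_value(line, "version") {
            if let Some(n) = name.take() {
                packages.push(LockedPackage { name: n, version: v.to_string() });
            }
        }
    }
    packages
}

/// Parses `major.minor.patch`, ignoring any pre-release or build suffix.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let core = v.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

/// Returns locked packages older than the minimum listed for them.
/// `minimums` is a hand-maintained (crate name, minimum version) table:
/// an assumption of this sketch, not something tidy provides.
pub fn outdated<'a>(
    packages: &'a [LockedPackage],
    minimums: &[(&str, &str)],
) -> Vec<&'a LockedPackage> {
    packages
        .iter()
        .filter(|pkg| {
            minimums.iter().any(|(name, min)| {
                pkg.name == *name && parse_version(&pkg.version) < parse_version(min)
            })
        })
        .collect()
}
```

Calling `outdated(&parse_lock(&lock_text), &minimums)` with a table like `[("unicode-script", "x.y.z")]` would list every locked copy that predates the release carrying the wanted Unicode data. Note that a locked version which fails to parse is also reported, which seems like the safer default for a lint.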
There's also a potential consideration of just including the Unicode functionality most critical to the compiler directly in the compiler itself. Since all of the code used has a compatible license, it shouldn't be tremendously difficult to copy it in somewhere, and it would also mean that we could share some of the code that parses stuff like UnicodeData.txt between these various crates. This isn't to say it should be part of libstd, but having it in the compiler might be worth considering to make version consistency easier.
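For a sense of scale, the shared parsing code in question is fairly small. Below is a minimal sketch of reading UnicodeData.txt records, relying only on the file's documented layout (semicolon-separated fields, with the hex code point, name, and general category first). The type and function names are made up for illustration and aren't taken from any of the existing crates; range entries such as `<CJK Ideograph, First>` are returned as ordinary records rather than expanded.

```rust
/// The first three fields of a UnicodeData.txt record.
#[derive(Debug, PartialEq)]
pub struct UnicodeDataRecord {
    pub code_point: u32,
    pub name: String,
    pub general_category: String,
}

/// Parses one line such as
/// `0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;`.
/// Returns `None` for blank or malformed lines.
pub fn parse_unicode_data_line(line: &str) -> Option<UnicodeDataRecord> {
    let mut fields = line.split(';');
    // Field 0: code point in hexadecimal, without a `U+` prefix.
    let code_point = u32::from_str_radix(fields.next()?.trim(), 16).ok()?;
    // Field 1: character name; field 2: general category (e.g. `Lu`).
    let name = fields.next()?.to_string();
    let general_category = fields.next()?.to_string();
    Some(UnicodeDataRecord { code_point, name, general_category })
}

/// Parses a whole file's text, skipping lines that are not records.
pub fn parse_unicode_data(text: &str) -> Vec<UnicodeDataRecord> {
    text.lines().filter_map(parse_unicode_data_line).collect()
}
```

Table generators for the individual crates could then build on `parse_unicode_data` instead of each carrying their own copy of this logic.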