Introduce let...else
by camsteffen · Pull Request #87688 · rust-lang/rust (original) (raw)
It looks like the Span decoding function has worse codegen with this PR. As best as I can tell, this isn't readily "fixable" and seems not directly connected to this PR, but adjustments to the LocalKind Encode impl do seem to fix it (I can't explain why changes to encoding affect decoding). I can reproduce the regression locally with and without PGO, so it doesn't seem related to that either.
AFAICT, there are some pretty weird choices made by LLVM in the assembly for this function -- I've uploaded disassembly here. For example, we can see this diff near the top of the function:
mov rsi, qword ptr [r13 + 8]
mov rdi, qword ptr [r13 + 16]
cmp rdi, rsi
mov r15, qword ptr [rbp + 8]
mov rdi, qword ptr [rbp + 16]
mov rsi, rdi
sub rsi, r15
If we ignore the register renaming, the meaningful difference is the cmp is replaced with a mov and sub. However, there's no real reason for LLVM to do that. rsi is not used again before being overwritten I think, so this is just wasting a mov for no real reason. It seems like the majority of the changes to this function are similar "weird" small losses, not anything terribly material. There's also some extra stack spilling (note that the stack size increases by 32 bytes).
Replacing the LocalKind Encode impl with empty stubs or otherwise adjusting it seems to fix the codegen here. The LocalKind Encode impl is not invoked by any typical compilation (tested via inserting a panic!). I'm fairly certain now that the indirect effects on Span decoding are due to shuffling of CGU partitions for the rustc_metadata crate
I instrumented the CGU partitioning (-Zprint-mono-items=lazy) on the minimal change which replaces the Encode impl for LocalKind with an empty stub, and locally it looks like the only two crate which have changed are rustc_interface and rustc_metadata.
For rustc_metadata, if we ignore changes in which CGU gets which mono item, the only additions are the extra emit_enum/emit_enum_variant, etc. along with the closures for those:
@@ -3895,0 +3896 @@ +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for > @@ -4311,0 +4313,3 @@ +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeConte +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeConte +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeConte @@ -5977,0 +5982,3 @@ +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant_arg::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeC +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant_arg::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeC +MONO_ITEM fn <rmeta::encoder::EncodeContext as rustc_serialize::Encoder>::emit_enum_variant_arg::<[closure@rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeC @@ -7817 +7823,0 @@ -MONO_ITEM fn <rustc_ast::LocalKind as rustc_serialize::Encodablermeta::encoder::EncodeContext>::encode @@ -19645,0 +19652,8 @@ +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#0} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#1} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#1}::{closure#0} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#2} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#2}::{closure#0} +MONO_ITEM fn rustc_ast::ast::_DERIVE_rustc_serialize_Encodable_E_FOR_LocalKind::<impl rustc_serialize::Encodablermeta::encoder::EncodeContext for rustc_ast::LocalKind>::encode::{closure#0}::{closure#2}::{closure#1}
This is pretty expected, since all of these are getting instantiated with rmeta::encoder::EncodeContext for the first time (it's defined in rustc_metadata). This all mostly comes about from a pretty deep callstack, rooted in attribute encoding (since those contain arbitrary expressions and paths, we get essentially all AST types serialize and deserialize impl's codegened).
The partitioning has changed for CGUs 0, 3, and 4.
8622 after-with-empty-encode/rustc-metadata/rustc_metadata.f82b8cc0-cgu.0
2824 after-with-empty-encode/rustc-metadata/rustc_metadata.f82b8cc0-cgu.3
6021 after-with-empty-encode/rustc-metadata/rustc_metadata.f82b8cc0-cgu.4
8634 after-mono/rustc-metadata/rustc_metadata.f82b8cc0-cgu.0
4466 after-mono/rustc-metadata/rustc_metadata.f82b8cc0-cgu.3
4377 after-mono/rustc-metadata/rustc_metadata.f82b8cc0-cgu.4
Generally it seems like CGU 3 is now much larger -- presumably around 1,000 items moved from being put into CGU 4 into CGU 3.
This included a move of fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_enum_variant_arg::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> st...
from CGU 4 to 3. I believe (though have no hard evidence for it) that this likely caused transitive effects making the codegen worse. Notably other bits of that impl didn't move, so probably inlining is suffering here. (The below is a grep for rustc_span::Span as rustc_serialize::Decodable<.*>>::decode
in rustc_metadata CGUs).
--- empty-encode-span 2021-10-11 14:42:56.626182306 -0400 +++ full-encode-span 2021-10-11 14:42:36.389748828 -0400 @@ -2,9 +2,9 @@ rustc-metadata/rustc_metadata.f82b8cc0-cgu.15:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_enum_variant_arg::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.15:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_struct_field::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.3:fn <for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode} as std::ops::FnOnce<(&mut rmeta::decoder::DecodeContext,)>>::call_once - shim(for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}) Internal +rustc-metadata/rustc_metadata.f82b8cc0-cgu.3:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_enum_variant_arg::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.3:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_struct_field::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.4:fn <for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode} as std::ops::FnOnce<(&mut rmeta::decoder::DecodeContext,)>>::call_once - shim(for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}) Internal -rustc-metadata/rustc_metadata.f82b8cc0-cgu.4:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_enum_variant_arg::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.4:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_struct_field::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.7:fn <for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode} as std::ops::FnOnce<(&mut rmeta::decoder::DecodeContext,)>>::call_once - shim(for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}) Internal rustc-metadata/rustc_metadata.f82b8cc0-cgu.7:fn <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::read_struct_field::<rustc_span::Span, for<'r> fn(&'r mut rmeta::decoder::DecodeContext) -> std::result::Result<rustc_span::Span, <rmeta::decoder::DecodeContext as rustc_serialize::Decoder>::Error> {<rustc_span::Span as rustc_serialize::Decodablermeta::decoder::DecodeContext>::decode}> Internal
Overall I think this means we probably can't really do much about the hit here -- it's just more CGU partitioning sometimes leading to suboptimal results. It's possible that adding some targeted #[inline] annotations or otherwise shifting around code might help with this, but generally that seems brittle and not particularly warranted. I'm going to mark this regression as triaged, because I don't think we can "fix" it at this time. Hopefully we'll eventually have a nicer partitioning algorithm that avoids this kind of trouble.
In the meantime, it's possible we can help by avoiding so much transitive codegen being necessary in rustc_metadata for all the serialize impls for rustc_ast and such. But that's a hard project and not one with obvious solutions, so I think we shouldn't block triaging this PR on it.
@rustbot label +perf-regression-triaged