Convert EMPTY_LINE_AFTER_OUTER_ATTR and EMPTY_LINE_AFTER_OUTER_ATTR lint into early lints by GuillaumeGomez · Pull Request #13658 · rust-lang/rust-clippy (original) (raw)

@y21 there's a complicated trade-off here. You probably know that, but I guess I'll just document it anyway for future readers.


On the one hand, we want to simplify the way attributes are represented, just like we want to simplify how all elements of a programming language are represented throughout a compiler. for loops become while loops, async becomes state machines, conditionals become jumps and basic blocks, etc. All complexity is reduced until the complexity matches that of what hardware can execute. As the complexity reduces, it becomes simpler to extract information from it in a way in which we can guarantee, to some level, is correct, by being able to reason about it. At the level of the AST, we can't yet reason about types and flows of lifetimes, so we first reduce the complexity. We do what we can or have to at the AST level (name resolution, expansion) until we can safely discard some details and reduce the complexity. This isn't some magical insight of mine or something, I just like this way of thinking about it.

Attributes are interesting, because their complexity is (was) never reduced. On all levels of the compiler, we reason about them as if they're at the syntactical level. While, after expansion, they aren't really used as syntax anymore. They're used as a small set of annotations to guide processes such as code generation. At this point, we don't really care about how the user wrote these annotations, we just care about the list of annotations themselves. This item has C representation, should be used, is stable since 1.85, etc. Those properties are true whether you annotated the item from the inside with #![repr(C)] or from the outside with #[repr(C)]. Similarly, we don't really care anymore whether you said #[repr(C)] #[repr(packed)] struct Foo; or #[repr(C, packed)] struct Foo; because they mean the same. So, there is complexity we can reduce here to make later passes easier. I think this is in most cases a desirable change.

From that point of view, lints that tell users that the syntax of their attributes is wrong, is supposed to happen early. In a stage of the compiler where we still care about syntaxes of things. While lints that care about the structure of attributes,
let's say a hypothetical lint that says repr(C) and repr(Rust) are incompatible, is fine to go at the hir level of abstraction


On the other hand, and that's the downside we see here of course, we sometimes need information spanning these abstraction levels. I wanted to say that this is especially true for clippy. For example, we sometimes need information about the syntax of an attribute, and the type of an item.

However, the compiler has this situation very often too. if the compiler itself needed this kind of spanning information, it'd make some kind of table. Name resolution is an example of this. At the AST level we already analyse code for names, and store that in a bunch of maps. Then, later parts of the compiler can query what name refers to what definition. Type checking is a similar operation, and so are spans to later be used in diagnostics.

Clippy here has the problem that it might need information in a table like this, to span abstraction levels, that the compiler does not. So far, where this happened it lead to awkwardness. Here it's needing parent nodes (information that is only analysed after HIR creation) together with the purely syntactical representation of attributes. Another place you see this in is around format_args!() leading to what we now call "early -> late storage hacks".

I'm not necessarily advocating for those hacks, but in some ways, this is kind of how rustc works, and how compilers work all the time. It's just that clippy can usually benefit from the storage and analyses of the compiler, and only rarely needs to do these kinds of hacks for just itself. The few times that it does we call it a hack because it's so exceptional.


It seems that representing attributes at the AST-complexity level throughout the entire compiler – while lowering the complexity of all other constructs (to HIR, and then MIR, and further) – isn't maintainable. It's unfortunate that this, at least at the moment, gives clippy slightly less information from the compiler side. Still, I'm somewhat convinced that this is the right direction anyway. Clippy lints dealing with the syntactical representation of attributes should be early, since that's where we deal with syntactical constructs. But, I'm not maintaining clippy, so that might be easy for me to say.

I guess this long comment is the reasoning I have behind these changes, and how I like looking at it from, admittedly, a little idealized point of view of what compilers should look like. Maybe that reasoning is good to have documented somewhere, so I guess for now that's here. However, neither rustc nor clippy are even close to ideal, I think we can all mostly agree on that. So, I realise that if the overhead is exceedingly large everywhere except in the compiler, it's well possible I could be wrong. I'm happy to hear either alternative opinions or maybe ways to go forward with this and keep the situation better in clippy as well.

<3