On Sat, Aug 28, 2021 at 8:52 AM Aaron Ballman via cfe-dev <cfe-dev@lists.llvm.org> wrote:
On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dblaikie@gmail.com> wrote:
\>
\> +1 to what Manuel's said here.
\>
\> One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.
To me, the crux of my concern isn't nonwhitespace changes, but changes
that can make code which used to compile no longer do so. It just so
happens that nonwhitespace changes are where that risk is highest.
Perhaps it would be correct to say that the problematic formatters are those that change the file's sequence of preprocessing tokens. This is particularly relevant to clang-format because clang-format doesn't actually parse C++. So for example you might imagine a formatter that cuddles angle brackets:
std::vector<std::vector<int> > v; // BEFORE
std::vector<std::vector<int>> v; // AFTER
This changes the token sequence, so it's potentially dangerous. Because clang-format doesn't parse, such a formatter can't tell the difference between that (safe, post-C++03) edit and this (unsafe) edit:
X<&Y::operator> >(); // BEFORE
X<&Y::operator>>(); // AFTER: syntax error
Obviously such a formatter is still going to be relatively safe in practice. But because it changes (or at least has the potential to change) the token sequence, it is qualitatively more dangerous than a formatter that merely reformats the existing token sequence.
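For anyone who wants to try the two cases in a compiler, here is a minimal sketch. The struct `Y`, its `operator>`, and the function template `X` are made up for illustration (the snippets above don't say what they are), and the element type `int` is likewise just a placeholder.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical type with a member operator>, invented for this sketch.
struct Y {
    bool operator>(const Y&) const { return true; }
};

// Hypothetical function template whose template argument is a
// pointer to a const member function of Y.
template <bool (Y::*Op)(const Y&) const>
bool X() {
    Y a, b;
    return (a.*Op)(b);
}

// The safe case: since C++11 both spellings name the same type.
static_assert(std::is_same<std::vector<std::vector<int> >,
                           std::vector<std::vector<int>>>::value,
              "cuddling the brackets does not change the type");

bool call_with_space() {
    // OK: the tokens are `operator`, `>`, then a separate closing `>`.
    bool result = X<&Y::operator> >();
    // Removing the space would lex `>>` as one token, naming operator>>
    // and leaving the template argument list unclosed:
    //   X<&Y::operator>>();   // does not compile
    assert(result);  // Y::operator> always returns true in this sketch
    return result;
}
```

The `static_assert` covers the safe edit; the commented-out line is the unsafe one, and a tool that only looks at tokens can't tell them apart.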
Shuffling around the tokens (e.g. changing west-const into east-const) is just a special case of changing the token sequence.
In particular, if you change the token sequence when you're inside a preprocessor macro, then (because clang-format doesn't parse C++) you really have no idea what effect your change is going to have.
#define X(V) int V = 42
int main() { X(v1); X(const v2); }
Here, editing \`const v2\` into \`v2 const\` produces a syntax error, because \`X(v2 const)\` expands to \`int v2 const = 42\`.
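Spelled out as something you can compile, with the expansions written in comments. The wrapper function `sum_of_macro_vars` is my own addition so the values can be checked; the macro is the one above.

```cpp
#include <cassert>

#define X(V) int V = 42

// Hypothetical wrapper, added only so the result is observable.
int sum_of_macro_vars() {
    X(v1);        // expands to: int v1 = 42;
    X(const v2);  // expands to: int const v2 = 42;
    // After a west-const to east-const rewrite of the macro argument:
    //   X(v2 const);   // expands to: int v2 const = 42;  (syntax error)
    assert(v1 == v2);  // both are ints initialized to 42
    return v1 + v2;
}
```

A formatter that sees only `const v2` between parentheses has no way of knowing that the preceding `int` is supplied by the macro body.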
Now, for any formatter, one can find pathological programs that are broken by it; e.g.
template<int X> void F() requires (X==2) {}
int main() { F<\_\_LINE\_\_>(); }
will stop compiling if you add linebreaks to it. I don't think this quite reduces my thesis to absurdity, but I admit it's theoretically awkward. But if you restrict your edits to those that preserve the token sequence, then I think you'll only ever break programs that use either \`#X\` (stringifying) or \`\_\_LINE\_\_\`. Anyone care to produce a counterexample? :)
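To make those two exceptions concrete, here is a small sketch of how a pure whitespace edit can still be observed by a program. It avoids C++20 `requires` so it builds in older modes; the names `STR`, `kFirst`, `kSecond`, `line_gap`, and `stringification_sees_whitespace` are all invented for the illustration.

```cpp
#include <cassert>
#include <cstring>

#define STR(X) #X

// __LINE__: these two declarations must stay on adjacent lines.
// Inserting a line break between them makes the static_assert fail.
constexpr int kFirst = __LINE__;
constexpr int kSecond = __LINE__;
static_assert(kSecond - kFirst == 1, "a line break was inserted here");

int line_gap() { return kSecond - kFirst; }

// Stringifying: both arguments are the same three tokens (a, +, b),
// but whitespace between tokens is kept as a single space in the
// resulting string literal, so the two strings differ.
bool stringification_sees_whitespace() {
    assert(std::strcmp(STR(a+b), "a+b") == 0);
    return std::strcmp(STR(a+b), STR(a + b)) != 0;
}
```

So a formatter that turns `a+b` into `a + b` inside a macro argument preserves the token sequence and can still change behavior, which is why I carve out `#X` alongside `__LINE__`.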
my $.02,
Arthur