[llvm-dev] [cfe-dev] [RFC] Adding support for clang-format making further code modifying changes

Arthur O'Dwyer via llvm-dev llvm-dev at lists.llvm.org
Sat Aug 28 06:51:52 PDT 2021


On Sat, Aug 28, 2021 at 8:52 AM Aaron Ballman via cfe-dev < cfe-dev at lists.llvm.org> wrote:

On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dblaikie at gmail.com> wrote:
> +1 to what Manuel's said here.
>
> One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.

To me, the crux of my concern isn't nonwhitespace changes, but changes that can make code which used to compile no longer do so. It just so happens that nonwhitespace changes are where that risk is highest.

Perhaps it would be correct to say that the problematic formatters are those that change the file's sequence of preprocessing tokens. This is particularly relevant to clang-format because clang-format doesn't actually parse C++. So for example you might imagine a formatter that cuddles angle brackets:

    std::vector<std::vector<int> > v;  // BEFORE
    std::vector<std::vector<int>> v;   // AFTER

This changes the token sequence, so it's potentially dangerous. Because clang-format doesn't parse, such a formatter can't tell the difference between that (safe, post-C++03) edit and this (unsafe) edit:

    X<&Y::operator> >();  // BEFORE
    X<&Y::operator>>();   // AFTER: syntax error
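To make "changes the token sequence" concrete, here is a rough sketch of how one might check an edit. It is only an illustration: rough_tokens and same_token_sequence are hypothetical names, and the tokenizer is deliberately simplified (no comments, string literals, or the full preprocessing-token grammar), so it is not how clang-format or any real lexer works.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a deliberately simplified tokenizer. It recognizes
// identifier/number-like runs, a handful of two-character punctuators
// (using maximal munch), and single-character punctuators. Whitespace
// separates tokens but is not itself a token.
std::vector<std::string> rough_tokens(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_')) {
                ++j;
            }
            out.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        if (i + 1 < src.size()) {
            std::string two = src.substr(i, 2);
            // Maximal munch: adjacent ">>" is one token, "> >" is two.
            if (two == ">>" || two == "<<" || two == "::" || two == "==") {
                out.push_back(two);
                i += 2;
                continue;
            }
        }
        out.push_back(std::string(1, src[i]));
        ++i;
    }
    return out;
}

// Hypothetical check: true if an edit only moved whitespace around.
bool same_token_sequence(const std::string& before, const std::string& after) {
    return rough_tokens(before) == rough_tokens(after);
}
```

Under this sketch, same_token_sequence("int  x=1;", "int x = 1;") is true, while both of the angle-bracket edits above come out false, because "> >" is two tokens and ">>" is one. The check cannot say which of those two edits is safe; it only flags that both leave the "pure reformatting" category.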

Obviously such a formatter is still going to be relatively safe in practice. But because it (has the potential to) change the token sequence, it is qualitatively more dangerous than a formatter that merely reformats the existing token sequence.

Shuffling around the tokens (e.g. changing west-const into east-const) is just a special case of changing the token sequence.

In particular, if you change the token sequence when you're inside a preprocessor macro, then (because clang-format doesn't parse C++) you really have no idea what effect your change is going to have.

    #define X(V) int V = 42
    int main() { X(v1); X(const v2); }

Here, editing const v2 into v2 const produces a syntax error.
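Spelling out the expansions shows why the order matters. This is a small sketch using the same X macro; the function name sum_of_macro_declared_ints is made up for illustration.

```cpp
#include <type_traits>

#define X(V) int V = 42

// Hypothetical wrapper so the declarations live in one function.
int sum_of_macro_declared_ints() {
    X(v1);        // expands to: int v1 = 42;
    X(const v2);  // expands to: int const v2 = 42;
    // X(v2 const) would expand to: int v2 const = 42;  (does not parse)
    static_assert(!std::is_const<decltype(v1)>::value, "v1 is a plain int");
    static_assert(std::is_const<decltype(v2)>::value, "v2 is a const int");
    return v1 + v2;
}
```

The macro argument looks like a declaration with a leading const, but after expansion the const sits to the right of int, so it is already "east" of the type. A tool that sees only the unexpanded text has no way to know that.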

Now, for any formatter, one can find pathological programs that are broken by it; e.g.

    template<int X> void F() requires (X==2) {}
    int main() { F<__LINE__>(); }

will stop compiling if you add linebreaks to it. I don't think this quite reduces my thesis to absurdity, but I admit it's theoretically awkward. But if you restrict your edits to those that preserve the token sequence, then I think you'll only ever break programs that use either #X (stringifying) or __LINE__. Anyone care to produce a counterexample? :)
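The #X case can be shown in isolation. In this sketch, STR, tight, and spaced are hypothetical names. The two macro arguments have the same token sequence (a, +, b) and differ only in whitespace, yet the stringified results differ.

```cpp
#include <string>

// Hypothetical macro name; stringifies its argument as written.
#define STR(X) #X

// No whitespace between the argument's tokens.
std::string tight() { return STR(a+b); }

// Whitespace between the tokens; each run collapses to a single space
// in the resulting string literal.
std::string spaced() { return STR(a   +   b); }
```

Here tight() yields "a+b" and spaced() yields "a + b", so a formatter that inserts spaces around binary operators changes the program's observable behavior without touching the token sequence.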

my $.02,
Arthur


