Improve documentation parsing in Clang
February 8, 2025, 7:03am
Description of the project: Clang-Doc is a C/C++ documentation generation tool created as an alternative to Doxygen and built on top of LibTooling. This effort started in 2018 and critical mass landed in 2019, but development was largely stagnant, mostly due to a lack of resources, until last year, when it restarted as a successful Google Summer of Code project.
The tool is built on top of LibTooling and leverages Clang's parser, which supports parsing Doxygen commands in documentation comments (this support is also used in the implementation of Clang's -Wdocumentation, which can validate the content of documentation comments during compilation).
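As an illustration (not from the original post), the snippet below has a `\param` command that names a parameter the function does not have. Compiling it with `clang++ -fsyntax-only -Wdocumentation example.cpp` should produce a `-Wdocumentation` warning on the `\param c` line. The function and file names are made up for the example.

```cpp
// Compile with: clang++ -fsyntax-only -Wdocumentation example.cpp
// The \param command below names `c`, which is not a parameter of Add,
// so -Wdocumentation is expected to warn about it.

/// Adds two integers.
/// \param a the first operand
/// \param c the second operand (deliberately misnamed; should be `b`)
/// \returns the sum of \p a and \p b
constexpr int Add(int a, int b) { return a + b; }
```

The code itself compiles normally; only the documentation comment is diagnosed, which is what makes the same parser useful to both the compiler and Clang-Doc.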
Unfortunately, Clang’s documentation parser is incomplete and has several issues:
- Not all Doxygen commands are supported, limiting Clang-Doc's usability.
- Not all C/C++ constructs are currently handled, most notably C++20 features such as concepts.
- Markdown support in documentation comments, introduced in Doxygen version 1.8.0, is missing.
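To make the Markdown item concrete, here is a hypothetical doc comment that mixes Doxygen commands with Doxygen-style Markdown (a bullet list, backtick code spans, and bold text). It is a sketch of the kind of input the parser would need to handle, not output from Clang or Clang-Doc, and the `Max` function is invented for the example.

```cpp
// Doxygen 1.8.0+ interprets the Markdown in the comment below (list items,
// backtick code spans, **bold**). Per the list above, Clang's comment
// parser does not yet understand it, although it still recognizes the
// \param and \returns commands.

/// Returns the larger of two values.
///
/// - `a` and `b` are compared with `operator<`
/// - on a tie, **a** is returned
///
/// \param a the first value
/// \param b the second value
/// \returns the larger of `a` and `b`
template <typename T>
constexpr const T &Max(const T &a, const T &b) {
  return (a < b) ? b : a;
}
```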
Expected result: The goal of this project is to implement the missing features in Clang’s documentation parser as well as their handling in Clang-Doc to improve the quality of the generated documentation. The eventual goal is for the LLVM project to start using Clang-Doc for generating its reference documentation, but before we can do that we need to ensure that all required features are implemented.
Successful proposals should focus not only on addressing the existing limitations, but also draw inspiration for further improvements from other documentation tools such as hdoc, standardese, subdoc or cppdocgen.
Over the course of the project, the candidate will have an opportunity to gain significant experience with LLVM and Clang internals (including the lexer and parser) and the C/C++ languages.
Skills: Intermediate knowledge of C++; interest in compilers and parsers. Previous experience with Clang/LibTooling is a bonus but not required.
Project size: Either medium or large.
Difficulty: Medium
Confirmed Mentors: @petrhosek, @ilovepi
Hey @petrhosek
I’m interested in this project. Could you please suggest a good first issue or something related so that I can become more familiar with the idea and the codebase?
ilovepi March 11, 2025, 3:47pm
Hi @nikeokoronkwo,
GitHub has a list of open issues related to clang-doc.
[clang-doc] Comments in macros don't appear in the generated docs · Issue #59819 · llvm/llvm-project · GitHub may already be fixed, but I don't think anyone has confirmed it. Confirming whether or not it's fixed and adding a test should be fairly straightforward (we should have a regression test in either case).
[clang-doc] --repository links don't work with github · Issue #59814 · llvm/llvm-project · GitHub is another candidate that I think could work. After we’ve fixed up testing in [clang-doc] Make `--repository` change the HTML output by ilovepi · Pull Request #122566 · llvm/llvm-project · GitHub, the remaining work to make that configurable should be relatively small.
Most of the other issues are ones we haven't investigated enough to know whether the change is very complicated or not. For example,
[clang-doc] static member variables are not included · Issue #59813 · llvm/llvm-project · GitHub is one I'd caution against, since it's either a very small change or a very big one, but we haven't investigated enough yet to know for sure.
evelez March 13, 2025, 2:46am
Hi, I'm interested in this project! May I ask what the current state of Clang-Doc is? I don't mean for this to be a bug report or tech support forum, but building Clang-Doc from scratch results in an out-of-memory error in `HTMLGenerator.cpp:508` when trying to generate HTML for a single file. It doesn't happen when generating only YAML, and, funnily enough, all the tests pass. The prepackaged LLVM 19 Clang-Doc reports a missing default index.js on Fedora. I was trying to get a sense of what clang-doc does and doesn't do, and I'm not sure if it's something on my end.
I think that supporting all major C++ constructs would be the top priority, followed by supporting Doxygen commands and Markdown for more expressive comments. Would the goal be to support all Doxygen commands, or just the popular ones? I was thinking a good place to start would be to at least support the ones used in Clang.
As far as potential improvements go, has a search feature been explored at all? (I saw that this wasn't a priority in last year's proposal.) Or support for non-documentation pages like tutorials, introductions, etc.? (These would most likely be Markdown pages.)
Edit: the packaged Clang-Doc (latest Homebrew version) works on macOS. Clang-Doc built from source has the same crash on macOS.
ilovepi March 17, 2025, 7:15pm
Clang-Doc should be functional, albeit with less than ideal output (though @PeterChou1 has a draft PR that should improve that significantly).
Right now, we don't handle enough C++, and we'd like to improve the quality of documentation as outlined above. We'd consider Markdown support core functionality for Clang-Doc, and I think it's an important first step toward supporting those use cases. I doubt we need to support every Doxygen command, but certainly an MVP would be to support all the ones used in LLVM, and then prioritize the rest based on popularity.
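The thread doesn't point to an existing resource that ranks Doxygen commands by popularity (a later post asks about one). As a rough starting point, a hypothetical helper like the one below could tally how often each command appears in a codebase's line doc comments. It is a heuristic sketch, not part of Clang or Clang-Doc: it only looks at `///` and `//!` comments, skips `/** ... */` blocks, and will over-count things like `@` in email addresses.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Clang or Clang-Doc): counts Doxygen
// commands written as \name or @name inside `///` and `//!` line comments.
// Block comments (/** ... */) are deliberately not handled in this sketch.
std::map<std::string, int> CountDoxygenCommands(const std::string &Source) {
  std::map<std::string, int> Counts;
  std::istringstream Lines(Source);
  std::string Line;
  while (std::getline(Lines, Line)) {
    // Skip indentation, then keep only lines that start a doc comment.
    std::size_t Start = Line.find_first_not_of(" \t");
    if (Start == std::string::npos)
      continue;
    if (Line.compare(Start, 3, "///") != 0 &&
        Line.compare(Start, 3, "//!") != 0)
      continue;
    // Scan the comment text for a command marker followed by letters.
    for (std::size_t I = Start + 3; I < Line.size(); ++I) {
      if (Line[I] != '\\' && Line[I] != '@')
        continue;
      std::size_t J = I + 1;
      while (J < Line.size() &&
             std::isalpha(static_cast<unsigned char>(Line[J])))
        ++J;
      if (J > I + 1)
        ++Counts[Line.substr(I + 1, J - I - 1)];
      I = J - 1; // Resume scanning after the command name.
    }
  }
  return Counts;
}
```

Feeding it the contents of LLVM's headers (for example, one file at a time read into a string) would give a first-cut ranking to compare against the commands Clang already supports.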
Search, while a nice property, is less important than improving the basic functionality at the current time.
With respect to OOM, please file an issue, as that shouldn’t happen. Last year we significantly reduced the memory consumption and improved the performance with algorithmic improvements. If there are still issues there, we need to know, but we’re building documentation for Fuchsia using it, though it isn’t our default yet. It would be great if you could file something for the Mac issue you saw, as well.
evelez March 18, 2025, 3:23am
It looks like this issue might address the problems I’m having. If it doesn’t I will file another issue.
I almost have a proposal ready and would appreciate it if I could pass it along for review.
ilovepi March 18, 2025, 7:03pm
That’s great. You are highly encouraged to solicit feedback on your proposal, and both @petrhosek and I are looking forward to it.
Hi! My name is Ahmad Abdul Rehman and I'm interested in the project. I've looked into past development work and saw that it pivoted to address slow runtime. Given that we want to enhance the documentation parser for C++20 and Markdown support, do you think last year's performance improvements are sufficient to handle these new features, or is runtime still a concern we'd need to consider?
Additionally, is there a resource available that shows the popularity of Doxygen commands?
Thank you
Hey, I'm Zhong Yijiang. I'm new to compilers and LLVM, but I've been interested in compilers and related tools ever since I learned about compilation. I have C/C++ experience, but on the compiler side I can only write a simple parser or use Flex to generate one.
I've recently been exploring LLVM and can build parts of it as needed, and I'm reading A Tour of C++ to learn C++20. I'll try to fix the issues you listed.
P.S. I find clang-doc's documentation sparse (for example, on how to build clang-doc), which makes the learning curve steep. Do you plan to write more?
ilovepi March 20, 2025, 10:23pm
Thanks for your interest!
The runtime performance was significantly improved last year, and I think we're mostly fine with it. I'm fairly confident it can still be improved (particularly memory usage), but our benchmark project went from taking 6 hours to 13 minutes. It requires a beefy machine, but it's certainly faster than most other documentation tools, like Doxygen.
This year, I think we want to focus on improving the core functionality, but we're compiler people, so performance work is never off the table. As of right now, though, I don't think runtime performance is clang-doc's biggest issue.
ilovepi March 20, 2025, 10:31pm
Clang-Doc builds the same way as other Clang tools, like `clangd` or `clang-tidy`.
https://llvm.org/docs/GettingStarted.html
https://llvm.org/docs/CMake.html
Generally, your CMake invocation will be something along the lines of:

```shell
cmake -B llvm-build -S llvm-project/llvm -GNinja -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" ...
```

and then build with:

```shell
ninja -C llvm-build clang-doc
```

We always welcome improvements to our documentation, so if you think it could be made easier to find, I'd be happy to review a PR.
Thanks for your reply. I'm trying to write a regression test for [clang-doc] Comments in macros don't appear in the generated docs · Issue #59819 · llvm/llvm-project. I added `macro.cpp` to `llvm-project/clang-tools-extra/test/clang-doc` and ran `llvm-lit macro.cpp` to test it.
I've run into a problem. When I run the code below, the test passes:
```cpp
// RUN: rm -rf %t && mkdir -p %t
// RUN: clang-doc --format=md --output=%t --executor=standalone %s
// RUN: FileCheck %s < %t/GlobalNamespace/MyClass.md --check-prefix=MD-MyClass
#define DECLARE_METHODS \
  /// Comment \
  int Add(int a, int b) { \
    return a + b; \
  }
// MD-MyClass: ## Functions
// MD-MyClass: ### Add
// MD-MyClass: *public int Add(int a, int b)*
class MyClass {
public:
  DECLARE_METHODS
};
```
But after adding a check that uses `@LINE`, it fails:
```cpp
// RUN: rm -rf %t && mkdir -p %t
// RUN: clang-doc --format=md --output=%t --executor=standalone %s
// RUN: FileCheck %s < %t/GlobalNamespace/MyClass.md --check-prefix=MD-MyClass-LINE
// RUN: FileCheck %s < %t/GlobalNamespace/MyClass.md --check-prefix=MD-MyClass
#define DECLARE_METHODS \
  /// Comment \
  int Add(int a, int b) { \
    return a + b; \
  }
// MD-MyClass: ## Functions
// MD-MyClass: ### Add
// MD-MyClass: *public int Add(int a, int b)*
class MyClass {
public:
  // MD-MyClass-LINE: *Defined at {{.*}}clang-tools-extra{{[\/]}}test{{[\/]}}clang-doc{{[\/]}}macro.cpp#[[@LINE+1]]*
  DECLARE_METHODS
};
```
The output is:

```
$ llvm-lit macro.cpp
llvm-lit: /home/zhonguncle/Desktop/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using clang: /home/zhonguncle/Desktop/llvm-project/build/bin/clang
-- Testing: 1 tests, 1 workers --
FAIL: Clang Tools :: clang-doc/macro.cpp (1 of 1)
********************
Failed Tests (1):
  Clang Tools :: clang-doc/macro.cpp

Testing Time: 0.04s

Total Discovered Tests: 1
  Failed: 1 (100.00%)
```
I spent 3 hours trying, reading the documentation, and referring to `enum.cpp`, but I still couldn't solve it. I'm looking forward to your reply.
This is my first time using FileCheck and llvm-lit, so some of my questions and approaches may be naive. I'm sorry for the trouble.
Haha, what a coincidence: right after replying, I noticed that the problem might be the comment format (I probably asked this question after being stuck for an hour). So I looked up Doxygen-style comments, modified some code, and now the test passes. Next I'll finish the HTML part.
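The post doesn't say exactly what was changed. One well-defined pitfall in the original test, offered as a likely factor rather than a confirmed diagnosis: C++ splices backslash-newline pairs (translation phase 2) before it replaces comments (phase 3), so a `///` line comment that ends in `\` continues onto the next line. Inside a macro body, where every line ends in `\`, such a comment swallows the rest of the definition, and GCC and Clang can warn about it under `-Wcomment`. A `/** ... */` block comment is closed before the backslash, so the code after it survives. The sketch below checks both cases at compile time; the macro names and the `HasSwallowed` trait are hypothetical, and `std::void_t` requires C++17.

```cpp
#include <type_traits>

// Line splicing (phase 2) happens before comments are removed (phase 3),
// so the trailing backslash on the `///` line joins the next line to the
// comment. This macro therefore expands to nothing.
#define DECLARE_WITH_LINE_COMMENT \
  /// Comment \
  int Swallowed() { return 1; }

// The block comment is closed before the backslash, so the declaration
// after it survives and becomes the macro's replacement list.
#define DECLARE_WITH_BLOCK_COMMENT \
  /** Comment */ \
  constexpr int Add(int a, int b) { return a + b; }

// Hypothetical detection trait: true if T has a member named Swallowed.
template <typename T, typename = void>
struct HasSwallowed : std::false_type {};
template <typename T>
struct HasSwallowed<T, std::void_t<decltype(&T::Swallowed)>>
    : std::true_type {};

class MyClass {
public:
  DECLARE_WITH_LINE_COMMENT
  DECLARE_WITH_BLOCK_COMMENT
};
```

Whether Clang-Doc then attaches the block comment to `Add` is a separate question that the regression test itself answers.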
Hey, I opened a PR for my patch: "Add test to clang-doc, it can test comments in macro. Original issue is #59819." by ZhongUncle · Pull Request #132360 · llvm/llvm-project.
I'm very interested in clang-doc. While writing this test, I found that some comment styles aren't parsed correctly. For example, if a Doxygen `/**`-style comment is used in a macro, some of the `\` line continuations at the end of each line end up in the generated Markdown. I think we can fix this in the future.
ilovepi March 21, 2025, 3:29pm
@ZhongUncle while I'm very glad to see your enthusiasm for improving clang-doc, I'd like to keep this thread focused on the GSoC projects. I'm happy to discuss clang-doc issues on the GitHub issue tracker and help with technical questions or issues on your PRs. Generally, we can provide better feedback on a PR than in a forum, and it's certainly easier for me to understand specific technical problems when I can see the full context, the way you can in a code review.
If you’re very interested in working on clang-doc I’d encourage you to draft a proposal.
xys-syx April 24, 2025, 6:53am
Hello, when can we expect feedback on the proposals that have been submitted?
ilovepi April 24, 2025, 3:03pm
The community is reviewing proposals right now. The reviews should be done and released to GSoC fairly soon. I'm not quite sure when you'll get feedback after that, but IIRC it isn't long.