Project Governance roundtable at Euro-LLVM 2024

@rnk hosted a roundtable at Euro-LLVM to discuss the project governance proposal.

Here is an attempt to summarize the main points. The raw notes are further below.


(This is essentially a transcript of the discussion. All names have been removed. “I” below refers to different people.)

Intro

(I had a dream that someone wanted to bring back the autoconf build to LLVM, and people were arguing endlessly about it. Takeaway: arguing about stuff is normal, and we should set up systems to resolve people’s disagreements effectively. We should be more prepared for the idea that people are not going to agree. We should plan for that and argue constructively in ways that minimize burnout.)

Welcome to the LLVM Governance Roundtable! There’s a proposal going through LLVM’s pitch process for major policy changes; Reid, Mehdi, Chris B, Eric C, and some other folks are co-authors. At this roundtable I want to go over the proposal, get feedback, and write up notes. We’ll incorporate the notes into edits to the proposal and continue through the pitch process. The way the process works, the review managers and the proposal authors will join a conference call at some point ahead of the next developer meeting, and there will be an update by then. Chris Lattner will be on the call and decide if there’s consensus.

The proposal is essentially to change that process by adding teams that replace Chris’s role in it. Those teams would also be used for escalating RFCs and coming to decisions about them at the area team level.

Let’s talk about the proposal itself to make sure we’re on the same page. There are basically three parts:

  1. Define a body of voting contributors in LLVM and have an election mechanism to vote on who we want to represent the project. The proposal says to use any repository activity in the past 12 months to make someone eligible to vote. That’s a starting point. If you have ideas for how to improve on it, comments are welcome.
  2. The proposal identifies 13 project areas, where each project area will have an area team consisting of 5 people on one-year terms, re-elected every January. This works out to 65 positions. There’s no limitation on being on multiple area teams. Being on an area team obligates you to join a monthly call, make decisions, and provide high-level feedback on ongoing RFCs relevant to the area.
  3. Above the area teams, there’s the project council. Each area team elects a chair and a secretary, and the project council consists of the chairs from each project area. They also meet monthly.
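The eligibility rule in point 1 is simple enough to sketch. The following is a minimal Python illustration, assuming each contributor’s “repository activity” can be reduced to the date of their most recent activity and approximating the 12-month window as 365 days; `Contributor`, `eligible_voters`, and the sample handles are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "the past 12 months" is approximated as a 365-day window.
ELIGIBILITY_WINDOW = timedelta(days=365)


@dataclass
class Contributor:
    """Hypothetical record: a handle and the date of their latest repository activity."""
    handle: str
    last_activity: date


def eligible_voters(contributors, today):
    """Return, sorted, the handles of contributors active within the window."""
    cutoff = today - ELIGIBILITY_WINDOW
    return sorted(c.handle for c in contributors if c.last_activity >= cutoff)


people = [
    Contributor("alice", date(2024, 3, 1)),  # active recently: eligible
    Contributor("bob", date(2022, 11, 5)),   # inactive for over a year: not eligible
]
print(eligible_voters(people, today=date(2024, 4, 15)))  # ['alice']
```

A real implementation would still have to decide what counts as activity (commits, reviews, issue comments), which is exactly the part of the rule the proposal leaves open for improvement.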

What does the project council do?

They’re basically a point of escalation, deciding project policy matters that go beyond any one area. For example, the current conversations about commit access, or decisions about admitting incubator projects into the monorepo. We don’t really have effective processes for these decisions; that’s the vision behind having such a structure.

What are the project areas?

The proposal is on GitHub as LP-0004 (https://github.com/llvm/llvm-www/pull/54). You can view the rendered markdown here: llvm-www/proposals/LP0004-project-governance.md at 9d7d7e3435e58e7747a10500f7b43b2bcc157700 · llvm/llvm-www · GitHub

The areas are probably also subject to change. 13 areas sounds like a lot. There’s also a process for adding and removing them.

We’ve got LLVM Backend, LLVM Middle-End (everything that’s not the backend), Clang, Clang Tooling, C/C++ Runtime Libraries, Compiler Runtimes, Flang, MLIR, LLDB, Binary Tools, Incubator Projects, Project Infra, Community.

Burnout

In light of the xz exploit, it’s interesting that the attack was in some ways a product of the lack of a healthy community. There was an individual maintainer without a lot of support, so he turned to someone else (who turned out to be a bad actor).

To keep the project going, we need long-time contributors to not get burned out from pushing endless RFCs up a hill against lots of friction. Doing that is damaging to the long-term health of our community.

I think that having more effective, time-limited decision processes, where we get feedback on a monthly timescale, would do a lot to reduce contributor burnout and improve community health, which is probably our best defence against social engineering attacks.

What happens when an Area Team doesn’t agree?

I believe it’s escalated to the Project Council.

What happens when the Project Council doesn’t agree?

I think then nothing happens. Chris Lattner? Don’t know about that.

It’s not in the proposal, though. The important part of a governance proposal is to think through “what’s the backstop?”: “How do you design this so that it definitely results in a decision?”

I’m not sure it’s in the proposal, but I think the Project Council has the option to turn a question back to the voting contributors. This structure is similar to the Python Steering Committee – though I don’t think they have area teams, just the top-level committee – and in their experience the most meaningful thing was defining the voting contributor body. All of this representation is ultimately to make sure that voting contributors have a stake and are represented. So if a proposal is truly contentious, the Project Council can kick it back to a project-wide vote. But that feels like it’s for truly tough decisions, like “should we migrate to github”.

Who closes the pull request [on anything that’s rejected]?

I think for the project area level, the area team would do that. And if people don’t like the decision, they can always escalate to the project council.

But if you’re working on, say, ClangIR, and the Clang area team gets together and says they don’t want this, what are your options? Do your own thing (work around that constraint, fork) or escalate.

To answer your question, I think the Clang team would be entitled to push Close on the PR.

The proposal does not explain how the Project Council makes decisions

The doc does not actually explain how the Project Council makes decisions, or if they have any kind of fallback option. The only discussion about voting in the doc is for elections. If the proposal is that the Project Council uses project-wide voting as a fallback mechanism, it would be really good to make it clear.

Yes, I think that’s good feedback.

Is the Project Council member one of the five people on the Area Team, or a separate person?

The Project Council is composed of the chairs of the Area Teams.

Each Area Team has five people. Amongst themselves, they select a chair and secretary. The chair is the representative to the project council.

The Council should have an odd number of people

Yes, that was a feature of the design [having 13 areas]. Though it is a bit awkward that, as we add and remove areas, we’d always want an odd number of them.

Finding 65 people to do this work seems hard

People have brought this up.

One view that’s often brought up is that we should use our existing code owners structure to handle this kind of stuff instead of adding this new/additional process.

Code ownership is an expensive, poorly defined set of responsibilities with no process for removing it. Serving on an area team is a time-limited effort: a one-year commitment to attend a meeting once a month and read the material beforehand. My hope is that, because of that, we’ll be able to find this many people even with limited volunteer resources.

The other alternative would be for some areas to have smaller area teams.

The other thought is that maybe there’ll be some overlap in team representation, which could start eating into people’s time.

The proposal is great, head and shoulders above what we currently have / nothing. But if we use engagement with the proposal as a barometer for engagement with governance, I’m concerned about whether we’ll be able to find 65 interested parties, and 13 of those willing to form the project council. This is additional responsibility above and beyond ownership, and we’re already in an owners crunch. Have you given any thought to what the fallback might be?

People have given the feedback that this is a lot of roles. Chris already did pare down the proposal, reducing the number of area teams. We could go further in that direction, and delegate decision making for some areas up to the project council, as a way to reduce the number of roles.

I imagine there are going to be parts of LLVM that need more governance than others? Like MLIR is a hot topic which may see strong interest in governance, a lot of volunteers, and escalations, but say LLD might need less.

One way to take first steps and get something on the ground would be to start with electing a project council, which will be hard enough. Once you have that going, you could see, for example, “there’s lots of stuff coming from MLIR, we need a specialised body for that”. If we start with as many volunteers as currently proposed, I fear that this will never get launched.

It’s true that the initial definition has a lot of roles and bureaucracy for a v1.

Do you need five? Going to three would mean 26 fewer slots to fill.

Yeah, those are the knobs you could turn while sticking with the current proposed structure.
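The knobs mentioned here are easy to quantify. A minimal sketch, assuming all 13 areas from the current draft keep a uniform team size; a team size of one corresponds to electing only a single lead per area. `total_seats` is a hypothetical helper, not something defined in the proposal.

```python
NUM_AREAS = 13  # areas in the current draft; subject to change


def total_seats(team_size, num_areas=NUM_AREAS):
    """Total number of area-team seats to fill for a uniform team size."""
    if team_size % 2 == 0:
        # Odd team sizes avoid ties within a team.
        raise ValueError("team size should be odd to avoid ties")
    return num_areas * team_size


for size in (5, 3, 1):
    print(size, total_seats(size))
# prints: 5 65 / 3 39 / 1 13

# Going from five to three people per team: 13 * (5 - 3) = 26 fewer seats.
print(total_seats(5) - total_seats(3))  # 26
```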

Was there any reason to select 5, was this based on research or something?

No, not that I’m aware. Just that it should be an odd number to avoid ties.

Have you looked historically at the areas where RFCs typically occur? Like over the last year, which areas have RFCs, like what’s the histogram?

I haven’t done any data analysis, but what I see is mostly project policy stuff: should we mandate pull requests, premerge checks, etc. Those are issues that would ultimately escalate to the top level.

What if we started with the Project Council, and then that could delegate to or spin up Area Teams? The project council could structure teams based on demand, need. Like they could decide they need a Clang area team.

I know Chris felt it was important for the Project Council to have representation from across the project. Rolling up the area team leads accomplishes that. It’s a bit parliamentary.

I like that though, it’s good to have diverse representation in the council.

But if we started with the Project Council, it wouldn’t necessarily have that property.

But if you elected one person from each area?

Yeah, that would be a modification of the proposal. Maybe what we’re electing is area team leads, and then they form the project council, and then if the area team lead feels they need support they can [form a wider area team?]. I like it, that’s a good suggestion.

That would bring the number of people down to 13.

As was mentioned earlier, there’s pretty disparate needs between project areas. LLD and Clang Tooling probably don’t have the same needs as Clang itself, they’re not the same projects.

Yeah, although for some of these areas where we have a single code owner, I think it would be helpful to actually have a meeting where they work with people.

Agreed, that would be a good improvement. But if the area isn’t under active development, this may be a little bit less interesting.

Codegen is a diverse component → Who should be on the Area Teams?

I’m a little concerned about the Codegen area [LLVM Backend?]. It’s probably not that dynamic and might not need that much governance, but it’s a very diverse component with a lot of different stakeholders for different platforms. Do you think there’s something special we can do about it to take all the platforms into account?

Yeah, like is the codegen rep going to be an ARM or Intel employee?

Yeah, or NVIDIA for that specific platform. But the decisions made at that level probably affect all of codegen.

That’s an interesting concern.

Well there’s nothing in the proposal prohibiting people in one area from reaching out to other members of the community right?

And I think one goal of the area teams is to cycle junior community members into this role. A successful Area Team is not necessarily made up of the most senior people on the project; it’s people coming together, reading and being aware of all the relevant issues, and producing decisions that they feel reflect consensus, but also making tough decisions. So I think having a team with a mix of junior people is a pathway for new people to ramp onto the project.

I guess that’s one reason to push back a little on the idea of reducing the Area Team to one person and nominating the obvious most senior code owner for that area. That could be a failure mode.

There are different roles though, and I think you have to pick which one you’re trying to do. If you’re trying to handle escalations and be on the project council, you need people with good judgement who can understand the tradeoffs really deeply. So maybe that’s not a great venue for junior folks to start contributing.

So maybe you need different roles here. If you have a whole area team, maybe that can fit those different roles. But I don’t know if the project governance needs to solve for all the problems here all at once.

How long does it take to reach a decision?

This has come up in other contexts, and there is great pressure from the community for a definitive timeline. How long does it take to arrive at a decision? When do I actually get unblocked?

At the same time as we get really precise about exactly how decisions are made, I think it would be great to establish expectations around the timeline.

If you have Area Teams that have a monthly meeting, I think you could say that you can expect to hear back from the Area Team after that. They may not have an answer for you (it’s still a volunteer open source project, and you can’t force community members to review your proposal), but I do feel this helps to set expectations.

Goals of the governance proposal

I think it would be good for the proposal to say more about the goals. There seem to be a lot of implicit goals that you’re designing around that are not well articulated, such as stability and being rooted in a voting process.

On a lot of projects, having to wait a month for a decision on how to unblock something that’s stuck would be really bad. But if there’s a specific goal for the kind of decisions to be handled by the process, maybe it would be more reasonable.

There seems to be an unstated goal underpinning a lot of the structures here. It seems like a model for slow, careful decision making in response to extremely broad feedback. It doesn’t seem like a model for rapid decision making. That might be okay, but there’s some implicit goal and context leading you to that kind of design. It would be great to turn that into something explicit instead of implicit.

But a month would be a dramatic improvement over the status quo.

Is the goal to improve on the status quo, though, or is there some other goal? I’m not saying any of these are right or wrong, but I think stating the goal really helps.

What is that status quo? Three months? A year? What would be too slow?

The status quo is unbounded: anywhere from three months to four years. The problem is that nobody knows or has any expectation of when they’re going to get feedback.

I think the goal is to set a ceiling on feedback of essentially one month. Feedback doesn’t mean yes or no, it could mean “not enough information”. But you have some expectation that you get some feedback within a month. I think that should be an explicit goal.

If we actually had a list of the things that got stuck over the last one or two years — for some RFCs it would be a simple answer, like obviously one month, but for some RFCs, like the migration to GitHub, if that had been resolved in a year that would have been a great outcome. And maybe faster than a year would have been too quick. There are probably a few categories like that. Having the examples would make this discussion more concrete: what would we like to happen in these examples?

At the community.o event, people cited the GitHub migration as an example of an effective process, with Mehdi(?) providing periodic updates on what the open questions were and what was discussed and decided. For such year-long issues, I imagine these meetings (area team? project council?) would provide updates like that.

Requiring a monthly meeting ensures that some progress is made every month, which is a great improvement over the status quo.

I think if there really were examples, or a list of stats (like “in the past year we had 20 RFCs go through this”), it would help us discuss the governance proposal more concretely. Maybe someone should volunteer to do that work, maybe one of the co-authors of the governance proposal.

Tom was updating the commit access RFCs. He tried to conclude it, and Chris provided feedback that people would not see the conclusion, that it was not visible enough. If an Area Team posted an update saying they held a meeting and what the outcome was, that would add visibility and make it clear that a decision had been made.

Is there a clear boundary for what needs to be handled by this governance? Like an RFC, a PR, or an issue that comes up. What size of change or proposal should go through this? If someone wants to make a change quickly, they might be tempted to route around this process, which is obviously harder than getting code in.

We do this already where people send patches and others push back saying this should be an RFC, so I imagine that’s how this would work: a code reviewer, potentially in post-submit, might question if there’s agreement on something, and that’s how it would make it onto the agenda. But that could probably be clearer.

Something that’s worked really well is making everything escalatable. If you have a good decision-making process, you don’t have to pick and choose what can be escalated or how. A lot of people worry that this will lead to too many escalations, but people don’t do that. They don’t want to; escalating is stressful, and there’s plenty of backpressure already in place. And having everything escalatable reassures people that there isn’t a loophole: you can always say something should get escalated.

Right, code reviews can get escalated to RFCs, area team decisions escalated to project council decisions, project council decisions escalated to community wide decisions, and that’s probably where the buck stops.

Having an escalation path really makes people happy. Even when they don’t follow the escalation path, they know that if they really cared that’s something they could do, but they also know the cost, so now they calibrate whether they really care.

The next step of escalation is forking the whole project.

You can always do that, but it has costs. It’s liberally licensed.
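The escalation chain described in this exchange can be modeled as an ordered sequence of levels where every level except the last has a successor. This is a minimal sketch, assuming a strictly linear path in which RFCs escalate to the relevant Area Team; the `Level` names and the `escalate` helper are hypothetical labels, and the proposal does not yet spell out how the Project Council hands a question to a project-wide vote.

```python
from enum import Enum


class Level(Enum):
    """Escalation levels named in the discussion, lowest first (hypothetical labels)."""
    CODE_REVIEW = 1
    RFC = 2
    AREA_TEAM = 3
    PROJECT_COUNCIL = 4
    PROJECT_WIDE_VOTE = 5


def escalate(level):
    """Return the next level up, or None once the buck stops."""
    if level is Level.PROJECT_WIDE_VOTE:
        return None  # beyond this point, the only option left is forking
    return Level(level.value + 1)


# Walk the full chain starting from an ordinary code review.
path = [Level.CODE_REVIEW]
while (next_level := escalate(path[-1])) is not None:
    path.append(next_level)
print([lvl.name for lvl in path])
# ['CODE_REVIEW', 'RFC', 'AREA_TEAM', 'PROJECT_COUNCIL', 'PROJECT_WIDE_VOTE']
```

The “everything is escalatable” property corresponds to `escalate` returning a next level for every input except the final one.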

(first attempt at wrapping up the roundtable)

I would love to see the Project Council happen, over everything else. I think if that’s the only outcome, that’s amazing. The other things are great too, but I think that’s the most important.

One thing to keep in mind: I would think critically about the bad scenario. Say you have a Project Council with 13 people. Seven of them want to go with Path A, six want Path B. The decision matters; it’s an important decision. What do you do? If you go to a vote and get 50.01% vs. 49.99%, is that a good basis for picking a direction? Are you going to be happy with this when it matters, for important decisions with big consequences? And when people vote on it and one vote decides, what if it’s my vote, cast because I fixed a typo, with no real reason to be voting?

I’m not trying to say you shouldn’t do this, I’m saying you should game out what it looks like and get really comfortable with the outcome.

Most of our decisions are maybe not that consequential and hard to reverse?

I think the GitHub migration matters and is hard to reverse.

Enabling code review on Phabricator is another example. It really divided the community for a long time. I’m not saying what’s right or wrong, I’m just raising this because it’s hard: how do you make sure that you’re comfortable with the outcome at the end of the day if you have a voting model. I’ve seen lots of things work, but mostly I think it’s about playing it out, thinking about the consequences.

One of the scarier ones for me: two people on the council saying no. Eleven people saying yes. But when you vote, 51% say no. Is that the right outcome? For the project?

I think one of the harder problems is even just deciding who gets to vote. That one seems impossible to solve to me. That seems like a huge issue before you get started anyway.

Someone on the Python steering committee said that was the biggest sticking point for the Python governance proposal. They spent most of their time defining the voting contributors. I feel our proposal is a bit of a strawman, because LLVM is very different from Python. Python has a much smaller set of core contributors, but we’ve got people doing accelerators, GPUs, CPU stuff, Fortran. It’s a big, diverse community. So we have this more complicated multi-level structure of area teams and a project council.

It totally changes the dynamics of voting if voting is the fallback decision-making mechanism, vs. if voting is just about electing people. This changes the stakes of voting pretty significantly. If you’re electing a project council, the stakes might be pretty low, since you’re going to end up with pretty reasonable people on the project council; it’s good to go through the voting process, but you don’t have to stress too much. But if “GitHub or not GitHub” goes to a vote, that vote matters.

Would we require a supermajority in those cases? Because if there’s a significant number of holdouts, you don’t really want to proceed.

That’s what I’m saying, you want to think about the difference between voting on these things. Game out some of these hard decisions and think about how you want them to play out, and make sure that at each level you’re happy with the outcome.

There’s a section here about community values, and one of them is consensus. This is articulated in the RFC process which seeks to find rough consensus. I don’t know that we have processes to uphold this value, but I think that’s what we’re going for.

Maybe that makes things much easier for voting, I don’t know. Rough consensus doesn’t feel like 51% of the votes. If it’s a 7/6 project council decision, I feel like the project council should not do it because that’s not consensus.
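The gap between a bare majority and the supermajority question raised in this thread can be made concrete. A minimal sketch, assuming “supermajority” means at least two-thirds of votes cast; the proposal as discussed does not define any decision threshold for the Project Council, so the two-thirds figure and both function names are illustrative assumptions.

```python
from fractions import Fraction


def simple_majority(yes, total):
    """True if strictly more than half of the votes cast are in favor."""
    if total <= 0:
        raise ValueError("no votes cast")
    return 2 * yes > total


def supermajority(yes, total, threshold=Fraction(2, 3)):
    """True if at least `threshold` of the votes cast are in favor."""
    if total <= 0:
        raise ValueError("no votes cast")
    return Fraction(yes, total) >= threshold


# A 7/6 split on a 13-member Project Council.
print(simple_majority(7, 13), supermajority(7, 13))  # True False
# Under a two-thirds rule, 9 of 13 is needed (9/13 is about 0.69; 8/13 about 0.62).
print(supermajority(9, 13), supermajority(8, 13))  # True False
# A 50.01% vs. 49.99% project-wide vote.
print(simple_majority(5001, 10000), supermajority(5001, 10000))  # True False
```

Using `Fraction` keeps the comparison exact, so a result sitting right at two-thirds is not misclassified by floating-point rounding.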

I feel like the current situation is that when there is not agreement on an RFC, they get stuck forever, but if it goes to the project council and 13 people can’t decide, they get to try again in a month. 13 people is a much smaller set.

I don’t know if that’s gamed out in the proposal, but those are the values. Consensus seeking decision making.

Have you looked at Debian? They’re one of the other big projects that use voting, and they have a long track record. Especially given how long they’ve been using a voting-based governance model, they might have lessons that other groups haven’t learned yet. They have also dealt with deep and bitter controversy. You’ve got to be prepared for this. If the Python model hasn’t gone through a crisis… The systemd question went through the Debian process, and they reached a decision. Because when something really controversial comes up, it’s too late to fix glitches in the governance model.