I noticed that https://github.com/python/cpython/runs/2378199636 (a coverage job on the last commit on master at the time of writing) takes suspiciously long to complete. I did some investigation and noticed that this job on the 3.9 branch succeeds (all of the job runs on the first page in the list are green — https://github.com/python/cpython/actions/workflows/coverage.yml?query=branch%3A3.9) But then I took a look at the runs on master and discovered that the last successful run was 4 months ago — https://github.com/python/cpython/actions.html?query=is%3Asuccess+branch%3Amaster&workflow_file_name=coverage.yml. The last success is https://github.com/python/cpython/actions/runs/444323166 and after that, starting with https://github.com/python/cpython/actions/runs/444405699, if fails consistently. Notably, all of the failures are caused by the job timeout after *6 hours* — GitHub platform just kills those, 6h is a default per-job timeout in GHA. It's also important to mention that before every job starting timing out effectively burning 6 hours of GHA time for each merge and producing no useful reports, there were occasional 6h-timeouts but they weren't consistent. Looking into the successful runs from the past, on master and other jobs, I haven't noticed it taking more than 1h35m to complete with a successful outcome. Taking into account this as a baseline, I suggest changing the timeout of the whole job or maybe just one step that actually runs coverage. Action items: * Set job timeout in GHA to 1h40m (allowing a bit of extra time for exceptionally slow jobs) — this will make sure that the failure/timeout is reported sooner than 6h * Figure out why this started happening in the first place. I'm going to send a PR addressing the first point but feel free to pick up the investigation part — I don't expect to have time for this anytime soon. P.S. FTR the last timeout of this type happened two months ago — https://github.com/python/cpython/actions.html?page=4&query=branch%3A3.9&workflow_file_name=coverage.yml.
Coverage runs are still failing on the master, and I think at least we should do something like allow failure or other wise github will send notifications for this flaky run.
For what it's worth I think Brett's suggestion of just removing the coverage build entirely is good too since it seems like no one actually looks at the results and they take up valuable CI time.