T299823 Regularly run recountCategories.php on Wikimedia wikis via systemd timer

For various reasons, category counts tend to drift out of sync, leaving some categories with inaccurate counts, sometimes even negative ones (which should be impossible). @Taylor has documented past examples of this at T85696#7105227.

One suggestion (which I pushed a patch for) is to recount categories on action=purge (T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category), but that requires human intervention and someone to notice that the count is wrong. We have a pretty efficient script, recountCategories.php, that should be run regularly to reconcile any remaining differences. We had to run it on every wiki for T299244, and the slowest wiki was commonswiki at ~45 minutes.
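To make "recount" concrete: the cached counters live in the category table, while the actual memberships are rows in categorylinks. Below is a minimal Python/SQLite sketch, assuming a heavily simplified schema (only cat_pages, and two categorylinks columns). It is not how recountCategories.php is implemented; the real script works on far larger tables and also handles subcategory and file counts. It only illustrates the reconciliation idea: overwrite each cached counter with the real number of member rows.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Simplified stand-ins for MediaWiki's category and categorylinks tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_pages INTEGER NOT NULL);
        CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
        """
    )
    return conn


def recount(conn: sqlite3.Connection) -> int:
    """Overwrite each cached cat_pages with the real number of member rows.

    Returns how many categories had a wrong cached count.
    """
    rows = conn.execute(
        """
        SELECT c.cat_title, c.cat_pages,
               (SELECT COUNT(*) FROM categorylinks AS l WHERE l.cl_to = c.cat_title)
        FROM category AS c
        """
    ).fetchall()
    fixed = 0
    for title, cached, actual in rows:
        if cached != actual:
            conn.execute(
                "UPDATE category SET cat_pages = ? WHERE cat_title = ?",
                (actual, title),
            )
            fixed += 1
    conn.commit()
    return fixed


conn = make_demo_db()
# Drifted counters, including an "impossible" negative one.
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [("Cats", 3), ("Dogs", -1), ("Birds", 1)],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Cats"), (2, "Cats"), (3, "Dogs"), (4, "Birds")],
)
print(recount(conn))  # 2 (Cats and Dogs were wrong)
print(dict(conn.execute("SELECT cat_title, cat_pages FROM category")))
```

The operation is idempotent: running it again right away changes nothing, so repeating it on a schedule only corrects counters that have drifted since the previous pass.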

I propose we run recountCategories.php once a month as a systemd timer.
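For reference, systemd documents the OnCalendar=monthly shorthand as equivalent to `*-*-01 00:00:00`, i.e. midnight on the first day of every month. The small Python sketch below computes the next such trigger time. It ignores time zones and options such as RandomizedDelaySec= or Persistent=, and the schedule actually used by the Puppet job may differ.

```python
from datetime import datetime


def next_monthly_run(now: datetime) -> datetime:
    """Next elapse of OnCalendar=monthly (*-*-01 00:00:00) strictly after `now`."""
    if now.month == 12:
        # December rolls over into January of the next year.
        return datetime(now.year + 1, 1, 1)
    return datetime(now.year, now.month + 1, 1)


print(next_monthly_run(datetime(2022, 1, 22, 7, 44)))  # 2022-02-01 00:00:00
print(next_monthly_run(datetime(2022, 12, 31, 23, 59)))  # 2023-01-01 00:00:00
```

On a real host, `systemd-analyze calendar monthly` prints the normalized form of the expression and its next elapse time.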

Event Timeline

Peachey88 renamed this task from "Run recountCategories.php regularly on Wikimedia wikis" to "Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer". Jan 22 2022, 7:44 AM

taavi renamed this task from "Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer" to "Regularly run recountCategories.php on Wikimedia wikis via systemd timer". Jan 22 2022, 7:51 AM

How do we want to run this? For all wikis? By shards?

Following what @Legoktm did for T299244, I made a draft patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/756069

If you think we can use foreachwiki directly, please let me know. It would make the file substantially shorter too.

I think we can do this with a plain foreachwiki. In T299244 there was some urgency to get the counts fixed, so I split it by shard, but here it's fine if it's a bit slower. Each wiki will still get its counts refreshed once a month anyway.
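The trade-off is simple arithmetic: a plain foreachwiki runs the wikis one after another, so its wall-clock time is the sum of all per-wiki times, whereas one sequential run per shard, with the shards running in parallel, takes as long as the busiest shard. A minimal sketch; apart from commonswiki's ~45 minutes, the wiki list, the shard labels and all durations below are made up for illustration.

```python
from collections import defaultdict


def sequential_minutes(durations: dict[str, float]) -> float:
    # foreachwiki-style: one wiki after another, so the times add up.
    return sum(durations.values())


def sharded_minutes(durations: dict[str, float], shard_of: dict[str, str]) -> float:
    # One sequential run per shard, all shards in parallel:
    # the busiest shard determines the total wall-clock time.
    per_shard: dict[str, float] = defaultdict(float)
    for wiki, minutes in durations.items():
        per_shard[shard_of[wiki]] += minutes
    return max(per_shard.values())


# Hypothetical numbers, except commonswiki (~45 min, observed in T299244).
durations = {"commonswiki": 45, "enwiki": 30, "dewiki": 10, "frwiki": 10}
shard_of = {
    "commonswiki": "shard-a",
    "enwiki": "shard-b",
    "dewiki": "shard-b",
    "frwiki": "shard-c",
}

print(sequential_minutes(durations))  # 95
print(sharded_minutes(durations, shard_of))  # 45.0
```

Sharding shortens the run, but with a monthly schedule the sequential total only has to stay well below the 43,200 minutes of a 30-day month, which leaves a lot of headroom even if the real total is many times larger than in this toy example.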

Thank you. Patch amended to use foreachwiki.

I see that the script suggests running maintenance/cleanupEmptyCategories.php with --mode remove when recountCategories.php is run in pages mode (we're configuring it to use all, which includes that mode). Shall we configure a periodic job for cleanupEmptyCategories.php too?
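A minimal Python/SQLite sketch of what the suggested cleanup does, assuming (an unverified reading, not checked against the script's code) that --mode remove deletes category rows that have no members and no category description page. The two tables are heavily simplified stand-ins for MediaWiki's category and page tables; 14 is MediaWiki's Category: namespace number.

```python
import sqlite3

NS_CATEGORY = 14  # MediaWiki's Category: namespace


def remove_empty_categories(conn: sqlite3.Connection) -> list[str]:
    """Delete category rows with no members and no description page.

    Hypothetical simplification of cleanupEmptyCategories.php --mode remove.
    """
    doomed = [
        title
        for (title,) in conn.execute(
            """
            SELECT c.cat_title FROM category AS c
            WHERE c.cat_pages = 0
              AND NOT EXISTS (
                SELECT 1 FROM page AS p
                WHERE p.page_namespace = ? AND p.page_title = c.cat_title
              )
            ORDER BY c.cat_title
            """,
            (NS_CATEGORY,),
        )
    ]
    conn.executemany("DELETE FROM category WHERE cat_title = ?", [(t,) for t in doomed])
    conn.commit()
    return doomed


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_pages INTEGER NOT NULL);
    CREATE TABLE page (page_namespace INTEGER NOT NULL, page_title TEXT NOT NULL);
    """
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [("Busy", 5), ("Empty_with_page", 0), ("Empty_no_page", 0)],
)
# Empty_with_page has a category description page; the namespace-0 page
# titled Empty_no_page is an ordinary article, not a category page.
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [(NS_CATEGORY, "Empty_with_page"), (0, "Empty_no_page")],
)
print(remove_empty_categories(conn))  # ['Empty_no_page']
```

This is why the two scripts pair naturally: a recount can bring a counter down to zero, and the cleanup then drops the rows that are left empty.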

@Majavah Could you please run the improved recountCategories script on the Beta Cluster and save the output for review in a Paste?

foreachwiki maintenance/recountCategories --mode all >path/to/logs

@Legoktm and I would like to know how it performs before merging the Puppet patch.

Thanks!

P19566

As requested on IRC, here's the log from today's run:

mediawiki_job_recount_categories.txt (2 MB)

Re: Tech News: let me know ASAP if any changes are needed. I'll be freezing it for translations within ~2 hours.

I think what we have now is fine. On IRC, MA and I discussed that while this script may temporarily make some counts wrong (which is still undetermined), overall the counts are less wrong.

Shall we set the job to ensure => absent while the investigation into the reported issues is ongoing?

I had looked at this back in February or March and didn't think anything actionable was left. If people are having issues with category counts being significantly wrong at the beginning of the month (after this script runs), please shout.

Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses.