Performance/Multi-DC MediaWiki

Multi-DC MediaWiki (also known as Active-active MediaWiki) is a cross-cutting project driven by the Performance Team to give MediaWiki the ability to serve read requests from multiple datacenters.

Historically, the Wikimedia Foundation served MediaWiki requests from its primary datacenter only, with the secondary data center acting effectively as a cold stand-by reserved for disaster recovery. By deploying and actively serving MediaWiki from two (or more) datacenters during normal operations, we achieve higher resilience against network loss or datacenter failure, remove most switchover costs and complexity, and ease regular maintenance.

Throughout the project, Multi-DC initiatives have brought performance, availability, and resiliency improvements across the MediaWiki codebase and at every level of our infrastructure. These gains, often achieved by restructuring business logic and adopting asynchronous and eventually-consistent solutions, proved advantageous even for the then single-DC operations.

The project's final switch (T279664) improved page load speed by reducing latency for geographies west of the Codfw data center in Texas, USA, including Asia and the US West Coast. This benefited all logged-in traffic as well as cache misses from logged-out traffic. The project also opens the door to serving even more traffic from nearby DCs in the future.

Initial work (2014-2016)

RFC

The project was formalised via the Multi-DC strategy RFC in 2015, led by Aaron Schulz (Performance Team).

JobQueue

Changes led by Aaron in collaboration with Ori (Performance Team), including:

See also History of job queue runners at WMF.

WANObjectCache

Designed and implemented by Aaron. Read more about this in:

MariaDB lag indication

The default mechanism in MySQL for letting web applications measure database replication lag is to fetch the Seconds_behind_master value using "SHOW SLAVE STATUS" queries. This works well for small sites, and is what MediaWiki does by default as well. At our scale, however, it can be inaccurate or unreliable. For example, DBAs tend to prefer a replication topology that uses a chain rather than having all replicas source a single primary, which eases cross-DC switchovers and day-to-day intra-DC maintenance. A number of other issues and use cases are detailed at T111266.

To mitigate this we adopted the pt-heartbeat service. Deployment was led by Jaime Crespo (SRE Data Persistence), with the MediaWiki client (within the Rdbms library) written by Aaron.
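For illustration, here is a minimal sketch of the heartbeat-based measurement, assuming the default pt-heartbeat table layout and hypothetical connection details (the real client is the PHP Rdbms library; this only shows the idea):

```python
# Minimal sketch: measure replication lag from a pt-heartbeat table rather
# than Seconds_behind_master. Connection details and server_id are
# hypothetical; the production client is MediaWiki's Rdbms library (PHP).
import datetime
import pymysql  # assumption: any MySQL client library would work the same way

def heartbeat_lag(replica_host, primary_server_id):
    """Return approximate replication lag in seconds, via pt-heartbeat."""
    conn = pymysql.connect(host=replica_host, user="repl_check",
                           password="...", database="heartbeat")
    try:
        with conn.cursor() as cur:
            # pt-heartbeat updates this row on the primary about once per
            # second; the row reaches the replica through normal replication,
            # so "now minus ts" approximates lag regardless of how deep the
            # replication chain is.
            cur.execute("SELECT ts FROM heartbeat WHERE server_id = %s",
                        (primary_server_id,))
            row = cur.fetchone()
            if row is None:
                return None
            ts = datetime.datetime.fromisoformat(row[0])
            return (datetime.datetime.utcnow() - ts).total_seconds()
    finally:
        conn.close()
```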

Media storage

See also Media storage#History. MediaWiki's FileBackend abstraction layer was developed in 2010-2012 by Aaron to facilitate WMF's transition from NFS to Swift, and was extended and exercised further during the migration from the Pmtpa data center to Eqiad, our present-day primary DC.

In 2015, further work took place on SwiftFileBackend and FileBackendMultiwrite so that MediaWiki writes directly to both data centers rather than relying on periodic out-of-band replication (T91869, T112708, T89184).

In 2016, encryption of cross-DC traffic between the two Swift clusters was finalized by SRE, with Nginx acting as a TLS proxy (T127455).

Elasticsearch

A strategy was devised for near-realtime Elasticsearch replication so that search suggestions and search queries work independently in each data center. Led by Erik Bernhardson (Search Platform). T91870

Incremental progress

DC independence

Throughout the codebase, individual components had to be improved, refactored, or even rewritten entirely to be independent of a (slow and possibly overloaded) connection to the primary DC. This largely took place between 2014 and 2017, led by Aaron Schulz, and is tracked under T88445 and T92357. Work included:

Mcrouter

The relay interface exposed by WANObjectCache was fulfilled in 2016 by Mcrouter, which replaced WMF's prior Twemproxy (Nutcracker) infrastructure (T132317). Configuration and deployment were led by Giuseppe Lavagetto (SRE Service Ops) in 2018 (T192370).

CDN purging

A long-standing problem has been the reliability of CDN purges. The foundation's CDN by its very nature has always operated from multiple DCs, so this is not a problem newly introduced by the Multi-DC initiative around backend services. Incremental improvements to multicast HTCP purging took place mostly in 2017-2018, led by Emanuele Rocca and Brandon Black (SRE Traffic); details at T133821. The Purged service (based on Kafka) was developed and deployed in 2020 to solve the reliability issue long-term.
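For illustration, a minimal sketch of queue-based purging, with a hypothetical topic name and event shape rather than the actual schema consumed by Purged:

```python
# Minimal sketch of queue-based purging: instead of a fire-and-forget multicast
# HTCP packet, emit a purge event to Kafka so each DC's consumers can process
# it reliably. Topic name and event fields are hypothetical, not the
# production schema.
import json
import time
from kafka import KafkaProducer  # assumption: kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["kafka.example.internal:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # purges should not be silently dropped
)

def purge_url(url):
    event = {"url": url, "timestamp": time.time(), "reason": "edit"}
    # CDN nodes in every DC consume this topic and issue a local PURGE for
    # the URL, instead of hoping a multicast datagram arrived intact.
    producer.send("cdn.purge-requests", event)

purge_url("https://en.wikipedia.org/wiki/Main_Page")
producer.flush()
```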

Thumbnail serving

Thumbnails are rendered by Thumbor and stored in Swift in both DCs. Carried out in 2018, with traffic routing implemented by Filippo (SRE Infrastructure Foundations) and application-layer work by Gilles (Performance Team). T201858

Media originals serving

In 2019, we began to serve originals from upload.wikimedia.org from multiple data centers, implemented by Alexandros Kosiaris and Filippo (SRE). T204245

Remaining work (2020-2022)

Aaron Schulz has driven the effort of improving, upgrading, and porting the various production systems around MediaWiki to work in an active-active context with multiple datacenters serving MediaWiki web requests. You can see the history of subtasks on Phabricator.

This document focuses on the work remaining as of December 2020 – the major blockers left before enabling active-active serving of MediaWiki.

ChronologyProtector

ChronologyProtector is the system ensuring that editors see the result of their own actions in subsequent interactions.

The remaining work is to decide where and how to store the data going forward, to deploy any infrastructure and software changes as needed, and to enable them.
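The underlying pattern is read-your-writes: after a write, record the primary's replication position under a client-specific key in a store reachable from every DC, and on the client's next request wait for the local replica to reach that position. A minimal sketch of the pattern follows; the class, store API, and position type are illustrative, not MediaWiki's actual implementation:

```python
# Minimal sketch of the read-your-writes pattern behind ChronologyProtector.
# Names, the store API, and the position comparison are illustrative only.
import time

class ChronologyGuard:
    def __init__(self, position_store, ttl=60):
        self.store = position_store  # e.g. a key-value store reachable from both DCs
        self.ttl = ttl

    def record_write(self, client_key, primary_position):
        # Called at the end of a request that wrote to the primary DB:
        # remember how far replication must advance for this client.
        self.store.set(f"chronology:{client_key}", primary_position, self.ttl)

    def wait_for_replica(self, client_key, replica, timeout=5.0):
        # Called at the start of the client's next request, possibly in a
        # different datacenter: wait (briefly) until the local replica has
        # replayed past the recorded position, so the client sees its own edit.
        target = self.store.get(f"chronology:{client_key}")
        if target is None:
            return True
        deadline = time.monotonic() + timeout
        while replica.current_position() < target:
            if time.monotonic() > deadline:
                return False  # give up and proceed, or route the read to the primary
            time.sleep(0.05)
        return True
```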

Updates:

Session storage

The session store holds temporary data required for authentication and authorization procedures such as logging in, creating accounts, and security checks before actions such as editing pages.

The older data storage system has various shortcomings beyond mere incompatibility with multi-DC operation. Even in our current single-DC deployment, the annual switchovers are cumbersome, and a replacement has been underway for some time.

The remaining work is to finish the data storage migration from Redis (non-replicated) to Kask (Cassandra-based).
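For illustration, a minimal sketch of what a Kask-style HTTP key-value session store looks like to its clients; the base URL and path layout here are assumptions, and the real service is backed by Cassandra with cross-DC replication:

```python
# Minimal sketch of a client of a Kask-style HTTP key-value session store.
# The base URL and path layout are assumptions for illustration only.
import requests

BASE = "https://sessionstore.example.internal:8081/sessions/v1"

def put_session(session_id, blob):
    # Values are opaque bytes to the store; Cassandra replication makes the
    # write visible in the other datacenter without extra work in MediaWiki.
    r = requests.post(f"{BASE}/{session_id}", data=blob,
                      headers={"Content-Type": "application/octet-stream"})
    r.raise_for_status()

def get_session(session_id):
    r = requests.get(f"{BASE}/{session_id}")
    if r.status_code == 404:
        return None
    r.raise_for_status()
    return r.content
```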

Updates:

CentralAuth storage

A special kind of session storage for the central login system and the cross-wiki "auto login" and "stay logged in" mechanisms.

The last part of that work, migrating CentralAuth sessions, is currently scheduled for completion in Oct-Dec 2020 (2020-2021 Q2).

Updates:

Main Stash store

The Redis cluster previously used for session storage also hosts other miscellaneous application data through the Main Stash interface. Its needs differ from those of session storage; these differences become more prominent in a multi-DC deployment and make it unsuitable for Cassandra/Kask.

The remaining work is to survey the consumers and needs of Main Stash and decide how to accommodate them going forward. For example, would it help to migrate some of its consumers elsewhere and provide a simpler replacement for the rest? Also: carry out any software and infrastructure changes as needed.

Updates:

MariaDB cross-datacenter secure writes

Even with MediaWiki active-active, database writes still go only to the primary datacenter; however, a fallback is required for edge cases where a write is attempted from a secondary datacenter. To preserve our users' privacy, such writes need to be sent encrypted across datacenters. Multiple solutions are being considered, but a decision has yet to be made on which one will be implemented. This work will be a collaboration between the SRE Data Persistence Team and the Performance Team. We hope for it to happen during fiscal year 2020-2021.
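As an illustration of the shape of the problem only, a sketch of a secondary-DC request opening a TLS-protected connection to the primary DC for its write; hostnames, credentials, and the CA path are hypothetical, and the actual solution has not been decided:

```python
# Minimal sketch of the fallback case: a request handled in the secondary DC
# needs to perform a write, so it opens a TLS-protected connection to the
# primary DC's database instead of a plain in-DC connection.
# All names and paths below are illustrative assumptions.
import pymysql

def connect_for_write(local_dc, primary_dc):
    if local_dc == primary_dc:
        # Same DC: traffic never leaves the local private network.
        return pymysql.connect(host="db-primary.eqiad.example", user="wikiuser",
                               password="...", database="enwiki")
    # Cross-DC: encrypt the connection so user data is not readable in transit.
    return pymysql.connect(
        host="db-primary.eqiad.example", user="wikiuser", password="...",
        database="enwiki",
        ssl={"ca": "/etc/ssl/certs/wmf-ca.pem"},
    )
```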

Updates:

ResourceLoader file dependency store

The dependency data is currently written to a core wiki table over a primary DB connection; it must be restructured so that writes can be performed within a secondary DC and then replicated. The plan is to migrate it to the Main Stash instead.
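As a rough illustration of that direction, a sketch of writing the dependency data through a replicated stash interface instead of a primary DB connection; the stash API shown is an assumption, not MediaWiki's actual interface:

```python
# Minimal sketch: record ResourceLoader file dependencies via a stash that
# accepts local writes and replicates asynchronously, rather than via a
# synchronous write over a primary-DB connection. The stash interface is a
# hypothetical stand-in for illustration.
def record_module_dependencies(stash, module, skin, file_paths):
    key = f"resourceloader:deps:{module}:{skin}"
    # Last-write-wins, eventually consistent data is acceptable here: this is
    # a cache of "which files did this module read the last time it was built".
    stash.set(key, sorted(file_paths))
```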

Updates:

CDN routing

The remaining work is to agree on the MediaWiki requirements, and then write, test, and deploy the traffic routing configuration.
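For illustration only, a sketch of one possible routing rule, expressed in Python rather than real CDN configuration, and assuming a method-based split that may not match the requirements eventually agreed on:

```python
# Minimal sketch of a possible routing rule: idempotent reads go to the
# nearest DC, requests that may write go to the primary DC, keeping
# MediaWiki's single-primary database writes in one place. This is an
# illustrative assumption, not the deployed configuration.
PRIMARY_DC = "eqiad"

def choose_backend_dc(method, nearest_dc, primary_dc=PRIMARY_DC):
    if method in ("GET", "HEAD", "OPTIONS"):
        return nearest_dc
    return primary_dc

assert choose_backend_dc("GET", "codfw") == "codfw"
assert choose_backend_dc("POST", "codfw") == "eqiad"
```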

Updates:

Further reading

History