Use static captain to recover from loss of majority (original) (raw)

Splunk® Enterprise

Distributed Search

  1. Documentation
  2. Splunk® Enterprise
  3. Distributed Search
  4. Use static captain to recover from loss of majority

A cluster normally uses a dynamic captain, which can change over time. The dynamic captain is chosen by periodic elections, in which a majority of all cluster members must agree on the captain. See "Captain election."

If a cluster loses the majority of its members, therefore, it cannot elect a captain and cannot continue to function. You can work around this situation by reconfiguring the cluster to use a static captain in place of the dynamic captain.

A static captain does not change over time. Unlike a dynamic captain, the cluster does not conduct an election to select the static captain. Instead, you designate a member as the static captain, and that member remains the captain until you designate another member as captain.

Shortcomings of the static captain

The static captain has one fundamental shortcoming: It becomes a single point of failure for the cluster. If the captain fails, the cluster fails. The cluster cannot, on its own, replace a static captain. Rather, manual intervention is necessary.

Because of this shortcoming, Splunk recommends that you use the static captain capability only for disaster recovery. Specifically, you can employ the static captain to recover from a loss of majority, which renders the cluster incapable of electing a dynamic captain.

In addition, the static captain does not check whether enough members are running to meet the replication factor. This means that, under some conditions, you might not have a full complement of search artifact copies.

Note: You should only employ static captain when absolutely necessary. While the process of converting to static captain is usually simple and fast, the process of later reverting back to a dynamic captain is somewhat more involved.

Use cases for static captain

Here are some situations where it makes sense to switch to a static captain:

In all cases, once the precipitating issue has been resolved, you should revert the cluster to use a dynamic captain.

Caution: Do not use the static captain to handle a network interruption that stops communication between two sites. During a network interruption, the site with a majority of members continues to function as usual, because it can elect a dynamic captain as necessary. However, the site with a minority of members cannot elect a captain and therefore will not function as a cluster. If you attempt to revive the minority site by configuring its members to use a static captain, you will then have two clusters, one with a dynamic captain and the other with a static captain. When the network heals, you will not be able to reconcile the configuration changes between the sites.

Switch to a static captain

To switch to a static captain, reconfigure each cluster member to use a static captain:

1. On the member that you want to designate as captain, run this CLI command:

splunk edit shcluster-config -mode captain -captain_uri : -election false

2. On each non-captain member, run this CLI command:

splunk edit shcluster-config -mode member -captain_uri : -election false

Note the following:

You do not need to restart the captain or any other members after running these commands. The captain immediately takes control of the cluster.

To confirm that the cluster is now operating with a static captain, run this CLI command from any member:

splunk show shcluster-status -auth :

The election flag will be set to 0.

Revert to the dynamic captain

When the precipitating situation has resolved, you should revert the cluster to control by a single, dynamic captain. To switch to dynamic captain, you reconfigure all the members that you previously configured for static captain. How exactly you do this depends on the type of scenario you are recovering from.

This topic provides reversion procedures for the two main scenarios:

Return single-site cluster to dynamic captain

In the scenario of a single-site cluster with loss of majority, you should revert to dynamic mode once the cluster regains its majority:

1. As members come back online, convert them one-by-one to point to the static captain:

splunk edit shcluster-config -election false -mode member -captain_uri :

Note the following:

You do not need to restart the member after running this command.

As you point each rejoining member to the static captain, it attempts to download the replication delta. If the purge limit has been exceeded, the system will prompt you to perform a manual resync, as explained in "How the update proceeds."

Caution: During the time that it takes for the remaining steps of this procedure to complete, your users should not make any configuration changes.

2. Once the cluster has regained its majority, convert all members back to dynamic captain use. Convert the current, static captain last. To accomplish this, run this command on each member:

splunk edit shcluster-config -election true -mgmt_uri :

Note the following:

You do not need to restart the member after running this command.

3. Bootstrap one of the members. This member then becomes the first dynamic captain. It is recommended that you bootstrap the member that was previously serving as the static captain.

splunk bootstrap shcluster-captain -servers_list ":,:,..." -auth :

For information on these parameters, see "Bring up the cluster captain."

Return two-site cluster to dynamic captain

In the scenario of a two-site cluster with loss of the majority site, you should revert to dynamic mode once the majority site comes back online:

1. When the majority site comes back online, convert its members to use the static captain. Point each majority site member to the static captain:

splunk edit shcluster-config -election false -mode member -captain_uri :

Note the following:

You do not need to restart the member after running this command.

As you point each rejoining member to the static captain, it attempts to download the replication delta. If the purge limit has been exceeded, the system will prompt you to perform a manual resync, as explained in "How the update proceeds."

2. Wait for all the majority-site members to get the replicated configs from the static captain. This typically takes a few minutes.

Caution: During the time that it takes for the remaining steps of this procedure to complete, your users should not make any configuration changes.

3. Convert all members back to dynamic captain use. Convert the current, static captain last. To accomplish this, run this command on each member:

splunk edit shcluster-config -election true -mgmt_uri :

Note the following:

You do not need to restart the member after running this command.

4. Bootstrap one of the members. This member then becomes the first dynamic captain. It is recommended that you bootstrap the member that was previously serving as the static captain.

splunk bootstrap shcluster-captain -servers_list ":,:,..." -auth :

For information on these parameters, see "Bring up the cluster captain."

| | Handle failure of a search head cluster member | | Put a search head cluster member into detention | | | --------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- | | ------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.1.8, 9.1.9, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.2.5, 9.2.6, 9.3.0, 9.3.1, 9.3.2, 9.3.3, 9.3.4, 9.4.0, 9.4.1, 9.4.2