21.4.3.6 Defining NDB Cluster Data Nodes

The [ndbd] and [ndbd default] sections are used to configure the behavior of the cluster's data nodes.

[ndbd] and [ndbd default] are always used as the section names, whether you are using the ndbd or ndbmtd binary for the data node processes.

There are many parameters that control buffer sizes, pool sizes, timeouts, and so forth. The only mandatory parameter is HostName; this must be defined in the local [ndbd] section.

The parameter NoOfReplicas should be defined in the [ndbd default] section, as it is common to all Cluster data nodes. It is not strictly necessary to set NoOfReplicas, but it is good practice to set it explicitly.

Most data node parameters are set in the [ndbd default] section. Only those parameters explicitly stated as being able to take local values may be changed in an individual [ndbd] section. Where present, HostName and NodeId must be defined in the local [ndbd] section, and not in any other section of config.ini. In other words, settings for these parameters are specific to one data node.
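
For example, a minimal set of data node sections might look like the following; the host names and node IDs shown here are placeholders only:

[ndbd default]
NoOfReplicas=2

[ndbd]
HostName=ndb_data1.example.com
NodeId=2

[ndbd]
HostName=ndb_data2.example.com
NodeId=3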

For those parameters affecting memory usage or buffer sizes, it is possible to use K, M, or G as a suffix to indicate units of 1024, 1024×1024, or 1024×1024×1024. (For example, 100K means 100 × 1024 = 102400.)

Parameter names and values are case-insensitive, unless used in a MySQL Server my.cnf or my.ini file, in which case they are case-sensitive.

Information about configuration parameters specific to NDB Cluster Disk Data tables can be found later in this section (see Disk Data Configuration Parameters).

All of these parameters also apply to ndbmtd (the multithreaded version of ndbd). Three additional data node configuration parameters—MaxNoOfExecutionThreads, ThreadConfig, and NoOfFragmentLogParts—apply to ndbmtd only; these have no effect when used with ndbd. For more information, see Multi-Threading Configuration Parameters (ndbmtd). See also Section 21.5.3, “ndbmtd — The NDB Cluster Data Node Daemon (Multi-Threaded)”.

Identifying data nodes. The NodeId or Id value (that is, the data node identifier) can be allocated on the command line when the node is started or in the configuration file.

Data Memory, Index Memory, and String Memory

DataMemory and IndexMemory are [ndbd] parameters specifying the size of memory segments used to store the actual records and their indexes. In setting values for these, it is important to understand how DataMemory and IndexMemory are used, as they usually need to be updated to reflect actual usage by the cluster.

Note

IndexMemory is deprecated in NDB 7.6, and subject to removal in a future version of NDB Cluster. See the descriptions that follow for further information.

The index memory needed for a table can be estimated using the following formula:

  size  = ( (fragments * 32K) + (rows * 18) )
          * fragment_replicas

fragments is the number of fragments, fragment_replicas is the number of fragment replicas (normally two), and rows is the number of rows. If a table has one million rows, eight fragments, and two fragment replicas, the expected index memory usage is calculated as shown here:

  ((8 * 32K) + (1000000 * 18)) * 2 = ((8 * 32768) + (1000000 * 18)) * 2  
  = (262144 + 18000000) * 2  
  = 18262144 * 2 = 36524288 bytes = ~35MB  

Index statistics for ordered indexes (when these are enabled) are stored in the mysql.ndb_index_stat_sample table. Since this table has a hash index, this adds to index memory usage. An upper bound to the number of rows for a given ordered index can be calculated as follows:

  sample_size = key_size + ((key_attributes + 1) * 4)
  sample_rows = IndexStatSaveSize  
                * ((0.01 * IndexStatSaveScale * log2(rows * sample_size)) + 1)  
                / sample_size  

In the preceding formula, key_size is the size of the ordered index key in bytes, key_attributes is the number of attributes in the ordered index key, and rows is the number of rows in the base table.
Assume that table t1 has 1 million rows and an ordered index named ix1 on two four-byte integers. Assume in addition that IndexStatSaveSize and IndexStatSaveScale are set to their default values (32K and 100, respectively). Using the two preceding formulas, we can calculate as follows:

  sample_size = 8 + ((2 + 1) * 4) = 20 bytes
  sample_rows = 32K
                * ((0.01 * 100 * log2(1000000 * 20)) + 1)
                / 20
                = 32768 * ((1 * ~24.25) + 1) / 20
                = 32768 * ~25.25 / 20
                = ~41375 rows

The expected index memory usage is thus 2 * 18 * 41375 = ~1489500 bytes.
Prior to NDB 7.6, the default value for IndexMemory is 18MB and the minimum is 1MB; in NDB 7.6, the minimum and default value for this parameter is 0 (zero). This has implications for downgrades from NDB 7.6 to earlier versions of NDB Cluster; see Section 21.3.7, “Upgrading and Downgrading NDB Cluster”, for more information.

The following example illustrates how memory is used for a table. Consider this table definition:

CREATE TABLE example (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY(a),
  UNIQUE(b)
) ENGINE=NDBCLUSTER;

For each record, there are 12 bytes of data plus 12 bytes of overhead. Having no nullable columns saves 4 bytes of overhead. In addition, we have two ordered indexes on columns a and b consuming roughly 10 bytes each per record. There is a primary key hash index on the base table using roughly 29 bytes per record. The unique constraint is implemented by a separate table with b as primary key and a as a column. This other table consumes an additional 29 bytes of index memory per record in the example table, as well as 8 bytes of record data plus 12 bytes of overhead.

Thus, for one million records, we need 58MB for index memory to handle the hash indexes for the primary key and the unique constraint. We also need 64MB for the records of the base table and the unique index table, plus the two ordered index tables.
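
These totals follow from the per-record estimates just given; this is a rough tally only, since actual usage also depends on page fill and other overhead:

  index memory: (29 + 29) * 1000000 = 58000000 bytes = ~58MB
  data memory:  ((12 + 12) + (8 + 12) + (2 * 10)) * 1000000 = 64000000 bytes = ~64MB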

You can see that hash indexes take up a fair amount of memory space; however, they provide very fast access to the data in return. They are also used in NDB Cluster to handle uniqueness constraints.

Currently, the only partitioning algorithm is hashing, and ordered indexes are local to each node. Thus, ordered indexes cannot be used to handle uniqueness constraints in the general case.

An important point for both IndexMemory and DataMemory is that the total database size is the sum of all data memory and all index memory for each node group. Each node group is used to store replicated information, so if there are four nodes with two fragment replicas, there are two node groups. Thus, the total data memory available is 2 × DataMemory for each data node.

It is highly recommended that DataMemory and IndexMemory be set to the same values for all nodes. Data distribution is even over all nodes in the cluster, so the maximum amount of space available for any node can be no greater than that of the smallest node in the cluster.
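
The simplest way to keep these values identical across data nodes is to set them once in the [ndbd default] section; the sizes shown here are illustrative only:

[ndbd default]
DataMemory=3G
IndexMemory=512M    # NDB 7.5 and earlier; deprecated in NDB 7.6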

DataMemory (and in NDB 7.5 and earlier, IndexMemory) can be changed, but decreasing it can be risky; doing so can easily lead to a node or even an entire NDB Cluster that is unable to restart due to there being insufficient memory space. Increases should be acceptable, but it is recommended that such upgrades are performed in the same manner as a software upgrade, beginning with an update of the configuration file, and then restarting the management server followed by restarting each data node in turn.

MinFreePct. A proportion (5% by default) of data node resources including DataMemory (and in NDB 7.5 and earlier, IndexMemory) is kept in reserve to ensure that the data node does not exhaust its memory when performing a restart. This can be adjusted using the MinFreePct data node configuration parameter (default 5).

Version (or later) NDB 7.5.0
Type or units unsigned
Default 5
Range 0 - 100
Restart Type Node Restart: Requires a rolling restart of the cluster. (NDB 7.5.0)
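
For example, to reserve 10% of these resources rather than the default 5%, you could use a setting such as the following (the value is illustrative only):

[ndbd default]
MinFreePct=10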

Updates do not increase the amount of index memory used. Inserts take effect immediately; however, rows are not actually deleted until the transaction is committed.

Transaction parameters. The next few [ndbd] parameters that we discuss are important because they affect the number of parallel transactions and the sizes of transactions that can be handled by the system. MaxNoOfConcurrentTransactions sets the number of parallel transactions possible in a node. MaxNoOfConcurrentOperations sets the number of records that can be in update phase or locked simultaneously.

Both of these parameters (especially MaxNoOfConcurrentOperations) are likely targets for users setting specific values and not using the default value. The default value is set for systems using small transactions, to ensure that these do not use excessive memory.

MaxDMLOperationsPerTransaction sets the maximum number of DML operations that can be performed in a given transaction.

TotalNoOfConcurrentTransactions =  
    (maximum number of tables accessed in any single transaction + 1)  
    * number of SQL nodes  

Suppose that there are 10 SQL nodes using the cluster. A single join involving 10 tables requires 11 transaction records; if there are 10 such joins in a transaction, then 10 * 11 = 110 transaction records are required for this transaction, per MySQL server, or 110 * 10 = 1100 transaction records total. Each data node can be expected to handle TotalNoOfConcurrentTransactions / number of data nodes. For an NDB Cluster having 4 data nodes, this would mean setting MaxNoOfConcurrentTransactions on each data node to 1100 / 4 = 275. In addition, you should provide for failure recovery by ensuring that a single node group can accommodate all concurrent transactions; in other words, that each data node's MaxNoOfConcurrentTransactions is sufficient to cover a number of transactions equal to TotalNoOfConcurrentTransactions / number of node groups. If this cluster has a single node group, then MaxNoOfConcurrentTransactions should be set to 1100 (the same as the total number of concurrent transactions for the entire cluster).
In addition, each transaction involves at least one operation; for this reason, the value set for MaxNoOfConcurrentTransactions should always be no more than the value of MaxNoOfConcurrentOperations.
This parameter must be set to the same value for all cluster data nodes. This is due to the fact that, when a data node fails, the oldest surviving node re-creates the transaction state of all transactions that were ongoing in the failed node.
It is possible to change this value using a rolling restart, but the amount of traffic on the cluster must be such that no more transactions occur than the lower of the old and new levels while this is taking place.
The default value is 4096.
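
Continuing the preceding example, such a setting might appear as follows in config.ini; the value shown is the one derived in that example and is not a general recommendation:

[ndbd default]
MaxNoOfConcurrentTransactions=1100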

Transaction temporary storage. The next set of [ndbd] parameters is used to determine temporary storage when executing a statement that is part of a Cluster transaction. All records are released when the statement is completed and the cluster is waiting for the commit or rollback.

The default values for these parameters are adequate for most situations. However, users with a need to support transactions involving large numbers of rows or operations may need to increase these values to enable better parallelism in the system, whereas users whose applications require relatively small transactions can decrease the values to save memory.

Scans and buffering. There are additional [ndbd] parameters in the Dblqh module (in ndb/src/kernel/blocks/Dblqh/Dblqh.hpp) that affect reads and updates. These include ZATTRINBUF_FILESIZE, set by default to 10000 × 128 bytes (1250KB), and ZDATABUF_FILE_SIZE, set by default to 10000 × 16 bytes (roughly 156KB) of buffer space. To date, there have been neither any reports from users nor any results from our own extensive tests suggesting that either of these compile-time limits should be increased.

4 * MaxNoOfConcurrentScans * [# data nodes] + 2  

The minimum value is 32.
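
As a rough worked example of this formula, assuming a cluster with four data nodes and MaxNoOfConcurrentScans set to 256 (assumed here to be its default value), the result is:

  4 * 256 * 4 + 2 = 4098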

Memory Allocation

MaxAllocate

Version (or later) NDB 7.5.0
Type or units unsigned
Default 32M
Range 1M - 1G
Deprecated Yes (in NDB 8.0)
Restart Type Node Restart: Requires a rolling restart of the cluster. (NDB 7.5.0)

This parameter was used in older versions of NDB Cluster, but has no effect in NDB 7.5 or NDB 7.6.

Hash Map Size

DefaultHashMapSize

Version (or later) NDB 7.5.0
Type or units LDM threads
Default 240
Range 0 - 3840
Restart Type Node Restart: Requires a rolling restart of the cluster. (NDB 7.5.0)

The size of the table hash maps used by NDB is configurable using this parameter. DefaultHashMapSize can take any of three possible values (0, 240, 3840).

The original intended use for this parameter was to facilitate upgrades and especially downgrades to and from very old releases with differing default hash map sizes. This is not an issue when upgrading from NDB Cluster 7.3 (or later) to later versions.

Decreasing this parameter online after any tables have been created or modified with DefaultHashMapSize equal to 3840 is not supported.
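
When this parameter is set at all, it is normally set cluster-wide, as in this illustrative fragment:

[ndbd default]
DefaultHashMapSize=3840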

Logging and checkpointing. The following [ndbd] parameters control log and checkpoint behavior.

$> gcc lcp_simulator.cc  
$> ./a.out  

This program has no dependencies other than stdio.h, and does not require a connection to an NDB cluster or a MySQL server. By default, it simulates 300 LCPs (three sets of 100 LCPs, each consisting of inserts, updates, and deletes, in turn), reporting the size of the LCP after each one. You can alter the simulation by changing the values of recovery_work, insert_work, and delete_work in the source and recompiling. For more information, see the source of the program.

Metadata objects. The next set of [ndbd] parameters defines pool sizes for metadata objects, used to define the maximum number of attributes, tables, indexes, and trigger objects used by indexes, events, and replication between clusters.

Note

These act merely as “suggestions” to the cluster, and any that are not specified revert to the default values shown.

Boolean parameters. The behavior of data nodes is also affected by a set of [ndbd] parameters taking on boolean values. These parameters can each be specified as TRUE by setting them equal to 1 or Y, and as FALSE by setting them equal to 0 or N.
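
For example, the boolean StopOnError parameter (used here purely for illustration) could be disabled in either of the following equivalent ways:

[ndbd default]
StopOnError=0
# or, equivalently:
# StopOnError=N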

Controlling Timeouts, Intervals, and Disk Paging

There are a number of [ndbd] parameters specifying timeouts and intervals between various actions in Cluster data nodes. Most of the timeout values are specified in milliseconds. Any exceptions to this are mentioned where applicable.

The heartbeat interval between management nodes and data nodes is always 100 milliseconds, and is not configurable.

Buffering and logging. Several [ndbd] configuration parameters enable the advanced user to have more control over the resources used by node processes and to adjust various buffer sizes at need.

These buffers are used as front ends to the file system when writing log records to disk. If the node is running in diskless mode, these parameters can be set to their minimum values without penalty, because disk writes are “faked” by the NDB storage engine's file system abstraction layer.

Controlling log messages. In managing the cluster, it is very important to be able to control the number of log messages sent for various event types to stdout. For each event category, there are 16 possible event levels (numbered 0 through 15). Setting event reporting for a given event category to level 15 means all event reports in that category are sent tostdout; setting it to 0 means that there are no event reports made in that category.

By default, only the startup message is sent to stdout, with the remaining event reporting level defaults being set to 0. The reason for this is that these messages are also sent to the management server's cluster log.
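
These per-category levels are controlled by the LogLevel* data node parameters; for example, the following illustrative setting (using LogLevelCheckpoint, one such parameter) sends all checkpoint event reports to stdout:

[ndbd default]
LogLevelCheckpoint=15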

An analogous set of levels can be set for the management client to determine which event levels to record in the cluster log.

2006-12-24 01:18:16 [MgmSrvr] INFO -- Node 2: Data usage is 50%(1280 32K pages of total 2560)  

MemReportFrequency is not a required parameter. If used, it can be set for all cluster data nodes in the [ndbd default] section of config.ini, and can also be set or overridden for individual data nodes in the corresponding [ndbd] sections of the configuration file. The minimum value—which is also the default value—is 0, in which case memory reports are logged only when memory usage reaches certain percentages (80%, 90%, and 100%), as mentioned in the discussion of statistics events in Section 21.6.3.2, “NDB Cluster Log Events”.
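
For example, to have every data node log a memory report every 10 minutes regardless of usage, you might use a setting like this one (the value, in seconds, is illustrative only):

[ndbd default]
MemReportFrequency=600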

2009-06-20 16:39:23 [MgmSrvr] INFO -- Node 1: Local redo log file initialization status:  
#Total files: 80, Completed: 60  
#Total MBytes: 20480, Completed: 15557  
2009-06-20 16:39:23 [MgmSrvr] INFO -- Node 2: Local redo log file initialization status:  
#Total files: 80, Completed: 60  
#Total MBytes: 20480, Completed: 15570  

These reports are logged each StartupStatusReportFrequency seconds during Start Phase 4. If StartupStatusReportFrequency is 0 (the default), then reports are written to the cluster log only at the beginning and at the completion of the redo log file initialization process.
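
To obtain progress reports every 30 seconds during this phase, a setting such as the following could be used (the value is illustrative):

[ndbd default]
StartupStatusReportFrequency=30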

Data Node Debugging Parameters

The following parameters are intended for use during testing or debugging of data nodes, and not for use in production.

Backup parameters. The [ndbd] parameters discussed in this section define memory buffers set aside for execution of online backups.

Note

The location of the backup files is determined by the BackupDataDir data node configuration parameter.

Additional requirements. When specifying these parameters, the following relationships must hold true. Otherwise, the data node cannot start.

NDB Cluster Realtime Performance Parameters

The [ndbd] parameters discussed in this section are used in scheduling and locking of threads to specific CPUs on multiprocessor data node hosts.

Note

To make use of these parameters, the data node process must be run as system root.

Multi-Threading Configuration Parameters (ndbmtd). ndbmtd runs by default as a single-threaded process and must be configured to use multiple threads, using either of two methods, both of which require setting configuration parameters in the config.ini file. The first method is simply to set an appropriate value for the MaxNoOfExecutionThreads configuration parameter. A second method makes it possible to set up more complex rules for ndbmtd multithreading using ThreadConfig. The next few paragraphs provide information about these parameters and their use with multithreaded data nodes.
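
The first method amounts to a single line in the data node configuration, as in this sketch (the thread count of 8 is an example value, not a recommendation):

[ndbd default]
MaxNoOfExecutionThreads=8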

Prior to NDB 7.6, if the cluster's IndexMemory usage is greater than 50%, changing this requires an initial restart of the cluster. (A maximum of 30-35% IndexMemory usage is recommended in such cases.) Otherwise, resource usage and LDM thread allocation cannot be balanced between nodes, which can result in underutilized and overutilized LDM threads, and ultimately data node failures. In NDB 7.6 and later, an initial restart is not required to effect a change in this parameter.

ThreadConfig := entry[,entry[,...]]  
entry := type={param[,param[,...]]}  
type := ldm | main | recv | send | rep | io | tc | watchdog | idxbld  
param := count=number  
  | cpubind=cpu_list  
  | cpuset=cpu_list  
  | spintime=number  
  | realtime={0|1}  
  | nosend={0|1}  
  | thread_prio={0..10}  
  | cpubind_exclusive=cpu_list  
  | cpuset_exclusive=cpu_list  

The curly braces ({...}) surrounding the list of parameters are required, even if there is only one parameter in the list.
A param (parameter) specifies any or all of the following information:

Range: 0 - 1.  
This thread type was added in NDB 7.6. (Bug #25835748, Bug #26928111)  

Prior to NDB 7.6, changing ThreadConfig requires a system initial restart. In NDB 7.6 (and later), this requirement can be relaxed under certain circumstances:

In any other case, a system initial restart is needed to change this parameter.
NDB 7.6 can distinguish between thread types by both of the following criteria:

# Example 1.  
ThreadConfig=ldm={count=2,cpubind=1,2},main={cpubind=12},rep={cpubind=11}  
# Example 2.  
ThreadConfig=main={cpubind=0},ldm={count=4,cpubind=1,2,5,6},io={cpubind=3}  

It is usually desirable when configuring thread usage for a data node host to reserve one or more CPUs for the operating system and other tasks. Thus, for a host machine with 24 CPUs, you might want to use 20 CPU threads (leaving 4 for other uses), with 8 LDM threads, 4 TC threads (half the number of LDM threads), 3 send threads, 3 receive threads, and 1 thread each for schema management, asynchronous replication, and I/O operations. (This is almost the same distribution of threads used when MaxNoOfExecutionThreads is set equal to 20.) The following ThreadConfig setting performs these assignments, additionally binding all of these threads to specific CPUs:

ThreadConfig=ldm={count=8,cpubind=1,2,3,4,5,6,7,8},main={cpubind=9},io={cpubind=9}, \  
rep={cpubind=10},tc={count=4,cpubind=11,12,13,14},recv={count=3,cpubind=15,16,17}, \  
send={count=3,cpubind=18,19,20}  

It should be possible in most cases to bind the main (schema management) thread and the I/O thread to the same CPU, as we have done in the example just shown.
The following example incorporates groups of CPUs defined using both cpuset andcpubind, as well as use of thread prioritization.

ThreadConfig=ldm={count=4,cpuset=0-3,thread_prio=8,spintime=200}, \  
ldm={count=4,cpubind=4-7,thread_prio=8,spintime=200}, \  
tc={count=4,cpuset=8-9,thread_prio=6},send={count=2,thread_prio=10,cpubind=10-11}, \  
main={count=1,cpubind=10},rep={count=1,cpubind=11}  

In this case we create two LDM groups; the first uses cpubind and the second uses cpuset. thread_prio and spintime are set to the same values for each group. This means there are eight LDM threads in total. (You should ensure that NoOfFragmentLogParts is also set to 8.) The four TC threads use only two CPUs; it is possible when using cpuset to specify fewer CPUs than threads in the group. (This is not true for cpubind.) The two send threads are bound to CPUs 10 and 11 using cpubind. The main and rep threads can reuse these CPUs.
This example shows how ThreadConfig and NoOfFragmentLogParts might be set up for a 24-CPU host with hyperthreading, leaving CPUs 10, 11, 22, and 23 available for operating system functions and interrupts:

NoOfFragmentLogParts=10  
ThreadConfig=ldm={count=10,cpubind=0-4,12-16,thread_prio=9,spintime=200}, \  
tc={count=4,cpuset=6-7,18-19,thread_prio=8},send={count=1,cpuset=8}, \  
recv={count=1,cpuset=20},main={count=1,cpuset=9,21},rep={count=1,cpuset=9,21}, \  
io={count=1,cpuset=9,21,thread_prio=8},watchdog={count=1,cpuset=9,21,thread_prio=9}  

The next few examples include settings for idxbld. The first two of these demonstrate how a CPU set defined for idxbld can overlap those specified for other (permanent) thread types, the first using cpuset and the second using cpubind:

ThreadConfig=main,ldm={count=4,cpuset=1-4},tc={count=4,cpuset=5,6,7}, \  
io={cpubind=8},idxbld={cpuset=1-8}  
ThreadConfig=main,ldm={count=1,cpubind=1},idxbld={count=1,cpubind=1}  

The next example specifies a CPU for the I/O thread, but not for the index build threads:

ThreadConfig=main,ldm={count=4,cpuset=1-4},tc={count=4,cpuset=5,6,7}, \  
io={cpubind=8}  

Since the ThreadConfig setting just shown locks threads to eight cores numbered 1 through 8, it is equivalent to the setting shown here:

ThreadConfig=main,ldm={count=4,cpuset=1-4},tc={count=4,cpuset=5,6,7}, \  
io={cpubind=8},idxbld={cpuset=1,2,3,4,5,6,7,8}  

In order to take advantage of the enhanced stability that the use of ThreadConfig offers, it is necessary to ensure that CPUs are isolated, and that they are not subject to interrupts or to being scheduled for other tasks by the operating system. On many Linux systems, you can do this by setting IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance to 0xFFFFF0, and by using the isolcpus boot option in grub.conf. For specific information, see your operating system or platform documentation.

Disk Data Configuration Parameters. Configuration parameters affecting Disk Data behavior include the following:

Disk Data and GCP Stop errors. Errors encountered when using Disk Data tables, such as Node nodeid killed this node because GCP stop was detected (error 2303), are often referred to as “GCP stop errors”. Such errors occur when the redo log is not flushed to disk quickly enough; this is usually due to slow disks and insufficient disk throughput.

You can help prevent these errors from occurring by using faster disks, and by placing Disk Data files on a separate disk from the data node file system. Reducing the value of TimeBetweenGlobalCheckpoints tends to decrease the amount of data to be written for each global checkpoint, and so may provide some protection against redo log buffer overflows when trying to write a global checkpoint; however, reducing this value also permits less time in which to write the GCP, so this must be done with caution.

In addition to the considerations given for DiskPageBufferMemory as explained previously, it is also very important that the DiskIOThreadPool configuration parameter be set correctly; having DiskIOThreadPool set too high is very likely to cause GCP stop errors (Bug #37227).

GCP stops can be caused by save or commit timeouts; the TimeBetweenEpochsTimeout data node configuration parameter determines the timeout for commits. However, it is possible to disable both types of timeouts by setting this parameter to 0.
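
Disabling both timeouts, as just described, amounts to the following setting:

[ndbd default]
TimeBetweenEpochsTimeout=0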

Parameters for configuring send buffer memory allocation. Send buffer memory is allocated dynamically from a memory pool shared between all transporters, which means that the size of the send buffer can be adjusted as necessary. (Previously, the NDB kernel used a fixed-size send buffer for every node in the cluster, which was allocated when the node started and could not be changed while the node was running.) The TotalSendBufferMemory and OverloadLimit data node configuration parameters permit the setting of limits on this memory allocation. For more information about the use of these parameters (as well as SendBufferMemory), see Section 21.4.3.13, “Configuring NDB Cluster Send Buffer Parameters”.

See also Section 21.6.7, “Adding NDB Cluster Data Nodes Online”.

Redo log over-commit handling. It is possible to control a data node's handling of operations when too much time is taken flushing redo logs to disk. This occurs when a given redo log flush takes longer than RedoOverCommitLimit seconds, more than RedoOverCommitCounter times, causing any pending transactions to be aborted. When this happens, the API node that sent the transaction can handle the operations that should have been committed either by queuing the operations and retrying them, or by aborting them, as determined by DefaultOperationRedoProblemAction. The data node configuration parameters for setting the timeout and number of times it may be exceeded before the API node takes this action are described in the following list:

Controlling restart attempts. It is possible to exercise finely-grained control over restart attempts by data nodes when they fail to start, using the MaxStartFailRetries and StartFailRetryDelay data node configuration parameters.

MaxStartFailRetries limits the total number of retries made before giving up on starting the data node, while StartFailRetryDelay sets the number of seconds between retry attempts. These parameters are listed here:

NDB index statistics parameters. The parameters in the following list relate to NDB index statistics generation.

Restart types. Information about the restart types used by the parameter descriptions in this section is shown in the following table:

Table 21.15 NDB Cluster restart types

Symbol Restart Type Description
N Node The parameter can be updated using a rolling restart (see Section 21.6.5, “Performing a Rolling Restart of an NDB Cluster”)
S System All cluster nodes must be shut down completely, then restarted, to effect a change in this parameter
I Initial Data nodes must be restarted using the --initial option