Troubleshooting for Amazon Aurora

Use the following sections to help troubleshoot problems you have with DB instances in Amazon RDS and Amazon Aurora.

Topics

Can't connect to Amazon RDS DB instance
Amazon RDS security issues
Resetting the DB instance owner password
Amazon RDS DB instance outage or reboot
Amazon RDS DB parameter changes not taking effect
Freeable memory issues in Amazon Aurora
Amazon Aurora MySQL replication issues

For information about debugging problems using the Amazon RDS API, see Troubleshooting applications on Aurora.

Can't connect to Amazon RDS DB instance

When you can't connect to a DB instance, the following are common causes:

Inbound rules – The access rules enforced by your local firewall and the IP addresses authorized to access your DB instance might not match. The problem is most likely the inbound rules in your security group.
By default, DB instances don't allow access. Access is granted through a security group associated with the VPC that allows traffic into and out of the DB instance. If necessary, add inbound and outbound rules for your particular situation to the security group. You can specify an IP address, a range of IP addresses, or another VPC security group.

Note

When adding a new inbound rule, you can choose My IP for Source to allow access to the DB instance from the IP address detected in your browser.
For more information about setting up security groups, see Provide access to the DB cluster in the VPC by creating a security group.

Note

Client connections from IP addresses within the range 169.254.0.0/16 aren't permitted. This is the Automatic Private IP Addressing (APIPA) range, which is used for link-local addressing.

Public accessibility – To connect to your DB instance from outside of the VPC, such as by using a client application, the instance must have a public IP address assigned to it.
To make the instance publicly accessible, modify it and choose Yes under Public accessibility. For more information, see Hiding a DB cluster in a VPC from the internet.
Port – The port that you specified when you created the DB instance can't be used to send or receive communications because of your local firewall restrictions. To determine if your network allows the specified port to be used for inbound and outbound communication, check with your network administrator.
Availability – For a newly created DB instance, the DB instance has a status of creating until the DB instance is ready to use. When the state changes to available, you can connect to the DB instance. Depending on the size of your DB instance, it can take up to 20 minutes before an instance is available.
Internet gateway – For a DB instance to be publicly accessible, the subnets in its DB subnet group must have an internet gateway.

To configure an internet gateway for a subnet

Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
In the navigation pane, choose Databases, and then choose the name of the DB instance.
In the Connectivity & security tab, write down the values of the VPC ID under VPC and the subnet ID under Subnets.
Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
In the navigation pane, choose Internet Gateways. Verify that there is an internet gateway attached to your VPC. Otherwise, choose Create Internet Gateway to create an internet gateway. Select the internet gateway, and then choose Attach to VPC and follow the directions to attach it to your VPC.
In the navigation pane, choose Subnets, and then select your subnet.
On the Route Table tab, verify that there is a route with 0.0.0.0/0 as the destination and the internet gateway for your VPC as the target.
If you're connecting to your instance using its IPv6 address, verify that there is a route for all IPv6 traffic (::/0) that points to the internet gateway. If a required route is missing, do the following:
1. Choose the ID of the route table (rtb-xxxxxxxx) to navigate to the route table.
2. On the Routes tab, choose Edit routes. Choose Add route, use 0.0.0.0/0 as the destination and the internet gateway as the target.
  For IPv6, choose Add route, use ::/0 as the destination and the internet gateway as the target.
3. Choose Save routes.
  Also, if you're connecting to an IPv6 endpoint, make sure that the client IPv6 address range is authorized to connect to the DB instance.
  For more information, see Working with a DB cluster in a VPC.

Testing a connection to a DB instance

You can test your connection to a DB instance using common Linux or Microsoft Windows tools.

From a Linux or Unix terminal, you can test the connection by entering the following. Replace `DB-instance-endpoint` with the endpoint and `port` with the port of your DB instance.

nc -zv DB-instance-endpoint port

For example, the following shows a sample command and the return value.

nc -zv postgresql1.c6c8mn7fake0.us-west-2.rds.amazonaws.com 8299

  Connection to postgresql1.c6c8mn7fake0.us-west-2.rds.amazonaws.com 8299 port [tcp/vvr-data] succeeded!

Windows users can use Telnet to test the connection to a DB instance. Telnet actions aren't supported other than for testing the connection. If a connection is successful, the action returns no message. If a connection isn't successful, you receive an error message such as the following.

C:\>telnet sg-postgresql1.c6c8mntfake0.us-west-2.rds.amazonaws.com 819

  Connecting To sg-postgresql1.c6c8mntfake0.us-west-2.rds.amazonaws.com...Could not open
  connection to the host, on port 819: Connect failed

If Telnet actions return success, your security group is properly configured.

Note

Amazon RDS doesn't accept Internet Control Message Protocol (ICMP) traffic, including ping.
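Where neither nc nor Telnet is available, the same TCP reachability check can be sketched in Python using only the standard library. The function name `can_connect` and the default timeout are illustrative choices, not part of any AWS tooling. Like the commands above, it tests only whether the port accepts a TCP connection, not whether database authentication succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

To use it, call `can_connect` with the endpoint and port of your DB instance in place of the `DB-instance-endpoint` and `port` placeholders. A return value of False usually points to a security group, routing, or local firewall problem rather than a credentials problem.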

In some cases, you can connect to your DB instance but you get authentication errors. In these cases, you might want to reset the master user password for the DB instance. You can do this by modifying the RDS instance.

Amazon RDS security issues

To avoid security issues, never use your AWS account root user email address and password for a user account. Best practice is to use your root user to create users and assign those to DB user accounts. You can also use your root user to create other user accounts, if necessary.

For information about creating users, see Creating an IAM user in your AWS account. For information about creating users in AWS IAM Identity Center, see Manage identities in IAM Identity Center.

Error message "failed to retrieve account attributes, certain console functions may be impaired."

You can get this error for several reasons. It might be because your account is missing permissions, or your account hasn't been properly set up. If your account is new, you might not have waited for the account to be ready. If this is an existing account, you might lack permissions in your access policies to perform certain actions such as creating a DB instance. To fix the issue, your administrator needs to provide the necessary roles to your account. For more information, see the IAM documentation.

Resetting the DB instance owner password

If you get locked out of your DB cluster, you can log in as the master user. Then you can reset the credentials for other administrative users or roles. If you can't log in as the master user, the AWS account owner can reset the master user password. For details of which administrative accounts or roles you might need to reset, see Master user account privileges.

You can change the DB instance password by using the Amazon RDS console, the AWS CLI command modify-db-instance, or by using the ModifyDBInstance API operation. For more information about modifying a DB instance in a DB cluster, see Modifying a DB instance in a DB cluster.

Amazon RDS DB instance outage or reboot

A DB instance outage can occur when a DB instance is rebooted. It can also occur when the DB instance is put into a state that prevents access to it, and when the database is restarted. A reboot can occur when you manually reboot your DB instance. A reboot can also occur when you change a DB instance setting that requires a reboot before it can take effect.

When a setting change requires a reboot, the reboot can occur immediately if you request that the change take effect immediately. Otherwise, it occurs during the DB instance's maintenance window.

A DB instance reboot occurs immediately when one of the following occurs:

You change the backup retention period for a DB instance from 0 to a nonzero value or from a nonzero value to 0. You then set Apply Immediately to true.
You change the DB instance class, and Apply Immediately is set to true.

A DB instance reboot occurs during the maintenance window when one of the following occurs:

You change the backup retention period for a DB instance from 0 to a nonzero value or from a nonzero value to 0, and Apply Immediately is set to false.
You change the DB instance class, and Apply Immediately is set to false.

When you change a static parameter in a DB parameter group, the change doesn't take effect until the DB instance associated with the parameter group is rebooted. The change requires a manual reboot. The DB instance isn't automatically rebooted during the maintenance window.
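The reboot rules above can be summarized as a small lookup. The following Python sketch is only a restatement of those rules. The function name and the change-type labels (`backup_retention_toggle`, `instance_class`, `static_parameter`) are hypothetical and don't correspond to any Amazon RDS API values.

```python
def reboot_timing(change: str, apply_immediately: bool) -> str:
    """Classify when a reboot happens for the changes described above.

    change is one of these hypothetical labels:
      "backup_retention_toggle"  retention period changed between 0 and nonzero
      "instance_class"           DB instance class changed
      "static_parameter"         static parameter changed in a DB parameter group
    """
    if change == "static_parameter":
        # Static parameter changes are never applied by an automatic reboot.
        return "manual reboot required"
    if change in ("backup_retention_toggle", "instance_class"):
        return "immediate" if apply_immediately else "maintenance window"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `reboot_timing("instance_class", False)` returns `"maintenance window"`, while a static parameter change returns `"manual reboot required"` regardless of the Apply Immediately setting.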

Amazon RDS DB parameter changes not taking effect

In some cases, you might change a parameter in a DB parameter group but don't see the changes take effect. If so, you likely need to reboot the DB instance associated with the DB parameter group. When you change a dynamic parameter, the change takes effect immediately. When you change a static parameter, the change doesn't take effect until you reboot the DB instance associated with the parameter group.

You can reboot a DB instance using the RDS console. Or you can explicitly call the RebootDBInstance API operation. You can reboot without failover if the DB instance is in a Multi-AZ deployment. The requirement to reboot the associated DB instance after a static parameter change helps mitigate the risk of a parameter misconfiguration affecting an API call. An example of this is calling ModifyDBInstance to change the DB instance class. For more information, see Modifying parameters in a DB parameter group in Amazon Aurora.

Freeable memory issues in Amazon Aurora

Freeable memory is the total random access memory (RAM) on a DB instance that can be made available to the database engine. It's the sum of the free operating-system (OS) memory and the available buffer and page cache memory. The database engine uses most of the memory on the host, but OS processes also use some RAM. Memory currently allocated to the database engine or used by OS processes isn't included in freeable memory. When the database engine is running out of memory, the DB instance can use the temporary space that is normally used for buffering and caching. As previously mentioned, this temporary space is included in freeable memory.

You use the FreeableMemory metric in Amazon CloudWatch to monitor the freeable memory. For more information, see Monitoring tools for Amazon Aurora.

If your DB instance consistently runs low on freeable memory or uses swap space, consider scaling up to a larger DB instance class. For more information, see Amazon Aurora DB instance classes.

You can also change the memory settings. For example, on Aurora MySQL, you might adjust the value of the innodb_buffer_pool_size parameter. This parameter is set by default to 75 percent of physical memory. For more MySQL troubleshooting tips, see How can I troubleshoot low freeable memory in an Amazon RDS for MySQL database?

For Aurora Serverless v2, FreeableMemory represents the amount of unused memory that's available when the Aurora Serverless v2 DB instance is scaled to its maximum capacity. You might have the instance scaled down to relatively low capacity, but it still reports a high value for FreeableMemory, because the instance can scale up. That memory isn't available right now, but you can get it if you need it.

For every Aurora capacity unit (ACU) that the current capacity is below the maximum capacity, FreeableMemory increases by approximately 2 GiB. Thus, this metric doesn't approach zero until the DB instance is scaled up as high as it can.
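As a rough worked example of this rule, the following Python sketch estimates how much of the reported FreeableMemory value comes from unused scaling headroom. It assumes the approximate figure of 2 GiB per ACU stated above. The function and constant names are illustrative, and the result is an estimate rather than a value that Aurora reports.

```python
GIB_PER_ACU = 2  # approximate figure from the text above


def scaling_headroom_gib(current_acu: float, max_acu: float) -> float:
    """Estimate the FreeableMemory contribution of unused ACUs, in GiB."""
    if current_acu > max_acu:
        raise ValueError("current capacity can't exceed maximum capacity")
    # Each ACU below the maximum adds roughly 2 GiB to FreeableMemory.
    return (max_acu - current_acu) * GIB_PER_ACU
```

For an instance running at 4 ACUs with a maximum of 16 ACUs, `scaling_headroom_gib(4, 16)` gives 24, so roughly 24 GiB of the metric reflects memory that becomes available only after scaling up. At maximum capacity the headroom is 0, which is when a low FreeableMemory value indicates real memory pressure.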

If this metric approaches a value of 0, the DB instance has scaled up as much as it can. It's nearing the limit of its available memory. Consider increasing the maximum ACU setting for the cluster. If this metric approaches a value of 0 on a reader DB instance, consider adding additional reader DB instances to the cluster. That way, the read-only part of the workload can be spread across more DB instances, reducing the memory usage on each reader DB instance. For more information, see Important Amazon CloudWatch metrics for Aurora Serverless v2.

For Aurora Serverless v1, you can change the capacity range to use more ACUs. For more information, see Modifying an Aurora Serverless v1 DB cluster.

Amazon Aurora MySQL replication issues

Some MySQL replication issues also apply to Aurora MySQL. You can diagnose and correct these.

Topics

Diagnosing and resolving lag between read replicas
Diagnosing and resolving a MySQL read replication failure
Replication stopped error
Read replica replication fails to initialize metadata structure

Diagnosing and resolving lag between read replicas

After you create a MySQL read replica and the replica is available, Amazon RDS first replicates the changes made to the source DB instance from the time the read replica create operation started. During this phase, the replication lag time for the read replica is greater than 0. You can monitor this lag time in Amazon CloudWatch by viewing the Amazon RDS AuroraBinlogReplicaLag metric.

The AuroraBinlogReplicaLag metric reports the value of the Seconds_Behind_Master field of the MySQL SHOW REPLICA STATUS command. For more information, see SHOW REPLICA STATUS Statement in the MySQL documentation.

When the AuroraBinlogReplicaLag metric reaches 0, the replica has caught up to the source DB instance. If the AuroraBinlogReplicaLag metric returns -1, replication might not be active. To troubleshoot a replication error, see Diagnosing and resolving a MySQL read replication failure. An AuroraBinlogReplicaLag value of -1 can also mean that the Seconds_Behind_Master value can't be determined or is NULL.

Note

Previous versions of Aurora MySQL used SHOW SLAVE STATUS instead of SHOW REPLICA STATUS. If you are using Aurora MySQL version 1 or 2, then use SHOW SLAVE STATUS. Use SHOW REPLICA STATUS for Aurora MySQL version 3 and higher.

The AuroraBinlogReplicaLag metric returns -1 during a network outage or when a patch is applied during the maintenance window. In this case, wait for network connectivity to be restored or for the maintenance window to end before you check the AuroraBinlogReplicaLag metric again.
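The meaning of the metric values described above can be captured in a small helper, for example when post-processing data points that you have already retrieved from CloudWatch. This Python sketch is illustrative only. The function name and the returned labels are hypothetical.

```python
def interpret_replica_lag(lag_seconds: float) -> str:
    """Map an AuroraBinlogReplicaLag data point to the states described above."""
    if lag_seconds == -1:
        # Replication might be inactive, or the lag can't be determined (NULL).
        # This value also appears during network outages and patching.
        return "unknown or not replicating"
    if lag_seconds == 0:
        return "caught up"
    if lag_seconds > 0:
        return "lagging"
    raise ValueError(f"unexpected lag value: {lag_seconds!r}")
```

A single -1 data point during a maintenance window isn't necessarily a failure, so it can be useful to check again later before treating the state as an error.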

The MySQL read replication technology is asynchronous. Thus, you can expect occasional increases for the BinLogDiskUsage metric on the source DB instance and for the AuroraBinlogReplicaLag metric on the read replica. For example, consider a situation where a high volume of write operations to the source DB instance occur in parallel. At the same time, write operations to the read replica are serialized using a single I/O thread. Such a situation can lead to a lag between the source instance and read replica.

For more information about read replicas and MySQL, see Replication implementation details in the MySQL documentation.

You can reduce the lag between updates to a source DB instance and the subsequent updates to the read replica by doing the following:

Set the DB instance class of the read replica to have a storage size comparable to that of the source DB instance.
Make sure that parameter settings in the DB parameter groups used by the source DB instance and the read replica are compatible. For more information and an example, see the discussion of the max_allowed_packet parameter in the next section.
Disable the query cache. For tables that are modified often, using the query cache can increase replica lag because the cache is locked and refreshed often. If this is the case, you might see less replica lag if you disable the query cache. You can disable the query cache by setting the query_cache_type parameter to 0 in the DB parameter group for the DB instance. For more information on the query cache, see Query cache configuration.
Warm the buffer pool on the read replica for InnoDB for MySQL. For example, suppose that you have a small set of tables that are being updated often and you're using the InnoDB or XtraDB table schema. In this case, dump those tables on the read replica. Doing this causes the database engine to scan through the rows of those tables from the disk and then cache them in the buffer pool. This approach can reduce replica lag. The following shows an example.
For Linux, macOS, or Unix:

PROMPT> mysqldump \
    -h <endpoint> \
    --port=<port> \
    --user=<username> \
    --password=<password> \
    database_name table1 table2 > /dev/null

For Windows:

PROMPT> mysqldump ^
    -h <endpoint> ^
    --port=<port> ^
    --user=<username> ^
    --password=<password> ^
    database_name table1 table2 > NUL

Diagnosing and resolving a MySQL read replication failure

Amazon RDS monitors the replication status of your read replicas. RDS updates the Replication State field of the read replica instance to Error if replication stops for any reason. You can review the details of the associated error thrown by the MySQL engines by viewing the Replication Error field. Events that indicate the status of the read replica are also generated, including RDS-EVENT-0045, RDS-EVENT-0046, and RDS-EVENT-0057. For more information about events and subscribing to events, see Working with Amazon RDS event notification. If a MySQL error message is returned, check the error in the MySQL error message documentation.

Common situations that can cause replication errors include the following:

The value for the max_allowed_packet parameter for a read replica is less than the max_allowed_packet parameter for the source DB instance.
The max_allowed_packet parameter is a custom parameter that you can set in a DB parameter group. The max_allowed_packet parameter is used to specify the maximum size of a data manipulation language (DML) statement that can be run on the database. In some cases, the max_allowed_packet value for the source DB instance might be larger than the max_allowed_packet value for the read replica. If so, the replication process can throw an error and stop replication. The most common error is packet bigger than 'max_allowed_packet' bytes. You can fix the error by having the source and read replica use DB parameter groups with the same max_allowed_packet parameter values.
Writing to tables on a read replica. If you're creating indexes on a read replica, you need to have the read_only parameter set to 0 to create the indexes. If you're writing to tables on the read replica, it can break replication.
Using a nontransactional storage engine such as MyISAM. Read replicas require a transactional storage engine. Replication is only supported for the following storage engines: InnoDB for MySQL or MariaDB.
You can convert a MyISAM table to InnoDB with the following command:
alter table <schema>.<table_name> engine=innodb;
Using unsafe nondeterministic queries such as SYSDATE(). For more information, see Determination of safe and unsafe statements in binary logging in the MySQL documentation.

The following steps can help resolve your replication error:

If you encounter a logical error and you can safely skip the error, follow the steps described in Skipping the current replication error. Your Aurora MySQL DB instance must be running a version that includes the mysql.rds_skip_repl_error procedure. For more information, see mysql_rds_skip_repl_error.
If you encounter a binary log (binlog) position issue, you can change the replica replay position. You do so with the mysql.rds_next_master_log command for Aurora MySQL version 1 and 2. You do so with the mysql.rds_next_source_log command for Aurora MySQL version 3 and higher. Your Aurora MySQL DB instance must be running a version that supports this command to change the replica replay position. For version information, see mysql_rds_next_master_log.
If you encounter a temporary performance issue because of high DML load, you can set the innodb_flush_log_at_trx_commit parameter to 2 in the DB parameter group on the read replica. Doing this can help the read replica catch up, though it temporarily reduces atomicity, consistency, isolation, and durability (ACID).
You can delete the read replica and create an instance using the same DB instance identifier. This way, the endpoint remains the same as that of your old read replica.

If a replication error is fixed, the Replication State changes to replicating. For more information, see Troubleshooting a MySQL read replica problem.
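Because the statement and procedure names differ by Aurora MySQL major version, scripts that run against mixed fleets need to pick the right ones. The following Python sketch shows one way to encode the version rules stated in this section. The function name and dictionary keys are hypothetical, and the sketch only selects names. It doesn't connect to a database or run anything.

```python
def replication_commands(major_version: int) -> dict:
    """Return the replication statement and procedure names for a major version."""
    if major_version < 1:
        raise ValueError(f"invalid Aurora MySQL major version: {major_version}")
    if major_version >= 3:
        # Aurora MySQL version 3 and higher use the newer terminology.
        return {
            "status_statement": "SHOW REPLICA STATUS",
            "next_log_procedure": "mysql.rds_next_source_log",
        }
    # Aurora MySQL version 1 and 2.
    return {
        "status_statement": "SHOW SLAVE STATUS",
        "next_log_procedure": "mysql.rds_next_master_log",
    }
```

For example, `replication_commands(2)["status_statement"]` returns `"SHOW SLAVE STATUS"`.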

Replication stopped error

When you call the mysql.rds_skip_repl_error command, you might receive an error message stating that replication is down or disabled.

This error message appears because replication is stopped and can't be restarted.

If you need to skip a large number of errors, the replication lag can increase beyond the default retention period for binary log files. In this case, you might encounter a fatal error because of binary log files being purged before they have been replayed on the replica. This purge causes replication to stop, and you can no longer call the mysql.rds_skip_repl_error command to skip replication errors.

You can mitigate this issue by increasing the number of hours that binary log files are retained on your replication source. After you have increased the binlog retention time, you can restart replication and call the mysql.rds_skip_repl_error command as needed.

To set the binlog retention time, use the mysql.rds_set_configuration procedure. Specify a configuration parameter of 'binlog retention hours' along with the number of hours to retain binlog files on the DB cluster, up to 2160 (90 days). The default for Aurora MySQL is 24 (1 day). The following example sets the retention period for binlog files to 48 hours.

CALL mysql.rds_set_configuration('binlog retention hours', 48);
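If you generate this statement from a script, it can help to validate the value before sending it to the database. The following Python sketch builds the same CALL statement and enforces the 2160-hour upper limit described above. The function name is hypothetical, and the lower bound of 1 hour is an assumption made for this sketch rather than a documented limit.

```python
MAX_BINLOG_RETENTION_HOURS = 2160  # 90 days, per the text above


def binlog_retention_call(hours: int) -> str:
    """Build the CALL statement that sets binlog retention to the given hours."""
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise TypeError("hours must be an integer")
    # The lower bound of 1 is an assumption for this sketch.
    if not 1 <= hours <= MAX_BINLOG_RETENTION_HOURS:
        raise ValueError(
            f"hours must be between 1 and {MAX_BINLOG_RETENTION_HOURS}"
        )
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours});"
```

Calling `binlog_retention_call(48)` produces the statement shown in the example above. Because the value is checked to be an integer before it is formatted into the string, arbitrary text can't be injected into the statement.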

Read replica replication fails to initialize metadata structure

When you attempted to start replication, you received the following error message:

Read Replica Replication Error - SQLError: 13124, reason: Replica failed to initialize applier metadata structure from the repository

This error occurs when there is a problem with the metadata structure of the replica. To fix the metadata structure, you must create a new replica.

To prevent this from happening in the future, perform one of the following actions:

If possible, disable multi-threading on your replicas. Starting with MySQL 8.0.27, multi-threading is enabled by default.
If you need to use multi-threading on your replicas, then we recommend that you use GTID-based replication. For more information, see Using GTID-based replication.