Troubleshooting for Amazon Aurora

Use the following sections to help troubleshoot problems you have with DB instances in Amazon RDS and Amazon Aurora.

For information about debugging problems using the Amazon RDS API, see Troubleshooting applications on Aurora.

Can't connect to Amazon RDS DB instance

When you can't connect to a DB instance, the following are common causes:

Note

When adding a new inbound rule, you can choose My IP for Source to allow access to the DB instance from the IP address detected in your browser.
For more information about setting up security groups, see Provide access to the DB cluster in the VPC by creating a security group.

Note

Client connections from IP addresses within the range 169.254.0.0/16 aren't permitted. This is the Automatic Private IP Addressing (APIPA) range, which is used for link-local addressing.
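
To check whether a client address falls inside the blocked range, a short script can help. The following is a minimal sketch that uses only the Python standard library. The function name `is_apipa_address` is illustrative and isn't part of any AWS tool.

```python
import ipaddress

# 169.254.0.0/16 is the APIPA range that the preceding note describes.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa_address(address: str) -> bool:
    """Return True if the address is an IPv4 address inside 169.254.0.0/16."""
    ip = ipaddress.ip_address(address)
    # IPv6 addresses can never be in this IPv4 range.
    return ip.version == 4 and ip in APIPA_RANGE
```

If the function returns True for the address that your client uses, the client needs a different source address before it can connect.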

To configure an internet gateway for a subnet
  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
  2. In the navigation pane, choose Databases, and then choose the name of the DB instance.
  3. In the Connectivity & security tab, write down the values of the VPC ID under VPC and the subnet ID under Subnets.
  4. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
  5. In the navigation pane, choose Internet Gateways. Verify that there is an internet gateway attached to your VPC. Otherwise, choose Create Internet Gateway to create an internet gateway. Select the internet gateway, and then choose Attach to VPC and follow the directions to attach it to your VPC.
  6. In the navigation pane, choose Subnets, and then select your subnet.
  7. On the Route Table tab, verify that there is a route with 0.0.0.0/0 as the destination and the internet gateway for your VPC as the target.
    If you're connecting to your instance using its IPv6 address, verify that there is a route for all IPv6 traffic (::/0) that points to the internet gateway. Otherwise, do the following:
    1. Choose the ID of the route table (rtb-xxxxxxxx) to navigate to the route table.
    2. On the Routes tab, choose Edit routes. Choose Add route, use 0.0.0.0/0 as the destination and the internet gateway as the target.
      For IPv6, choose Add route, use ::/0 as the destination and the internet gateway as the target.
    3. Choose Save routes.
      Also, if you are trying to connect to an IPv6 endpoint, make sure that the client IPv6 address range is authorized to connect to the DB instance.
      For more information, see Working with a DB cluster in a VPC.
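
The route check in the preceding procedure can also be expressed in code. The following is a minimal sketch. It assumes that you already have the routes of the subnet's route table as a list of dictionaries shaped like the `Routes` entries that the EC2 `DescribeRouteTables` operation returns, with the keys `DestinationCidrBlock`, `DestinationIpv6CidrBlock`, and `GatewayId`. The function name `has_internet_route` is illustrative.

```python
def has_internet_route(routes, ipv6=False):
    """Return True if a default route targets an internet gateway.

    Assumption: each route is a dict that uses the key names of the
    EC2 DescribeRouteTables response, and internet gateway IDs start
    with "igw-".
    """
    key = "DestinationIpv6CidrBlock" if ipv6 else "DestinationCidrBlock"
    default_destination = "::/0" if ipv6 else "0.0.0.0/0"
    return any(
        route.get(key) == default_destination
        and route.get("GatewayId", "").startswith("igw-")
        for route in routes
    )
```

Call the function once with `ipv6=False` and, if you connect over IPv6, once with `ipv6=True`. A result of False corresponds to the case where you need to add the route.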

Testing a connection to a DB instance

You can test your connection to a DB instance using common Linux or Microsoft Windows tools.

From a Linux or Unix terminal, you can test the connection by entering the following. Replace `DB-instance-endpoint` with the endpoint and `port` with the port of your DB instance.

nc -zv DB-instance-endpoint port 

For example, the following shows a sample command and the return value.

nc -zv postgresql1.c6c8mn7fake0.us-west-2.rds.amazonaws.com 8299

  Connection to postgresql1.c6c8mn7fake0.us-west-2.rds.amazonaws.com 8299 port [tcp/vvr-data] succeeded! 

Windows users can use Telnet to test the connection to a DB instance. Telnet actions aren't supported other than for testing the connection. If a connection is successful, the action returns no message. If a connection isn't successful, you receive an error message such as the following.

C:\>telnet sg-postgresql1.c6c8mntfake0.us-west-2.rds.amazonaws.com 819

  Connecting To sg-postgresql1.c6c8mntfake0.us-west-2.rds.amazonaws.com...Could not open
  connection to the host, on port 819: Connect failed 

If Telnet actions return success, your security group is properly configured.
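
If neither nc nor Telnet is available on the client, you can run a similar TCP reachability check from Python. The following is a minimal sketch that uses only the standard library. The function name `can_connect` and the default timeout of 5 seconds are illustrative choices.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and completes the
        # TCP handshake, or raises an OSError subclass on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call `can_connect` with the endpoint and port of your DB instance. Like `nc -zv`, this check confirms only that the port is reachable. It doesn't test database authentication.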

Note

Amazon RDS doesn't accept Internet Control Message Protocol (ICMP) traffic, including ping.

In some cases, you can connect to your DB instance but you get authentication errors. In these cases, you might want to reset the master user password for the DB instance. You can do this by modifying the RDS instance.

Amazon RDS security issues

To avoid security issues, never use your AWS account root user email address and password for a user account. Best practice is to use your root user to create users and assign those to DB user accounts. You can also use your root user to create other user accounts, if necessary.

For information about creating users, see Creating an IAM user in your AWS account. For information about creating users in AWS IAM Identity Center, see Manage identities in IAM Identity Center.

Error message "failed to retrieve account attributes, certain console functions may be impaired."

You can get this error for several reasons. It might be because your account is missing permissions, or your account hasn't been properly set up. If your account is new, you might not have waited for the account to be ready. If this is an existing account, you might lack permissions in your access policies to perform certain actions such as creating a DB instance. To fix the issue, your administrator needs to provide the necessary roles to your account. For more information, see the IAM documentation.

Resetting the DB instance owner password

If you get locked out of your DB cluster, you can log in as the master user. Then you can reset the credentials for other administrative users or roles. If you can't log in as the master user, the AWS account owner can reset the master user password. For details of which administrative accounts or roles you might need to reset, see Master user account privileges.

You can change the DB instance password by using the Amazon RDS console, the AWS CLI command modify-db-instance, or by using the ModifyDBInstance API operation. For more information about modifying a DB instance in a DB cluster, see Modifying a DB instance in a DB cluster.

Amazon RDS DB instance outage or reboot

A DB instance outage can occur when a DB instance is rebooted. It can also occur when the DB instance is put into a state that prevents access to it, and when the database is restarted. A reboot can occur when you manually reboot your DB instance. A reboot can also occur when you change a DB instance setting that requires a reboot before it can take effect.

A DB instance reboot occurs when you change a setting that requires a reboot, or when you manually cause a reboot. A reboot can occur immediately if you change a setting and request that the change take effect immediately. Or it can occur during the DB instance's maintenance window.

A DB instance reboot occurs immediately when one of the following occurs:

A DB instance reboot occurs during the maintenance window when one of the following occurs:

When you change a static parameter in a DB parameter group, the change doesn't take effect until the DB instance associated with the parameter group is rebooted. The change requires a manual reboot. The DB instance isn't automatically rebooted during the maintenance window.

Amazon RDS DB parameter changes not taking effect

In some cases, you might change a parameter in a DB parameter group but don't see the changes take effect. If so, you likely need to reboot the DB instance associated with the DB parameter group. When you change a dynamic parameter, the change takes effect immediately. When you change a static parameter, the change doesn't take effect until you reboot the DB instance associated with the parameter group.

You can reboot a DB instance using the RDS console. Or you can explicitly call the RebootDBInstance API operation. You can reboot without failover if the DB instance is in a Multi-AZ deployment. The requirement to reboot the associated DB instance after a static parameter change helps mitigate the risk of a parameter misconfiguration affecting an API call. An example of this is calling ModifyDBInstance to change the DB instance class. For more information, see Modifying parameters in a DB parameter group in Amazon Aurora.
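
To see which of your changed parameters need a reboot, you can compare them against the apply type of each parameter. The following is a minimal sketch. It assumes that you already have the parameter list as dictionaries shaped like the `Parameters` entries that the RDS `DescribeDBParameters` operation returns, with the keys `ParameterName` and `ApplyType`. The function name `parameters_needing_reboot` is illustrative.

```python
def parameters_needing_reboot(changed_names, parameters):
    """Return the changed parameters whose apply type is static.

    Assumption: each entry in `parameters` is a dict with the keys
    "ParameterName" and "ApplyType" ("static" or "dynamic").
    """
    static_names = {
        p["ParameterName"]
        for p in parameters
        if p.get("ApplyType") == "static"
    }
    # Dynamic parameters take effect immediately, so they are left out.
    return sorted(name for name in changed_names if name in static_names)
```

An empty result suggests that the changes were all dynamic. In that case, a reboot is unlikely to be the reason that a change isn't visible.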

Freeable memory issues in Amazon Aurora

Freeable memory is the total random access memory (RAM) on a DB instance that can be made available to the database engine. It's the sum of the free operating-system (OS) memory and the available buffer and page cache memory. The database engine uses most of the memory on the host, but OS processes also use some RAM. Memory currently allocated to the database engine or used by OS processes isn't included in freeable memory. When the database engine is running out of memory, the DB instance can use the temporary space that is normally used for buffering and caching. As previously mentioned, this temporary space is included in freeable memory.

You use the FreeableMemory metric in Amazon CloudWatch to monitor the freeable memory. For more information, see Monitoring tools for Amazon Aurora.

If your DB instance consistently runs low on freeable memory or uses swap space, consider scaling up to a larger DB instance class. For more information, see Amazon Aurora DB instance classes.

You can also change the memory settings. For example, on Aurora MySQL, you might adjust the value of the innodb_buffer_pool_size parameter. This parameter is set by default to 75 percent of physical memory. For more MySQL troubleshooting tips, see How can I troubleshoot low freeable memory in an Amazon RDS for MySQL database?

For Aurora Serverless v2, FreeableMemory represents the amount of unused memory that's available when the Aurora Serverless v2 DB instance is scaled to its maximum capacity. You might have the instance scaled down to relatively low capacity, but it still reports a high value for FreeableMemory, because the instance can scale up. That memory isn't available right now, but you can get it if you need it.

For every Aurora capacity unit (ACU) that the current capacity is below the maximum capacity, FreeableMemory increases by approximately 2 GiB. Thus, this metric doesn't approach zero until the DB instance is scaled up as high as it can.

If this metric approaches a value of 0, the DB instance has scaled up as much as it can. It's nearing the limit of its available memory. Consider increasing the maximum ACU setting for the cluster. If this metric approaches a value of 0 on a reader DB instance, consider adding additional reader DB instances to the cluster. That way, the read-only part of the workload can be spread across more DB instances, reducing the memory usage on each reader DB instance. For more information, see Important Amazon CloudWatch metrics for Aurora Serverless v2.
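
The relationship between capacity and FreeableMemory for Aurora Serverless v2 can be illustrated with a small calculation. The following is a minimal sketch based only on the approximate figure of 2 GiB for each ACU stated earlier. The function name `scaling_headroom_gib` is illustrative, and the result is an estimate of the scaling headroom, not the exact metric value.

```python
GIB_PER_ACU = 2  # approximate value stated in the text above


def scaling_headroom_gib(current_acu: float, max_acu: float) -> float:
    """Estimate the GiB that FreeableMemory includes for unused ACUs."""
    if current_acu > max_acu:
        raise ValueError("current capacity can't exceed maximum capacity")
    # Each ACU below the maximum adds roughly 2 GiB to FreeableMemory.
    return (max_acu - current_acu) * GIB_PER_ACU
```

For example, an instance running at 4 ACUs with a maximum of 16 ACUs reports roughly 24 GiB of FreeableMemory from scaling headroom alone. At 16 ACUs, that contribution drops to 0.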

For Aurora Serverless v1, you can change the capacity range to use more ACUs. For more information, see Modifying an Aurora Serverless v1 DB cluster.

Amazon Aurora MySQL replication issues

Some MySQL replication issues also apply to Aurora MySQL. You can diagnose and correct these.

Diagnosing and resolving lag between read replicas

After you create a MySQL read replica and the replica is available, Amazon RDS first replicates the changes made to the source DB instance from the time the read replica create operation started. During this phase, the replication lag time for the read replica is greater than 0. You can monitor this lag time in Amazon CloudWatch by viewing the Amazon RDS AuroraBinlogReplicaLag metric.

The AuroraBinlogReplicaLag metric reports the value of the Seconds_Behind_Master field of the MySQL SHOW REPLICA STATUS command. For more information, see SHOW REPLICA STATUS Statement in the MySQL documentation.

When the AuroraBinlogReplicaLag metric reaches 0, the replica has caught up to the source DB instance. If the AuroraBinlogReplicaLag metric returns -1, replication might not be active. To troubleshoot a replication error, see Diagnosing and resolving a MySQL read replication failure. An AuroraBinlogReplicaLag value of -1 can also mean that the Seconds_Behind_Master value can't be determined or is NULL.

Note

Previous versions of Aurora MySQL used SHOW SLAVE STATUS instead of SHOW REPLICA STATUS. If you are using Aurora MySQL version 1 or 2, then use SHOW SLAVE STATUS. Use SHOW REPLICA STATUS for Aurora MySQL version 3 and higher.

The AuroraBinlogReplicaLag metric returns -1 during a network outage or when a patch is applied during the maintenance window. In this case, wait for network connectivity to be restored or for the maintenance window to end before you check the AuroraBinlogReplicaLag metric again.
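
The meanings of the AuroraBinlogReplicaLag values described in this section can be summarized in code. The following is a minimal sketch that maps a metric value to the interpretation given in the text. The function name `describe_replica_lag` and the message wording are illustrative.

```python
def describe_replica_lag(lag_seconds: float) -> str:
    """Interpret an AuroraBinlogReplicaLag value as described in the text."""
    if lag_seconds == -1:
        # Also reported during a network outage or maintenance patching.
        return "replication might not be active, or the lag can't be determined"
    if lag_seconds == 0:
        return "replica has caught up to the source"
    if lag_seconds > 0:
        return f"replica is {lag_seconds:g} seconds behind the source"
    raise ValueError(f"unexpected metric value: {lag_seconds}")
```

A value of -1 by itself doesn't confirm a failure. Check it again after network connectivity is restored or the maintenance window ends.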

The MySQL read replication technology is asynchronous. Thus, you can expect occasional increases for the BinLogDiskUsage metric on the source DB instance and for the AuroraBinlogReplicaLag metric on the read replica. For example, consider a situation where a high volume of write operations to the source DB instance occur in parallel. At the same time, write operations to the read replica are serialized using a single I/O thread. Such a situation can lead to a lag between the source instance and read replica.

For more information about read replicas and MySQL, see Replication implementation details in the MySQL documentation.

You can reduce the lag between updates to a source DB instance and the subsequent updates to the read replica by doing the following:

For Linux, macOS, or Unix:

PROMPT> mysqldump \
    -h <endpoint> \
    --port=<port> \
    --user=<username> \
    --password=<password> \
    database_name table1 table2 > /dev/null

For Windows:

PROMPT> mysqldump ^
    -h <endpoint> ^
    --port=<port> ^
    --user=<username> ^
    --password=<password> ^
    database_name table1 table2 > NUL

Diagnosing and resolving a MySQL read replication failure

Amazon RDS monitors the replication status of your read replicas. RDS updates the Replication State field of the read replica instance to Error if replication stops for any reason. You can review the details of the associated error thrown by the MySQL engines by viewing the Replication Error field. Events that indicate the status of the read replica are also generated, including RDS-EVENT-0045, RDS-EVENT-0046, and RDS-EVENT-0057. For more information about events and subscribing to events, see Working with Amazon RDS event notification. If a MySQL error message is returned, check the error in the MySQL error message documentation.

Common situations that can cause replication errors include the following:

The following steps can help resolve your replication error:

If a replication error is fixed, the Replication State changes to replicating. For more information, see Troubleshooting a MySQL read replica problem.

Replication stopped error

When you call the mysql.rds_skip_repl_error command, you might receive an error message stating that replication is down or disabled.

This error message appears because replication is stopped and can't be restarted.

If you need to skip a large number of errors, the replication lag can increase beyond the default retention period for binary log files. In this case, you might encounter a fatal error because of binary log files being purged before they have been replayed on the replica. This purge causes replication to stop, and you can no longer call the mysql.rds_skip_repl_error command to skip replication errors.

You can mitigate this issue by increasing the number of hours that binary log files are retained on your replication source. After you have increased the binlog retention time, you can restart replication and call the mysql.rds_skip_repl_error command as needed.

To set the binlog retention time, use the mysql.rds_set_configuration procedure. Specify a configuration parameter of 'binlog retention hours' along with the number of hours to retain binlog files on the DB cluster, up to 2160 (90 days). The default for Aurora MySQL is 24 (1 day). The following example sets the retention period for binlog files to 48 hours.

CALL mysql.rds_set_configuration('binlog retention hours', 48);
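
If you generate this statement from a script, it can help to validate the retention value first. The following is a minimal sketch that builds the CALL statement shown above and enforces the 2160-hour maximum. The function name `binlog_retention_statement` is illustrative, and the requirement for a positive whole number of hours is an assumption of this sketch.

```python
MAX_RETENTION_HOURS = 2160  # 90 days, the maximum stated in the text above


def binlog_retention_statement(hours: int) -> str:
    """Build the CALL statement that sets 'binlog retention hours'."""
    # Assumption: only positive whole hours are accepted by this sketch.
    if not isinstance(hours, int) or isinstance(hours, bool):
        raise TypeError("hours must be an integer")
    if not 1 <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"hours must be between 1 and {MAX_RETENTION_HOURS}")
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours});"
```

For example, passing 48 produces the same statement as the preceding example. Run the resulting statement on your replication source with the MySQL client of your choice.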

Read replica replication fails to initialize metadata structure

When you attempt to start replication, you might receive the following error message:

Read Replica Replication Error - SQLError: 13124, reason: Replica failed to initialize applier metadata structure from the repository

This error occurs when there is a problem with the metadata structure of the replica. To fix the metadata structure, you must create a new replica.

To prevent this from happening in the future, perform one of the following actions: