Troubleshooting SSH errors

This document describes common errors that you may run into when connecting to virtual machine (VM) instances using SSH, ways to resolve errors, and methods for diagnosing failed SSH connections.

Use the SSH troubleshooting tool to help determine why an SSH connection failed. The troubleshooting tool performs the following tests to check for the cause of failed SSH connections:

Run the troubleshooting tool

You can use the Google Cloud console or the Google Cloud CLI to check for networking problems and user permission errors that might cause SSH connections to fail.

Permissions required for this task

To perform this task, you must have the following permissions:

If you are missing any of the preceding permissions, the troubleshooting tool skips network connectivity tests.

Console

After an SSH connection fails, you have the option to Retry the connection, or Troubleshoot the connection using the SSH-in-browser troubleshooting tool.

To run the troubleshooting tool, click Troubleshoot.

gcloud

Run the troubleshooting tool by using the gcloud compute ssh command:

gcloud compute ssh VM_NAME \
    --troubleshoot

Replace VM_NAME with the name of the VM that you can't connect to.

The tool prompts you to provide permission to perform the troubleshooting tests.

Review the results

After running the troubleshooting tool, do the following:

  1. Review the test results to understand why the VM's SSH connection isn't working.
  2. Resolve the connection issue by performing the remediation steps that the tool provides.
  3. Try reconnecting to the VM.
    If the connection isn't successful, try manually troubleshooting by doing the following:

Common SSH errors

The following are examples of common errors you might encounter when you use SSH to connect to Compute Engine VMs.

SSH-in-Browser errors

The following error might occur when you connect to your VM using SSH-in-browser from the Google Cloud console:

Unauthorized Error 401

This error occurs if your user is part of an organization that is managed from within Google Workspace and there is an active restriction in the Workspace policy that prevents users from accessing SSH-in-browser and the serial console within Google Cloud.

To resolve this issue, have a Google Workspace admin do the following:

  1. Confirm that Google Cloud is enabled for the organization.
    If Google Cloud is disabled, enable it and retry the connection.
  2. Confirm that services that aren't controlled individually are enabled.
    If these services are disabled, enable them and retry the connection.

If the problem persists after enabling Google Cloud settings in Google Workspace, do the following:

  1. Capture the network traffic in an HTTP Archive Format (HAR) file, starting from when you start the SSH-in-browser connection.
  2. Create a Cloud Customer Care case and attach the HAR file.

Could Not Connect, Retrying...

The following error might occur when you start an SSH session:

Could not connect, retrying ...

To resolve this issue, do the following:

  1. After the VM has finished booting, retry the connection. If the connection is not successful, verify that the VM did not boot in emergency mode by running the following command:
    gcloud compute instances get-serial-port-output VM_NAME \
    | grep "emergency mode"
    If the VM boots in emergency mode, troubleshoot the VM startup process to identify where the boot process is failing.
  2. Verify that the google-guest-agent.service service is running by running the following command in the serial console:
    systemctl status google-guest-agent.service
    If the service is disabled, enable and start it by running the following commands:
    systemctl enable google-guest-agent.service
    systemctl start google-guest-agent.service
  3. Verify that the Linux Google Agent scripts are installed and running. For more information, see Determining Google Agent Status. If the Linux Google Agent is not installed, re-install it.
  4. Verify that you have the required roles to connect to the VM. If your VM uses OS Login, see Assign OS Login IAM role. If the VM doesn't use OS Login, you need the compute instance admin role or the service account user role (if the VM is set up to run as a service account). These roles are needed to update the instance or project SSH keys metadata.
  5. Verify that there is a firewall rule that allows SSH access by running the following command:
    gcloud compute firewall-rules list | grep "tcp:22"
  6. Verify that there is a default route to the Internet (or to the bastion host). For more information, see Checking routes.
  7. Make sure that the root volume is not out of disk space. For more information, see Troubleshooting full disks and disk resizing.
  8. Make sure the VM has not run out of memory, by running the following command:
    gcloud compute instances get-serial-port-output VM_NAME \
    | grep -e "Out of memory: Kill process" -e "Kill process" -e "Memory cgroup out of memory" -e "oom"
    If the VM is out of memory, connect to the serial console to troubleshoot.
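
The two serial-output checks in the steps above (emergency mode and out-of-memory kills) can be combined into a single pass. The following is a minimal sketch rather than part of any Google tooling: classify_serial_output is a hypothetical helper name, the labels it prints are arbitrary, and it assumes that you pipe the output of gcloud compute instances get-serial-port-output into it.

```shell
#!/bin/sh
# Hypothetical helper: reads serial port output on stdin and reports
# whether it mentions emergency mode or out-of-memory kills.
classify_serial_output() {
    log=$(cat)
    status=0
    if printf '%s\n' "$log" | grep -q "emergency mode"; then
        echo "emergency-mode"
        status=1
    fi
    # Same patterns as the out-of-memory check in the steps above.
    if printf '%s\n' "$log" | grep -q -e "Out of memory: Kill process" \
        -e "Kill process" -e "Memory cgroup out of memory" -e "oom"; then
        echo "out-of-memory"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "no-known-issue"
    return "$status"
}

# Usage (VM_NAME is a placeholder):
#   gcloud compute instances get-serial-port-output VM_NAME | classify_serial_output
```

The function returns a non-zero status when it finds either condition, so it can gate later steps in a script. The "oom" pattern is broad and can match unrelated words, so treat a match as a hint to read the log, not as proof.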

Linux errors

Permission denied (publickey)

The following error might occur when you connect to your VM:

USERNAME@VM_EXTERNAL_IP: Permission denied (publickey).

This error can occur for several reasons. The following are some of the most common causes of this error:

Ownership

The guest environment stores authorized SSH public keys in the $HOME/.ssh/authorized_keys file. The owner of the $HOME and $HOME/.ssh directories and the $HOME/.ssh/authorized_keys file must be the same as the user connecting to the VM.

Permissions

The guest environment requires the following Linux permissions:

| Path | Permissions |
| --------------------------- | ----------------------- |
| /home | 0755 |
| $HOME | 0700 or 0750 or 0755 * |
| $HOME/.ssh | 0700 |
| $HOME/.ssh/authorized_keys | 0600 |

* To find out which of the options is the correct default permission for your $HOME directory, refer to the official documentation for your specific Linux distribution. Alternatively, you can create a new VM based on the same image and check its default permissions for $HOME.

To learn how to change permissions and ownership, read about chmod and chown.
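
The ownership and permission requirements above can be checked in one step. The following is a minimal sketch under stated assumptions: check_ssh_paths is a hypothetical helper name, it relies on GNU stat (the -c option), and it checks only the owner of $HOME and the owner and mode of $HOME/.ssh and $HOME/.ssh/authorized_keys. It doesn't check /home or the mode of $HOME, because the correct value for $HOME depends on the distribution.

```shell
#!/bin/sh
# Hypothetical helper: compares ownership and mode of the SSH paths under
# a home directory against the requirements described above.
# Assumes GNU stat (-c); BSD and macOS stat use different options.
check_ssh_paths() {
    home_dir=$1
    expected_user=$2
    rc=0
    owner=$(stat -c '%U' "$home_dir")
    [ "$owner" = "$expected_user" ] || { echo "OWNER $home_dir is $owner, expected $expected_user"; rc=1; }
    for entry in ".ssh:700" ".ssh/authorized_keys:600"; do
        path="$home_dir/${entry%%:*}"
        want_mode=${entry##*:}
        if [ ! -e "$path" ]; then
            echo "MISSING $path"; rc=1; continue
        fi
        mode=$(stat -c '%a' "$path")
        owner=$(stat -c '%U' "$path")
        [ "$mode" = "$want_mode" ] || { echo "MODE $path is $mode, expected $want_mode"; rc=1; }
        [ "$owner" = "$expected_user" ] || { echo "OWNER $path is $owner, expected $expected_user"; rc=1; }
    done
    [ "$rc" -eq 0 ] && echo "OK"
    return "$rc"
}

# Usage, run with sufficient privileges on the VM
# (for example from the serial console):
#   check_ssh_paths /home/USERNAME USERNAME
```

The helper only reports mismatches. Fixing them is left to chmod and chown so that nothing changes on the VM without you seeing it first.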

Connection failed

The following errors might occur when you connect to your VM from the Google Cloud console, the gcloud CLI, a bastion host or a local client:

These errors can occur for several reasons. The following are some of the most common causes of the errors:

Unexpected error

The following error might occur when you try to connect to a Linux VM:

Connection Failed You cannot connect to the VM instance because of an unexpected error. Wait a few moments and then try again.

This issue can occur for several reasons. The following are some common causes of the error:

Failed to connect to backend

The following errors might occur when you connect to your VM from the Google Cloud console or the gcloud CLI:

These errors occur when you try to use SSH to connect to a VM that doesn't have a public IP address and for which you haven't configured Identity-Aware Proxy on port 22.

To resolve this issue, create a firewall rule on port 22 that allows ingress traffic from Identity-Aware Proxy.

Host key does not match

The following error might occur when you connect to your VM:

Host key for server IP_ADDRESS does not match

This error occurs when the host key in the ~/.ssh/known_hosts file doesn't match the VM's host key.

To resolve this issue, delete the host key from the ~/.ssh/known_hosts file, then retry the connection.
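
One way to delete the stale entry without editing the file by hand is the -R option of ssh-keygen, which ships with OpenSSH. The following is a minimal sketch: forget_host_key is a hypothetical wrapper name, and the optional second argument exists only so that you can point it at a file other than the default.

```shell
#!/bin/sh
# Hypothetical helper: removes every known_hosts entry for one host.
# ssh-keygen -R also handles hashed entries and keeps the previous
# contents in a file with the .old suffix.
forget_host_key() {
    host=$1
    known_hosts=${2:-$HOME/.ssh/known_hosts}
    ssh-keygen -R "$host" -f "$known_hosts"
}

# Usage (IP_ADDRESS is the address from the error message):
#   forget_host_key IP_ADDRESS
```

Only remove the entry when you expect the VM's host key to have changed, for example after the VM was recreated. An unexpected host key change can also indicate that you are connecting to a different machine than you intended.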

Metadata value is too large

The following error might occur when you try to add a new SSH key to metadata:

ERROR:"Value for field 'metadata.items[X].value' is too large: maximum size 262144 character(s); actual size NUMBER_OF_CHARACTERS."

Metadata values have a maximum limit of 256 KB. To mitigate this limitation, do one of the following:

No supported authentication methods available

The following error might occur when you connect to a VM:

No supported authentication methods available (server sent: publickey,gssapi-keyex,gssapi-with-mic)

This error most commonly occurs due to an outdated SSH client. Older SSH clients might lack support for the ECDSA key types and SHA-2 hashing algorithms required by newer VMs.

For example, this error occurs if you try to connect to a Red Hat Enterprise Linux (RHEL) VM using a version of PuTTY older than 0.75.

To resolve this error, update your SSH client to the most recent stable version. After you have updated your SSH client, retry your SSH connection.

Windows errors

Permission denied, please try again

The following error might occur when you connect to your VM:

USERNAME@compute.INSTANCE_ID's password: Permission denied, please try again.

This error indicates the user trying to connect to the VM doesn't exist on the VM. The following are some of the most common causes of this error:

Permission denied (publickey,keyboard-interactive)

The following error might occur when you connect to a VM that doesn't have SSH enabled:

Permission denied (publickey,keyboard-interactive).

To resolve this error, set the enable-windows-ssh key to TRUE in project or instance metadata. For more information about setting metadata, see Set custom metadata.

Could not SSH into the instance

The following error might occur when you connect to your VM from the gcloud CLI:

ERROR: (gcloud.compute.ssh) Could not SSH into the instance. It is possible that your SSH key has not propagated to the instance yet. Try running this command again. If you still cannot connect, verify that the firewall and instance are set to accept ssh traffic.

This error can occur for several reasons. The following are some of the most common causes of the errors:

Connection timed out

Timed out SSH connections might be caused by one of the following:

Diagnose failed SSH connections

The following sections describe steps you can take to diagnose the cause of failed SSH connections and the steps you can take to fix your connections.

Before you diagnose failed SSH connections, complete the following steps:

Diagnosis methods for Linux and Windows VMs

Test connectivity

You might not be able to SSH to a VM instance because of connectivity issues linked to firewalls, network connection, or the user account. Follow the steps in this section to identify any connectivity issues.

Check your firewall rules

Compute Engine provisions each project with a default set of firewall rules that permit SSH traffic. If you are unable to access your instance, use the gcloud compute command-line tool to check your list of firewalls and ensure that the default-allow-ssh rule is present.

On your local workstation, run the following command:

gcloud compute firewall-rules list

If the firewall rule is missing, add it back:

gcloud compute firewall-rules create default-allow-ssh \
    --allow tcp:22

To view all data associated with the default-allow-ssh firewall rule in your project, use the gcloud compute firewall-rules describe command:

gcloud compute firewall-rules describe default-allow-ssh \
    --project=PROJECT_ID

Replace PROJECT_ID with the ID of your project.

For more information about firewall rules, see Firewall rules in Google Cloud.
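
Checking for the rule by eye gets tedious in projects with many firewall rules. The following is a minimal sketch of scripting that check: has_firewall_rule is a hypothetical helper name, and it assumes that you feed it rule names one per line, which the value(name) output format of the gcloud CLI produces.

```shell
#!/bin/sh
# Hypothetical helper: reads firewall rule names on stdin, one per line,
# and succeeds only if the named rule is among them (exact match).
has_firewall_rule() {
    grep -qxF -- "$1"
}

# Usage: list only the rule names, then test for the default SSH rule.
#   gcloud compute firewall-rules list --format="value(name)" \
#       | has_firewall_rule default-allow-ssh \
#       || echo "default-allow-ssh is missing"
```

This only confirms that a rule with that name exists. It doesn't confirm that the rule applies to your VM's network or still allows tcp:22, so review the rule with the describe command if the connection keeps failing.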

Test the network connection

To determine whether the network connection is working, test the TCP handshake:

  1. Obtain the external natIP for your VM:
    gcloud compute instances describe VM_NAME \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
    Replace VM_NAME with the name of the VM you can't connect to.
  2. Test the network connection to your VM from your workstation:

Linux, Windows 2019/2022, and macOS

curl -vso /dev/null --connect-timeout 5 EXTERNAL_IP:PORT_NUMBER
Replace the following:

Windows 2012 and 2016

PS C:> New-Object System.Net.Sockets.TcpClient('EXTERNAL_IP',PORT_NUMBER)
Replace the following:

If the TCP handshake completes successfully, a software firewall rule is not blocking the connection, the OS is correctly forwarding packets, and a server is listening on the destination port. If the TCP handshake completes successfully but the VM doesn't accept SSH connections, the sshd daemon might be misconfigured or not running properly. Review the user guide for your operating system to ensure that your sshd_config is set up correctly.
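
If curl isn't available on your workstation, the same handshake test can be approximated with bash alone. The following is a minimal sketch under stated assumptions: tcp_port_open is a hypothetical helper name, and it depends on bash's /dev/tcp redirection and the coreutils timeout command, neither of which exists on every system.

```shell
#!/bin/sh
# Hypothetical helper: attempts a TCP handshake and reports the result.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
tcp_port_open() {
    host=$1
    port=$2
    seconds=${3:-5}
    if timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed-or-filtered"
        return 1
    fi
}

# Usage (EXTERNAL_IP is the natIP from step 1; 22 is the default SSH port):
#   tcp_port_open EXTERNAL_IP 22
```

A result of closed-or-filtered doesn't distinguish between a firewall dropping packets, a refused connection, and a missing bash or timeout binary, so use it as a quick first check rather than a diagnosis.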

To run connectivity tests for analyzing the VPC network path configuration between two VMs and check whether the programmed configuration should allow the traffic, see Check for misconfigured firewall rules in Google Cloud.

Connect as a different user

The issue that prevents you from logging in might be limited to your user account. For example, the permissions on the ~/.ssh/authorized_keys file on the instance might not be set correctly for the user.

Try logging in as a different user with the gcloud CLI by specifying ANOTHER_USERNAME with the SSH request. The gcloud CLI updates the project's metadata to add the new user and allow SSH access.

gcloud compute ssh ANOTHER_USERNAME@VM_NAME

Replace the following:

Debug issues using the serial console

We recommend that you review the logs from the serial console for connection errors. You can access the serial console as the root user from your local workstation by using a browser. This approach is useful when you cannot log in with SSH, or if the instance has no connection to the network. The serial console remains accessible in both of these situations.

To log into the VM's serial console and troubleshoot problems with the VM, follow these steps:

  1. Enable interactive access to the VM's serial console.
  2. For Linux VMs, to set the root password, add the following startup script to your VM:
    echo root:PASSWORD | chpasswd
    Replace PASSWORD with a password of your choice.
  3. Use the serial console to connect to your VM.
  4. For Linux VMs, after you're done debugging all the errors, disable the root account login:
    sudo passwd -l root

Diagnosis methods for Linux VMs

Inspect the VM instance without shutting it down

You might have an instance that you cannot connect to that continues to correctly serve production traffic. In this case, you might want to inspect the disk without interrupting the instance.

To inspect and troubleshoot the disk:

  1. Back up your boot disk by creating a snapshot of the disk.
  2. Create a regular persistent disk from that snapshot.
  3. Create a temporary instance.
  4. Attach and mount the regular persistent disk to your new temporary instance.

This procedure creates an isolated network that only allows SSH connections. This setup prevents any unintended consequences of the cloned instance interfering with your production services.

  1. Create a new VPC network to host your cloned instance:
    gcloud compute networks create debug-network
    This example names the network debug-network, and the remaining steps refer to it by that name.
  2. Add a firewall rule to allow SSH connections to the network:
    gcloud compute firewall-rules create debug-network-allow-ssh \
    --network debug-network \
    --allow tcp:22
  3. Create a snapshot of the boot disk.
    gcloud compute disks snapshot BOOT_DISK_NAME \
    --snapshot-names debug-disk-snapshot
    Replace BOOT_DISK_NAME with the name of the boot disk.
  4. Create a new disk with the snapshot you just created:
    gcloud compute disks create example-disk-debugging \
    --source-snapshot debug-disk-snapshot
  5. Create a new debugging instance without an external IP address:
    gcloud compute instances create debugger \
    --network debug-network \
    --no-address
  6. Attach the debugging disk to the instance:
    gcloud compute instances attach-disk debugger \
    --disk example-disk-debugging
  7. Follow the instructions to Connect to a VM using a bastion host.
  8. After you have logged into the debugger instance, troubleshoot the instance. For example, you can look at the instance logs:
    sudo su -
    mkdir /mnt/VM_NAME
    mount /dev/disk/by-id/scsi-0Google_PersistentDisk_example-disk-debugging /mnt/VM_NAME
    cd /mnt/VM_NAME/var/log
    # Identify the issue preventing ssh from working
    ls
    Replace VM_NAME with the name of the VM you can't connect to.
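
Once the disk is mounted, the authentication log is usually the quickest place to look for sshd messages. The following is a minimal sketch: recent_sshd_lines is a hypothetical helper name, and it assumes that the image writes to auth.log (Debian and Ubuntu) or secure (RHEL-family). Images that log only to the systemd journal need a different approach.

```shell
#!/bin/sh
# Hypothetical helper: prints the most recent sshd lines from the
# authentication logs in a log directory. Only auth.log and secure
# are checked; other log locations are ignored.
recent_sshd_lines() {
    log_dir=$1
    count=${2:-20}
    for name in auth.log secure; do
        if [ -f "$log_dir/$name" ]; then
            grep "sshd" "$log_dir/$name" | tail -n "$count"
        fi
    done
}

# Usage, after mounting the disk as shown above:
#   recent_sshd_lines /mnt/VM_NAME/var/log
```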

Use a startup script

If none of the preceding methods helped, you can create a startup script to collect information right after the instance starts. Follow the instructions for running a startup script.

Afterward, reset your instance by using gcloud compute instances reset so that the metadata takes effect.

Alternatively, you can recreate your instance with a diagnostic startup script:

  1. Run gcloud compute instances delete with the --keep-disks flag.
    gcloud compute instances delete VM_NAME \
    --keep-disks boot
    Replace VM_NAME with the name of the VM you can't connect to.
  2. Add a new instance with the same disk and specify your startup script.
    gcloud compute instances create NEW_VM_NAME \
    --disk name=BOOT_DISK_NAME,boot=yes \
    --metadata startup-script-url=URL
    Replace the following:
    • NEW_VM_NAME is the name of the new VM you're creating
    • BOOT_DISK_NAME is the name of the boot disk from the VM you can't connect to
    • URL is the Cloud Storage URL to the script, in either gs://BUCKET/FILE or https://storage.googleapis.com/BUCKET/FILE format.
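
What a diagnostic startup script collects is up to you. The following is a minimal sketch of one possibility, not a script that Google provides: the function name and section titles are arbitrary, and it assumes a Linux image where df is available and where free, systemctl, and ss may or may not be installed. Startup script output typically appears in the serial port output, where you can read it without an SSH connection.

```shell
#!/bin/sh
# Sketch of a diagnostic startup script. Everything is printed to
# stdout; the section titles are arbitrary.
collect_ssh_diagnostics() {
    echo "== disk usage =="
    df -h / || true
    echo "== memory =="
    if command -v free >/dev/null 2>&1; then free -m; fi
    echo "== sshd service =="
    if command -v systemctl >/dev/null 2>&1; then
        # The unit is named ssh on Debian and Ubuntu, sshd on RHEL-family images.
        systemctl status ssh sshd --no-pager 2>&1 || true
    fi
    echo "== listeners on port 22 =="
    if command -v ss >/dev/null 2>&1; then
        ss -tln 2>/dev/null | grep ':22 ' || true
    fi
}

collect_ssh_diagnostics
```

Each optional tool is guarded with command -v so that a missing binary doesn't stop the remaining checks from running.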

Use your disk on a new instance

If you still need to recover data from your persistent boot disk, you can detach the boot disk and then attach that disk as a secondary disk on a new instance.

  1. Delete the VM you can't connect to and keep its boot disk:
    gcloud compute instances delete VM_NAME \
    --keep-disks=boot
    Replace VM_NAME with the name of the VM you can't connect to.
  2. Create a new VM with your old VM's boot disk. Specify the name of the boot disk of the VM you just deleted.
  3. Connect to your new VM using SSH:
    gcloud compute ssh NEW_VM_NAME
    Replace NEW_VM_NAME with the name of your new VM.

Check whether or not the VM boot disk is full

Your VM might become inaccessible if its boot disk is full. This scenario can be difficult to troubleshoot as it's not always obvious when the VM connectivity issue is due to a full boot disk. For more information about this scenario, see Troubleshooting a VM that is inaccessible due to a full boot disk.
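
If you can reach a shell on the VM some other way, for example through the serial console, a quick check of root filesystem usage can confirm or rule out this cause. The following is a minimal sketch: disk_used_percent is a hypothetical helper name, and the 95 percent threshold in the usage comment is an arbitrary value chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the percentage of space used on the
# filesystem that holds the given path (default: the root filesystem).
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: warn when the root filesystem is at least 95% full.
# The 95 threshold is an arbitrary choice for illustration.
#   [ "$(disk_used_percent /)" -ge 95 ] && echo "boot disk is nearly full"
```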

Diagnosis methods for Windows VMs

Reset SSH metadata

If you can't connect to a Windows VM using SSH, try unsetting the enable-windows-ssh metadata key and re-enabling SSH for Windows.

  1. Set the enable-windows-ssh metadata key to FALSE. For information about how to set metadata, see Set custom metadata.
    Wait a few seconds for the change to take effect.
  2. Re-enable SSH for Windows.
  3. Reconnect to the VM.
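
The metadata toggle in the steps above can be scripted with gcloud compute instances add-metadata. The following is a minimal sketch under stated assumptions: reset_windows_ssh is a hypothetical helper name, the 10-second default pause is a guess rather than a documented value, and it assumes that the key is set in instance metadata and that the SSH package is already installed on the VM.

```shell
#!/bin/sh
# Hypothetical helper: toggles the enable-windows-ssh metadata key off
# and back on. The pause length is a guess, not a documented value.
reset_windows_ssh() {
    vm_name=$1
    pause_seconds=${2:-10}
    gcloud compute instances add-metadata "$vm_name" \
        --metadata enable-windows-ssh=FALSE || return 1
    sleep "$pause_seconds"
    gcloud compute instances add-metadata "$vm_name" \
        --metadata enable-windows-ssh=TRUE
}

# Usage (VM_NAME is a placeholder):
#   reset_windows_ssh VM_NAME
```

If the key is set in project metadata instead, the instance-level command here doesn't apply as written. Depending on your gcloud configuration, the command might also prompt for a zone.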

Connect to the VM using RDP

If you can't diagnose and resolve the cause of failed SSH connections to your Windows VM, connect using RDP.

After you establish a connection to the VM, review the OpenSSH logs.
