Installation troubleshooting - ROCm installation (Linux)
Contents
- Issue #1: Installation methods
- Issue #2: Install prerequisites
- Issue #3: PATH variable
- Issue #4: C++ libraries
- Issue #5: Application hangs on Multi-GPU systems
- Issue #6: Additional packages for Docker installations
- Issue #7: Installations using Python wheels (.whl files) do not support soft links
- Issue #8: The AMDGPU driver is not loaded after installation
- Issue #9: Cannot access the AMD GPU or accelerator after installation
Installation troubleshooting
2025-05-14
Applies to Linux
This troubleshooting guide describes issues that some users encounter when installing the ROCm tools or libraries.
Issue #1: Installation methods
As an example, the latest version of ROCm is 6.0.2, but the installation instructions result in release 6.0.0 being installed.
Solution: You may have used the quick-start installation method, which installs only the latest major release. Use one of the other installation methods instead. The available methods are:
- Quick-start installation - Installs only the latest major release (e.g., 6.0.0 or 6.1.0)
- Native package manager install method - Installs the specified major and minor release version (e.g., 6.0.0 or 6.0.2)
- amdgpu-install method - Installs the specified major and minor release version (e.g., 6.0.0 or 6.0.2)
Refer to ROCm Issue #2422 for additional details.
Issue #2: Install prerequisites
When installing, I see the following message: Problem: nothing provides perl-URI-Encode needed to be installed by ...
Solution: Ensure that the Installation prerequisites are installed. SUSE requires prerequisite Perl packages, and RHEL requires Extra Packages for Enterprise Linux (EPEL); both are covered in the prerequisites. Install those first, then repeat your installation steps.
Refer to ROCm Issue #1827.
Issue #3: PATH variable
After successfully installing ROCm, when I run rocminfo (or another ROCm tool), the command is not found.
Solution: You may need to update your PATH environment variable as described in Post-installation instructions.
Refer to ROCm Issue #1607.
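As a minimal sketch of the PATH update, the following shell snippet prepends the ROCm binary directory to PATH only when it is not already present. It assumes ROCm is installed under /opt/rocm, which is not stated above; the helper name path_prepend and the ROCM_BIN variable are hypothetical and used only for illustration. The Post-installation instructions remain the authoritative reference.

```shell
# Assumption: ROCm binaries live in /opt/rocm/bin (adjust for your install prefix).
ROCM_BIN="${ROCM_BIN:-/opt/rocm/bin}"

# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;     # otherwise prepend it
  esac
}

path_prepend "$ROCM_BIN"
export PATH
```

To make the change persistent, the same lines could be placed in a shell startup file such as ~/.bashrc.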
Issue #4: C++ libraries
When compiling HIP programs, I get a linking error for -lstdc++, or fatal error: 'cmath' file not found.
Solution: You can install C++ libraries using your package manager. The following is an Ubuntu example:
sudo apt-get install libstdc++-12-dev
Refer to ROCm Issue #2031.
Issue #5: Application hangs on Multi-GPU systems
When running on a system with multiple GPUs, the application hangs with GPU utilization at 100%, but without the expected GPU temperature buildup.
This issue often results in the following message in the application transcript:
NCCL WARN Missing "iommu=pt" from kernel command line which can lead to system instablity or hang!
Solution: To resolve this issue, add iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. Then regenerate the GRUB configuration with your distribution's tool, reboot the system, and display the kernel command line (available in /proc/cmdline).
The returned information should reflect the addition of iommu:
BOOT_IMAGE=/vmlinuz-5.15.0-101-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro iommu=pt
Refer to RCCL Issue #1129 for more information.
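The steps above can be sketched in shell as follows. The command that regenerates the GRUB configuration differs between distributions, so it appears only as a comment; update-grub is an assumption that applies to Ubuntu-style systems. The helper has_iommu_pt is a hypothetical name: it checks whether iommu=pt appears as a separate parameter in a kernel command line file, defaulting to /proc/cmdline.

```shell
# 1. Edit /etc/default/grub so that GRUB_CMDLINE_LINUX_DEFAULT contains iommu=pt.
# 2. Regenerate the GRUB configuration (assumption: Ubuntu-style systems):
#      sudo update-grub
# 3. Reboot, then verify the running kernel's command line with the helper below.

# has_iommu_pt [FILE]: succeed if FILE (default /proc/cmdline) contains
# iommu=pt as a whole, space-separated parameter.
has_iommu_pt() {
  cmdline_file="${1:-/proc/cmdline}"
  # Split the command line into one parameter per line, then match exactly.
  tr -s ' ' '\n' < "$cmdline_file" | grep -qxF 'iommu=pt'
}
```

After rebooting, calling has_iommu_pt with no arguments returns success only if the parameter took effect, so it can guard a message such as: has_iommu_pt && echo "iommu=pt is set".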
Issue #6: Additional packages for Docker installations
Docker images often come with minimal installations, meaning some essential packages might be missing. When installing ROCm within a Docker container, you might need to install additional packages for a successful ROCm installation. Use the following commands to install the prerequisite packages.
Ubuntu
apt update
apt install sudo wget
RHEL
dnf install sudo wget
subscription-manager register --username <username> --password <password>
subscription-manager attach --auto
subscription-manager repos --enable codeready-builder-for-rhel-9-x86_64-rpms
SLES
zypper install sudo wget SUSEConnect
SUSEConnect -r <registration-code>
SUSEConnect -p sle-module-desktop-applications/15.5/x86_64
SUSEConnect -p sle-module-development-tools/15.5/x86_64
SUSEConnect -p PackageHub/15.5/x86_64
After installing these packages and registering using your license for Enterprise Linux (if applicable), install ROCm following the Quick start installation guide in your Docker container.
Issue #7: Installations using Python wheels (.whl files) do not support soft links
If you have installed ROCm or any ROCm component using a Python wheel (.whl file), running a ROCm command which is soft-linked will fail with not found on Ubuntu, bad interpreter: No such file or directory on SLES, and ModuleNotFoundError on RHEL.
Solution: Python wheel files do not support soft links (symbolic links). You will need to run soft-linked commands from within their installation directories, or using the full path to their locations.
For example, run rocm-smi on ROCm 6.2 in the following way:
cd /opt/rocm-6.2.0/libexec/rocm_smi/
python3 rocm_smi.py
or
python3 /opt/rocm-6.2.0/libexec/rocm_smi/rocm_smi.py
See Symbolic links in wheels for more information.
Issue #8: The AMDGPU driver is not loaded after installation
When you are verifying the ROCm installation according to the post-install instructions, the rocm-smi and rocminfo commands might fail with the error message Driver not initialized or not display any output. This could indicate the AMDGPU driver is not loaded.
Solution: Ensure the AMDGPU driver is not on a denylist such as /etc/modprobe.d/blacklist-amdgpu.conf. The location of this file might vary depending on the system distribution and version. To verify whether the driver is on a denylist, use the following command:
grep amdgpu /etc/modprobe.d/*
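The grep command above also matches commented-out lines and unrelated options that merely mention amdgpu. As a rough sketch, the helper below narrows the search to active blacklist entries. The function name find_amdgpu_denylist is hypothetical, and the default directory /etc/modprobe.d is an assumption that might not hold on every distribution.

```shell
# find_amdgpu_denylist [DIR]: print uncommented "blacklist amdgpu" lines found
# in the files under DIR (default /etc/modprobe.d), prefixed by file name.
# Exits non-zero when no such line exists.
find_amdgpu_denylist() {
  dir="${1:-/etc/modprobe.d}"
  # -H prints the file name, -s silences errors for unreadable or missing files.
  grep -EHs '^[[:space:]]*blacklist[[:space:]]+amdgpu([[:space:]]|$)' "$dir"/*
}

# To see whether the module is currently loaded, lsmod can be used as well:
#   lsmod | grep amdgpu
```

If the helper prints a line, removing or commenting out that entry and rebooting is the usual next step; an empty result suggests the denylist is not the cause.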
Note
When installing the AMDGPU driver with Secure Boot enabled, you must sign amdgpu-dkms to prevent potential system loading issues. For more information, see Secure Boot Support. If you prefer not to sign the AMDGPU driver, you can disable Secure Boot from the BIOS settings instead.
Issue #9: Cannot access the AMD GPU or accelerator after installation
If the group permissions are not set properly during ROCm installation, you might get an error similar to Permission denied when attempting to access the AMD GPU.
Solution: You must be part of the video and render groups to access the AMD GPU or accelerator. To learn how to add an account to these groups, see Configuring permissions for GPU access.
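Before changing anything, it can help to confirm which of the two groups the current account is missing. The following is a minimal sketch; the helper name missing_gpu_groups is hypothetical, and the usermod line in the comment is one common way to add a user to groups, shown as an assumption rather than as the documented procedure.

```shell
# missing_gpu_groups ["GROUP LIST"]: print each of video and render that is
# absent from the space-separated GROUP LIST (default: the current user's groups).
missing_gpu_groups() {
  have=" ${1:-$(id -nG)} "
  for g in video render; do
    case "$have" in
      *" $g "*) ;;                   # group present
      *) printf '%s\n' "$g" ;;       # group missing
    esac
  done
}

# One common way to add the current user to both groups (assumption; see
# Configuring permissions for GPU access for the documented steps):
#   sudo usermod -a -G video,render "$LOGNAME"
```

Running missing_gpu_groups with no arguments prints nothing when the account already belongs to both groups. Group changes generally take effect only after logging out and back in.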