cuda-gdb reports an error under WSL2
August 12, 2024, 2:49pm 1
My environment is as follows:
Driver version on Windows: 555.99
CUDA version on Windows 10: 12.5
nvcc version on Windows 10: V12.5.40
WSL2 Ubuntu version: 20.04.6 LTS
nvcc version on WSL2: V12.5.40
cuda-gdb version on WSL2: 13.2
The registry value HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPUDebugger\EnableInterface is set to 1.
No matter which CUDA program I debug, cuda-gdb reports the following error as soon as execution reaches the kernel. (Without cuda-gdb, the program runs normally.)
(cuda-gdb) n
[New Thread 0x7ffff6cd3000 (LWP 312)]
[New Thread 0x7ffff59c3000 (LWP 313)]
[Detaching after fork from child process 314]
[New Thread 0x7ffff4fb5000 (LWP 322)]
[Thread 0x7ffff4fb5000 (LWP 322) exited]
[New Thread 0x7ffff4fb5000 (LWP 323)]
[New Thread 0x7fffe1e12000 (LWP 324)]
Error: get_elf_image(0): Failed to read the ELF image handle 93825002597824 relocated 1, error=CUDBG_ERROR_INVALID_ARGS, error message=
qcoelho August 12, 2024, 9:16pm 2
Hello, can you run a CUDA app without the debugger?
Have you installed any CUDA version inside WSL?
What does nvidia-smi output in WSL?
Thank you for your help.
I ran the following commands:
nvcc -g -G main.cu
or nvcc -g -G --generate-code arch=compute_89,code=sm_89 main.cu
cuda-gdb ./a.out
My CUDA version on WSL2 is 12.5.
[Screenshot: nvidia-smi.exe output in Windows PowerShell]
[Screenshot: nvidia-smi output in Ubuntu 20.04.6 LTS under WSL2]
[Screenshot: nvcc -V and cuda-gdb version output]
The CUDA program produces correct output when run directly, but under cuda-gdb the error above appears as soon as execution reaches the kernel.
Thanks
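Note on the compile commands above: -g adds host debug information and -G adds device debug information, which cuda-gdb needs in order to step inside kernels. The sm_89 in the second command corresponds to compute capability 8.9. The following is only a sketch of deriving that value instead of hard-coding it. It assumes an nvidia-smi recent enough to support the compute_cap query field, and cc_to_arch is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: build with debug info for the GPU that is actually installed.
# cc_to_arch is a hypothetical helper, not part of the CUDA toolkit.

# Turn a compute capability such as "8.9" into the "89" used in compute_89/sm_89.
cc_to_arch() {
  printf '%s\n' "$1" | tr -d '. '
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1 && [ -f main.cu ]; then
  # Assumption: this nvidia-smi supports the compute_cap query field.
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
  arch="$(cc_to_arch "$cc")"
  # -g: host debug info, -G: device debug info (needed to step inside kernels).
  nvcc -g -G --generate-code "arch=compute_${arch},code=sm_${arch}" main.cu
  echo "Built a.out for sm_${arch}; start the debugger with: cuda-gdb ./a.out"
else
  echo "nvidia-smi, nvcc, or main.cu not found; nothing built."
fi
```

If the query field is not available, passing the architecture by hand, as in the original command, works just as well.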
veraj August 20, 2024, 6:26am 4
Hi, @3356538486
This looks like an issue specific to your environment setup.
Can you uninstall and reinstall CUDA and the driver on the Windows side?
I have uninstalled and reinstalled the driver and CUDA on Windows and updated both to the latest version.
I have also uninstalled and reinstalled CUDA on WSL2 and updated it to the latest version.
My environment is now as follows:
Windows 10 (professional edition) 64bit 22H2 19045.4412
CPU: Intel i5-12600KF
GPU: NVIDIA RTX 4060
Driver on Windows: 560.81
CUDA on Windows: 12.6, v12.6.20
WSL2: Ubuntu 20.04.6 LTS (GNU/Linux 5.10.16.3-microsoft-standard-WSL2 x86_64)
Driver on WSL2: 560.81 (shared with Windows)
CUDA on WSL2: 12.6, v12.6.20
cuda-gdb on WSL2: 13.2
gcc: 9.4.0
gdb: 9.2
python3: 3.8.10
HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPUDebugger\EnableInterface is set to 1.
However, the latest versions show the same issue: compiling and running directly works, and errors occur only under cuda-gdb.
The following screenshot shows the output of compiling and running cuda-samples-12.4/1_Utilities/deviceQuery normally:
I also tried the simplest Hello World program. It works when compiled and run directly, but compute-sanitizer reports errors, and so does cuda-gdb.
The registry settings also appear to be correct.
I’m not sure whether this is due to the kernel version (my WSL2 is Ubuntu 20.04.6 LTS, GNU/Linux 5.10.16.3-microsoft-standard-WSL2 x86_64).
Looking forward to your reply. Thank you very much!
qcoelho August 20, 2024, 10:29pm 6
Hello, could you please provide us with a detailed log? Set an environment variable named NVLOG_CONFIG_FILE pointing to the configuration file nvlog.config, run the app under the debugger, and upload the resulting /tmp/debugger.log.
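The logging request above could be set up roughly as follows. This is a sketch, not an official procedure: use_nvlog_config is a hypothetical helper, the location of nvlog.config is an assumption, and the contents of that file are not reproduced in this thread.

```shell
# Sketch only: enable debugger logging for one cuda-gdb session.
# use_nvlog_config is a hypothetical helper; nvlog.config is the file named in
# the request above, and its contents are not shown in this thread.
use_nvlog_config() {
  if [ ! -f "$1" ]; then
    echo "config file not found: $1" >&2
    return 1
  fi
  # The variable must be exported so that cuda-gdb inherits it.
  export NVLOG_CONFIG_FILE="$1"
}

# Assumption: nvlog.config was saved in the current directory.
if use_nvlog_config "$PWD/nvlog.config" 2>/dev/null; then
  echo "NVLOG_CONFIG_FILE=$NVLOG_CONFIG_FILE"
  echo "Now run: cuda-gdb ./a.out  (then upload /tmp/debugger.log)"
else
  echo "Save nvlog.config in $PWD first."
fi
```

The variable has to be set in the same shell that launches cuda-gdb; setting it in a different terminal has no effect on the debugger session.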
debugger.log (61.1 KB)
Additional information:
Running lspci | grep -i nvidia under WSL2 produces no output.
Running compute-sanitizer prints “Error: Device not support.” (However, the GPU still appears to execute the kernel, and I get the output “Hello world”.)
Thanks
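Note on the empty lspci result: this is expected under WSL2, where the GPU is exposed to the Linux guest through the paravirtualized /dev/dxg device and driver libraries mounted from Windows, not as a PCI device. A rough way to check GPU visibility is sketched below. The default paths are the usual WSL2 locations and are treated as assumptions here, and wsl_gpu_visible is a name made up for this example.

```shell
# Sketch only: check whether the WSL2 GPU plumbing is present.
# wsl_gpu_visible is a hypothetical helper, not an NVIDIA or Microsoft tool.
wsl_gpu_visible() {
  # $1: device node, $2: directory holding the driver libraries.
  dev="${1:-/dev/dxg}"
  libdir="${2:-/usr/lib/wsl/lib}"
  [ -e "$dev" ] || return 1
  # Reuse the positional parameters to expand the glob without ls.
  set -- "$libdir"/libcuda.so*
  [ -e "$1" ]
}

if wsl_gpu_visible; then
  echo "GPU device node and libcuda found; lspci showing nothing is normal here."
else
  echo "No /dev/dxg or libcuda found; check the Windows driver and the WSL version."
fi
```

This only confirms that the device node and the CUDA driver library are present. It says nothing about whether the debugger interface works, which was the actual problem in this thread.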
qcoelho August 22, 2024, 9:07pm 8
After the ELF error, can you run the debugger command info shared and post the output?
Also, can you check whether hardware-accelerated GPU scheduling is enabled on your system?
Can you post the environment in which you run the app, via the env command?
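The information requested above can also be collected into files for attaching to a reply. The sketch below uses standard GDB options (-batch and -ex), which cuda-gdb accepts; collect_report and the output file names are made up for this example. Whether cuda-gdb continues to the info shared command after the ELF error is not established in this thread, so an interactive session may still be needed.

```shell
# Sketch only: gather the environment and debugger output into a directory.
# collect_report is a hypothetical helper; the file names are arbitrary.
collect_report() {
  out="${1:-report}"      # output directory
  app="${2:-./a.out}"     # binary built with nvcc -g -G
  mkdir -p "$out" || return 1
  # The environment in which the app is run.
  env | sort > "$out/env.txt"
  # Run the app under the debugger non-interactively, then list shared libraries.
  if command -v cuda-gdb >/dev/null 2>&1 && [ -x "$app" ]; then
    cuda-gdb -batch -ex run -ex "info shared" "$app" > "$out/cuda-gdb.txt" 2>&1 || true
  fi
  return 0
}
```

For example, collect_report report ./a.out writes report/env.txt and, when cuda-gdb and the binary are available, report/cuda-gdb.txt. Check env.txt for anything sensitive before posting it publicly.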
Thank you for your reply.
I ran the following commands:
nvcc -g -G --generate-code arch=compute_89,code=sm_89 hello.cu
cuda-gdb ./a.out
The output is shown in the following screenshot:
After I run the info shared command:
Some additional information from ‘info devices’:
Does ‘hardware-accelerated GPU scheduling’ refer to the option labeled ‘硬件加速GPU计划’ (the Chinese-language Windows label for this setting) in the following screenshot?
This option has always been turned off, but CUDA applications on both Windows and WSL2 run normally without the debugger.
I tried turning this option on and restarting the computer, but the result was the same.
The output of running env is as follows:
I look forward to your reply. Thank you!
qcoelho August 29, 2024, 10:28pm 10
Thank you. Could you please run dxdiag on the host and post a screenshot of the Display tab?
I ran the command dxdiag.exe in Windows PowerShell, and the output is as follows (I have two screens: the main one is 4K and the other is 1080p):
I have saved all the information to ‘DxDiag.txt’.
DxDiag.txt (106.3 KB)
Additional information:
I opened “NVIDIA Control Panel” → “System Information” → “Save” and obtained the following file:
NVIDIA System Information 08-30-2024 09-00-54.txt (3.1 KB)
I look forward to your reply. Thank you!
qcoelho September 10, 2024, 4:17am 12
Thank you for all the details, we’ve been able to reproduce the failure and will investigate.
Thanks for your help! Looking forward to your reply.
Hello, is there any new progress on this issue?
Looking forward to your reply. Thanks!
qcoelho October 11, 2024, 3:07am 16
Hello,
Our team is working on it.
In the meantime, do you have access to a Windows 11 system where you could try a similar setup?
Hi,
My other PC also runs Windows 10 and doesn’t have an NVIDIA GPU.
My current PC is mainly used for work, and given the stability requirements of the other software on it, upgrading to Windows 11 is not an option.
I’m sorry about this.
qcoelho October 29, 2024, 6:25am 18
Could you please run wsl --update and then wsl --version?
Thank you very much!
I can now use cuda-gdb normally after running wsl --update.
The updated WSL version is as follows:
WSL version: 2.3.24.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.65
MSRDC version: 1.2.5620
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.19045.4412
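For anyone hitting the same error, a quick check from inside WSL2 for an old kernel could look like the sketch below. The thread only shows that kernel 5.10.16.3 failed and 5.15.153.1-2 worked. The 5.15 threshold used here is an assumption for illustration, not a documented minimum, and kernel_older_than is a name made up for this example.

```shell
# Sketch only: flag a WSL2 kernel older than a given major.minor version.
# kernel_older_than is a hypothetical helper.
kernel_older_than() {
  # $1: kernel release, e.g. 5.10.16.3-microsoft-standard-WSL2
  # $2: major version, $3: minor version to compare against
  rel="${1%%-*}"        # drop the -microsoft-standard-WSL2 suffix
  major="${rel%%.*}"    # text before the first dot
  rest="${rel#*.}"      # text after the first dot
  minor="${rest%%.*}"   # second version component
  [ "$major" -lt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -lt "$3" ]; }
}

# Assumption: 5.15 is used only because it is the version that worked above.
if kernel_older_than "$(uname -r)" 5 15; then
  echo "Kernel $(uname -r) is older than 5.15; try wsl --update from Windows."
else
  echo "Kernel $(uname -r) is 5.15 or newer."
fi
```

Note that wsl --update is run on the Windows side (PowerShell or cmd), and WSL has to be restarted before the new kernel shows up in uname -r.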
system Closed November 12, 2024, 1:40pm 20
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.