cuda-gdb reports an error under WSL2

August 12, 2024, 2:49pm 1

My environment is as follows:
driver version on Windows: 555.99
cuda version on Windows10: 12.5
nvcc version on Windows10: V12.5.40
WSL2 Ubuntu version: 20.04.6 LTS
nvcc version on WSL2: V12.5.40
cuda-gdb version on WSL2: 13.2

The registry value HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPUDebugger\EnableInterface is set to 1.

No matter which CUDA program I run under cuda-gdb, it reports the following error as soon as execution reaches the kernel. (Without cuda-gdb, the program runs normally.)
(cuda-gdb) n
[New Thread 0x7ffff6cd3000 (LWP 312)]
[New Thread 0x7ffff59c3000 (LWP 313)]
[Detaching after fork from child process 314]
[New Thread 0x7ffff4fb5000 (LWP 322)]
[Thread 0x7ffff4fb5000 (LWP 322) exited]
[New Thread 0x7ffff4fb5000 (LWP 323)]
[New Thread 0x7fffe1e12000 (LWP 324)]
Error: get_elf_image(0): Failed to read the ELF image handle 93825002597824 relocated 1, error=CUDBG_ERROR_INVALID_ARGS, error message=

qcoelho August 12, 2024, 9:16pm 2

Hello, can you run a CUDA app without the debugger?
Have you installed any CUDA version inside WSL?
What does nvidia-smi output in WSL?

Thank you for your help.
I ran the following commands:

nvcc -g -G main.cu
or nvcc -g -G --generate-code arch=compute_89,code=sm_89 main.cu
cuda-gdb ./a.out

My CUDA version on WSL2 is 12.5.

nvidia-smi.exe in Windows PowerShell

image

nvidia-smi in Ubuntu20.04.6 LTS of WSL2

image

nvcc -V and cuda-gdb version:

image

CUDA registry settings

The CUDA program produces correct output when run directly, but under cuda-gdb the above error appears as soon as execution reaches the kernel.

Thanks

veraj August 20, 2024, 6:26am 4

Hi, @3356538486

This looks like an issue specific to your environment setup.
Can you uninstall and reinstall CUDA and the driver on the Windows side?

I have uninstalled and reinstalled the driver and CUDA on Windows, and updated both to the latest version.
I have also uninstalled and reinstalled CUDA on WSL2, and updated it to the latest version.
My environment is now as follows:

Windows 10 (professional edition) 64bit 22H2 19045.4412
CPU: Intel i5-12600KF
GPU: NVIDIA RTX 4060
Driver on Windows: 560.81
CUDA on Windows: 12.6, v12.6.20
WSL2: Ubuntu 20.04.6 LTS (GNU/Linux 5.10.16.3-microsoft-standard-WSL2 x86_64)
Driver on WSL2: 560.81 (shared with Windows)
CUDA on WSL2: 12.6, v12.6.20
cuda-gdb on WSL2: 13.2
gcc: 9.4.0
gdb: 9.2
python3: 3.8.10
HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPUDebugger\EnableInterface is set to 1.

However, the latest version still has the same issue. Compiling and running directly works; the error occurs only under cuda-gdb.
The following figure shows the output of 'cuda-samples-12.4/1_Utilities/deviceQuery' when compiled and run normally:

image

I tried the simplest Hello World program, and it worked fine when compiled and run directly. However, compute-sanitizer reported errors, and running it under cuda-gdb also reported errors.

image

image

The registry settings also look correct:

image

I’m not sure if it’s due to the kernel version below. (my WSL2 is Ubuntu 20.04.6 LTS (GNU/Linux 5.10.16.3-microsoft-standard-WSL2 x86_64))

image

Looking forward to your reply. Thank you very much!

qcoelho August 20, 2024, 10:29pm 6

Hello, could you please provide us with a detailed log? Set an environment variable named NVLOG_CONFIG_FILE pointing to the configuration file nvlog.config, run the app under the debugger, and upload the resulting /tmp/debugger.log.
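
For readers following along, the request above could be carried out roughly as sketched below. This is only an illustration: the location of nvlog.config (here, the current directory), the binary name ./a.out, and the helper name run_with_nvlog are assumptions, and the contents of nvlog.config are not given in this thread.

```shell
# Sketch: run an app under cuda-gdb with NVLOG_CONFIG_FILE set.
# Assumptions: nvlog.config sits in the current directory unless the
# variable is already set; run_with_nvlog is a made-up helper name.

NVLOG_CONFIG_FILE="${NVLOG_CONFIG_FILE:-$PWD/nvlog.config}"
export NVLOG_CONFIG_FILE

run_with_nvlog() {
    # $1: path to the CUDA binary to debug (e.g. ./a.out)
    if [ ! -f "$NVLOG_CONFIG_FILE" ]; then
        echo "config not found: $NVLOG_CONFIG_FILE" >&2
        return 1
    fi
    if ! command -v cuda-gdb >/dev/null 2>&1; then
        echo "cuda-gdb not found in PATH" >&2
        return 1
    fi
    cuda-gdb "$1"
}

# Usage: run_with_nvlog ./a.out
# According to the request above, the log should then be at /tmp/debugger.log.
```

The helper refuses to start the debugger when the config file is missing, so a typo in the path is noticed before a session is spent producing no log.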

debugger.log (61.1 KB)

Additional information:
Running lspci | grep -i nvidia under WSL2 does not produce any output.

image

image

Running compute-sanitizer prints “Error: Device not support.” (However, the GPU still appears to execute the kernel, since “Hello world” is printed.)

image

Thanks
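
On the empty lspci output mentioned above: under WSL2 the GPU is exposed through a paravirtualized device (/dev/dxg) with the driver libraries mounted from Windows, not as a PCI device, so an empty lspci result is not by itself a sign of a broken setup. A minimal sketch of an alternative check follows; the two paths are the usual WSL2 locations and should be treated as assumptions for any particular distribution, and check_wsl_gpu is a made-up helper name.

```shell
# Sketch: look for the WSL2 GPU pieces instead of a PCI device.
# Assumption: the paths below are the usual WSL2 locations.

check_wsl_gpu() {
    for path in /dev/dxg /usr/lib/wsl/lib/libcuda.so.1; do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
        fi
    done
}

check_wsl_gpu
```

If both entries are reported as found, the GPU passthrough itself is probably in place, which matches the observation in this thread that CUDA programs ran normally outside the debugger.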

qcoelho August 22, 2024, 9:07pm 8

After the ELF error, can you run the debugger command info shared and post the output?
Also, can you check whether hardware-accelerated GPU scheduling is enabled on your system?
Can you post the environment in which you run the app, using the env command?

Thank you for your reply.

I ran the following commands:

nvcc -g -G --generate-code arch=compute_89,code=sm_89 hello.cu
cuda-gdb ./a.out

The output is shown in the following figure:

image

After I ran the info shared command:

image

Some additional information from ‘info devices’:

image

image

Does ‘hardware-accelerated GPU scheduling’ refer to the ‘硬件加速GPU计划’ option on Windows shown in the following figure?
This option has always been turned off, but CUDA applications on Windows and WSL2 run normally without the debugger.

image

I tried to turn on this option and restarted the computer, but the result was the same.

image

The output of running env is as follows:

image

Looking forward to your reply, thank you!

qcoelho August 29, 2024, 10:28pm 10

Thank you, could you please run dxdiag on the host and post a screenshot of the Display tab?

I ran dxdiag.exe from Windows PowerShell, and the output is as follows. (I have two screens: the main screen is 4K, and the other is 1080p.)
I have saved all the information to ‘DxDiag.txt’.
DxDiag.txt (106.3 KB)

image

image

Additional information:
I opened “NVIDIA Control Panel” → “System Information” → “Save” and obtained the following file:
NVIDIA System Information 08-30-2024 09-00-54.txt (3.1 KB)

image

image

Looking forward to your reply, thank you!

qcoelho September 10, 2024, 4:17am 12

Thank you for all the details, we’ve been able to reproduce the failure and will investigate.

Thanks for your help! Looking forward to your reply.

Hello, is there any new progress on this issue?
Looking forward to your reply, thanks!

qcoelho October 11, 2024, 3:07am 16

Hello,

Our team is working on it.
In the meantime, do you have access to a Windows 11 system where you could try a similar setup?

Hi,
My other PC also runs Windows 10 and doesn’t have an NVIDIA GPU.
My current PC is mainly used for work, and because I depend on the stability of a lot of other software, upgrading it to Windows 11 is not an option.
I’m sorry about this.

qcoelho October 29, 2024, 6:25am 18

Could you please run a wsl --update and then a wsl --version?

Thank you very much!
I can now use cuda-gdb normally after running wsl --update.
The updated WSL version is as follows:

WSL version: 2.3.24.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.65
MSRDC version: 1.2.5620
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.19045.4412

system Closed November 12, 2024, 1:40pm 20

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.