Error with “Nvidia Container Runtime with Docker Integration” on AGX Orin with JP6.2
Hi, all
Update:
Docker release 28.0.1 has fixed this issue, and SDK Manager now works normally. To pick up the fixed release:
$ sudo apt-get install docker-ce docker-ce-cli
Thanks for your patience.
According to this issue:
$ sudo docker run hello-world
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
$ journalctl -xu docker.service
...
Feb 24 02:48:46 tegra-ubuntu dockerd[76713]: failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: invalid argument
Feb 24 02:48:46 tegra-ubuntu systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
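If you want to check for this symptom from a script, something like the following may help. It is only a sketch: the function name has_bridge_driver_error is made up for illustration, and it simply looks for the error string shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker or JetPack).
# Reads dockerd journal text on stdin and succeeds if the
# bridge-driver failure quoted in this post is present.
has_bridge_driver_error() {
    grep -q 'failed to register "bridge" driver'
}

# Example use on the Jetson:
#   journalctl -u docker.service --no-pager | has_bridge_driver_error && echo "affected"
```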
The root cause is that the latest Docker release (v28.0.0) requires additional kernel config options that are not enabled by default on r36.4.3 (JetPack 6.2).
We are working with our internal team to fix this issue.
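To see whether a given kernel config already has the three options that WAR-2 and WAR-3 in this post enable (CONFIG_IP_SET, CONFIG_IP_SET_HASH_NET, CONFIG_NETFILTER_XT_SET), a small check like the one below may be useful. This is a rough sketch, not an official tool: the function name is hypothetical, and it assumes the running kernel exposes its config at /proc/config.gz, which may not be the case on every image.

```shell
#!/bin/sh
# Hypothetical helper: print the options from this post that are NOT
# enabled (as =y or =m) in the given kernel config file.
# Accepts a plain-text config or a gzip-compressed one (*.gz).
missing_docker_opts() {
    cfg="$1"
    case "$cfg" in
        *.gz) reader="zcat" ;;
        *)    reader="cat" ;;
    esac
    for opt in CONFIG_IP_SET CONFIG_IP_SET_HASH_NET CONFIG_NETFILTER_XT_SET; do
        # Built-in (y) and module (m) both count as enabled.
        if ! "$reader" "$cfg" | grep -Eq "^${opt}=(y|m)\$"; then
            echo "$opt"
        fi
    done
}

# Example use on the Jetson (assumes /proc/config.gz exists):
#   missing_docker_opts /proc/config.gz
```

No output means all three options are present in that config.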
In the meantime, here are some possible workarounds for your reference:
(thanks to @Kangalow @hex4def6 @whitesscott for the contributions)
WAR-1: Downgrade to Docker 27.5.1
- You will see an error message when setting up the device with JetPack 6.2.
- Please ignore the failure, log in to the device, and run the following commands:
$ sudo apt-get install -y docker-ce=5:27.5.1-1~ubuntu.22.04~jammy --allow-downgrades
$ sudo apt-get install -y docker-ce-cli=5:27.5.1-1~ubuntu.22.04~jammy --allow-downgrades
$ sudo apt-mark hold docker-ce=5:27.5.1-1~ubuntu.22.04~jammy
$ sudo apt-mark hold docker-ce-cli=5:27.5.1-1~ubuntu.22.04~jammy
$ ./NV_L4T_DOCKER_TARGET_POST_INSTALL_COMP.sh
- Verify
$ sudo docker --version
Docker version 27.5.1, build 9f9e405
$ sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2025-02-24 05:03:21 UTC; 16s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 34864 (dockerd)
Tasks: 17
Memory: 29.1M
CPU: 261ms
CGroup: /system.slice/docker.service
└─34864 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Feb 24 05:03:20 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:20.970132435Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so usi>
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.052802756Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.053766567Z" level=info msg="Loading containers: start."
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.124370043Z" level=warning msg="Could not load necessary modules for IPSEC rules: protocol not>
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.175636062Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.>
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.224770440Z" level=info msg="Loading containers: done."
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.240763535Z" level=info msg="Docker daemon" commit=4c9b3b0 containerd-snapshotter=false storag>
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.240879431Z" level=info msg="Daemon has completed initialization"
Feb 24 05:03:21 tegra-ubuntu dockerd[34864]: time="2025-02-24T05:03:21.263237945Z" level=info msg="API listen on /run/docker.sock"
Feb 24 05:03:21 tegra-ubuntu systemd[1]: Started Docker Application Container Engine.
$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel
root@tegra-ubuntu:/# /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
...
[02/24/2025-05:29:02] [I] Average on 10 runs - GPU latency: 0.0347168 ms - Host latency: 0.0474365 ms (enqueue 0.0302002 ms)
[02/24/2025-05:29:02] [I] Average on 10 runs - GPU latency: 0.0346436 ms - Host latency: 0.047168 ms (enqueue 0.0301514 ms)
[02/24/2025-05:29:02] [I]
[02/24/2025-05:29:02] [I] === Performance summary ===
[02/24/2025-05:29:02] [I] Throughput: 15546 qps
[02/24/2025-05:29:02] [I] Latency: min = 0.0450439 ms, max = 0.140625 ms, mean = 0.0479416 ms, median = 0.0473633 ms, percentile(90%) = 0.0489502 ms, percentile(95%) = 0.050293 ms, percentile(99%) = 0.0616455 ms
[02/24/2025-05:29:02] [I] Enqueue Time: min = 0.0285339 ms, max = 0.0996094 ms, mean = 0.030676 ms, median = 0.0302734 ms, percentile(90%) = 0.0317383 ms, percentile(95%) = 0.032959 ms, percentile(99%) = 0.0415039 ms
[02/24/2025-05:29:02] [I] H2D Latency: min = 0.00524902 ms, max = 0.0258789 ms, mean = 0.00624994 ms, median = 0.00619507 ms, percentile(90%) = 0.00646973 ms, percentile(95%) = 0.0065918 ms, percentile(99%) = 0.00708008 ms
[02/24/2025-05:29:02] [I] GPU Compute Time: min = 0.0324707 ms, max = 0.10498 ms, mean = 0.035119 ms, median = 0.034668 ms, percentile(90%) = 0.0359497 ms, percentile(95%) = 0.0371094 ms, percentile(99%) = 0.0460205 ms
[02/24/2025-05:29:02] [I] D2H Latency: min = 0.0055542 ms, max = 0.0219727 ms, mean = 0.00657305 ms, median = 0.00653076 ms, percentile(90%) = 0.00683594 ms, percentile(95%) = 0.00695801 ms, percentile(99%) = 0.0078125 ms
[02/24/2025-05:29:02] [I] Total Host Walltime: 3.00013 s
[02/24/2025-05:29:02] [I] Total GPU Compute Time: 1.63795 s
[02/24/2025-05:29:02] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[02/24/2025-05:29:02] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[02/24/2025-05:29:02] [W] * GPU compute time is unstable, with coefficient of variance = 5.98857%.
[02/24/2025-05:29:02] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/24/2025-05:29:02] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/24/2025-05:29:02] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
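The version check in the Verify step above can also be scripted. The sketch below only encodes what this post states (28.0.0 is the affected release, while 27.5.1 and 28.0.1 work); the function name is hypothetical and it expects a line in the format printed by docker --version.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given `docker --version`
# line reports release 28.0.0, the one this post identifies as affected.
is_affected_docker() {
    ver=$(printf '%s\n' "$1" | sed -n 's/^Docker version \([0-9][0-9.]*\),.*/\1/p')
    [ "$ver" = "28.0.0" ]
}

# Example use on the Jetson:
#   if is_affected_docker "$(sudo docker --version)"; then
#       echo "Docker 28.0.0 detected; upgrade or apply one of the workarounds"
#   fi
```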
WAR-2: Build and flash custom kernel with the required config
Please refer to Kernel Customization — NVIDIA Jetson Linux Developer Guide
- Download the r36.4.3 source package and run the commands below:
$ tar xf public_sources.tbz2
$ cd /Linux_for_Tegra/source
$ tar xf kernel_src.tbz2
$ tar xf kernel_oot_modules_src.tbz2
$ tar xf nvidia_kernel_display_driver_source.tbz2
- Enable config
Add the config below to Linux_for_Tegra/source/kernel/kernel-jammy-src/arch/arm64/configs/defconfig:
CONFIG_IP_SET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_NETFILTER_XT_SET=m
- Build the custom kernel
$ export CROSS_COMPILE=$HOME/Desktop/Toolchain_gcc_11.3/aarch64--glibc--stable-2022.08-1/bin/aarch64-buildroot-linux-gnu-
$ make -C kernel
$ export INSTALL_MOD_PATH=<install-path>/Linux_for_Tegra/rootfs/
$ sudo -E make install -C kernel
$ cp kernel/kernel-jammy-src/arch/arm64/boot/Image <install-path>/Linux_for_Tegra/kernel/Image
- Flash r36.4.3 image
- Run SDK Manager to install the SDK components. It should now finish successfully.
- Verify
$ sudo docker --version
Docker version 28.0.0, build f9ced58
$ sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2025-02-24 06:12:59 UTC; 43min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 26327 (dockerd)
Tasks: 13
Memory: 7.8G
CPU: 4min 31.263s
CGroup: /system.slice/docker.service
└─26327 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Feb 24 06:12:59 tegra-ubuntu dockerd[26327]: time="2025-02-24T06:12:59.171168564Z" level=info msg="Docker daemon" commit=af898ab containerd-snapshotter=false storag>
Feb 24 06:12:59 tegra-ubuntu dockerd[26327]: time="2025-02-24T06:12:59.171724546Z" level=info msg="Initializing buildkit"
Feb 24 06:12:59 tegra-ubuntu dockerd[26327]: time="2025-02-24T06:12:59.222523480Z" level=info msg="Completed buildkit initialization"
Feb 24 06:12:59 tegra-ubuntu dockerd[26327]: time="2025-02-24T06:12:59.237147987Z" level=info msg="Daemon has completed initialization"
Feb 24 06:12:59 tegra-ubuntu dockerd[26327]: time="2025-02-24T06:12:59.237343768Z" level=info msg="API listen on /run/docker.sock"
Feb 24 06:12:59 tegra-ubuntu systemd[1]: Started Docker Application Container Engine.
$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel
root@tegra-ubuntu:/# /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
...
[02/24/2025-06:56:59] [I] Average on 10 runs - GPU latency: 0.0666748 ms - Host latency: 0.0834717 ms (enqueue 0.0402832 ms)
[02/24/2025-06:56:59] [I] Average on 10 runs - GPU latency: 0.0663574 ms - Host latency: 0.0811768 ms (enqueue 0.0387207 ms)
[02/24/2025-06:56:59] [I]
[02/24/2025-06:56:59] [I] === Performance summary ===
[02/24/2025-06:56:59] [I] Throughput: 12287.4 qps
[02/24/2025-06:56:59] [I] Latency: min = 0.0768738 ms, max = 0.185547 ms, mean = 0.0828475 ms, median = 0.081543 ms, percentile(90%) = 0.0887756 ms, percentile(95%) = 0.0905762 ms, percentile(99%) = 0.101715 ms
[02/24/2025-06:56:59] [I] Enqueue Time: min = 0.0361633 ms, max = 0.0998535 ms, mean = 0.0392462 ms, median = 0.0385132 ms, percentile(90%) = 0.0402832 ms, percentile(95%) = 0.043457 ms, percentile(99%) = 0.0566406 ms
[02/24/2025-06:56:59] [I] H2D Latency: min = 0.0065918 ms, max = 0.0449219 ms, mean = 0.00811712 ms, median = 0.00799561 ms, percentile(90%) = 0.00860596 ms, percentile(95%) = 0.0090332 ms, percentile(99%) = 0.0108337 ms
[02/24/2025-06:56:59] [I] GPU Compute Time: min = 0.059967 ms, max = 0.105713 ms, mean = 0.0679207 ms, median = 0.0667725 ms, percentile(90%) = 0.0742188 ms, percentile(95%) = 0.0756226 ms, percentile(99%) = 0.0835876 ms
[02/24/2025-06:56:59] [I] D2H Latency: min = 0.00463867 ms, max = 0.112549 ms, mean = 0.00681027 ms, median = 0.0067749 ms, percentile(90%) = 0.00805664 ms, percentile(95%) = 0.00854492 ms, percentile(99%) = 0.00927734 ms
[02/24/2025-06:56:59] [I] Total Host Walltime: 3.00015 s
[02/24/2025-06:56:59] [I] Total GPU Compute Time: 2.50383 s
[02/24/2025-06:56:59] [W] * GPU compute time is unstable, with coefficient of variance = 5.43696%.
[02/24/2025-06:56:59] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/24/2025-06:56:59] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/24/2025-06:56:59] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
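The "Enable config" step of WAR-2 can be done by hand in an editor, or with a small helper along these lines. This is just a sketch under the assumption that appending the lines to the defconfig is enough; the function name is hypothetical, and it skips any option that already has a line in the file so that running it twice does not add duplicates.

```shell
#!/bin/sh
# Hypothetical helper: append the three options from this post to a
# defconfig file, unless a "CONFIG_xxx=" line for that option already exists.
add_ipset_opts() {
    defconfig="$1"
    for line in CONFIG_IP_SET=m CONFIG_IP_SET_HASH_NET=m CONFIG_NETFILTER_XT_SET=m; do
        opt=${line%%=*}   # option name without the "=m" value
        if ! grep -q "^${opt}=" "$defconfig"; then
            printf '%s\n' "$line" >> "$defconfig"
        fi
    done
}

# Example use on the host, from the directory containing Linux_for_Tegra:
#   add_ipset_opts Linux_for_Tegra/source/kernel/kernel-jammy-src/arch/arm64/configs/defconfig
```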
WAR-3: Update kernel image with the required config
- Download the r36.4.3 source package and run the commands below:
$ tar xf public_sources.tbz2
$ cd /Linux_for_Tegra/source
$ tar xf kernel_src.tbz2
$ tar xf kernel_oot_modules_src.tbz2
$ tar xf nvidia_kernel_display_driver_source.tbz2
- Enable config
Add the config below to Linux_for_Tegra/source/kernel/kernel-jammy-src/arch/arm64/configs/defconfig:
CONFIG_IP_SET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_NETFILTER_XT_SET=m
- Build the custom kernel
$ export CROSS_COMPILE=$HOME/Desktop/Toolchain_gcc_11.3/aarch64--glibc--stable-2022.08-1/bin/aarch64-buildroot-linux-gnu-
$ make -C kernel
- Update kernel image
# On Target
$ mkdir -p /usr/lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset
# On Host
$ scp /Linux_for_Tegra/source/out/nvidia-linux-header/arch/arm64/boot/Image [Jetson]:/boot/Image
$ scp /Linux_for_Tegra/source/kernel/kernel-jammy-src/net/netfilter/xt_set.ko [Jetson]:/usr/lib/modules/5.15.148-tegra/kernel/net/netfilter/.
$ scp /Linux_for_Tegra/source/kernel/kernel-jammy-src/net/netfilter/ipset/ip_set_hash_net.ko [Jetson]:/usr/lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/.
$ scp /Linux_for_Tegra/source/kernel/kernel-jammy-src/net/netfilter/ipset/ip_set.ko [Jetson]:/usr/lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/.
# On Target
$ depmod -a 5.15.148-tegra
$ update-initramfs -c -k 5.15.148-tegra
$ reboot
- Verify
$ sudo docker --version
Docker version 28.0.0, build f9ced58
$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel
root@tegra-ubuntu:/# /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
...
[02/26/2025-06:45:44] [I] Average on 10 runs - GPU latency: 0.0656006 ms - Host latency: 0.0818359 ms (enqueue 0.0415771 ms)
[02/26/2025-06:45:44] [I] Average on 10 runs - GPU latency: 0.0655762 ms - Host latency: 0.0809326 ms (enqueue 0.0402344 ms)
[02/26/2025-06:45:44] [I]
[02/26/2025-06:45:44] [I] === Performance summary ===
[02/26/2025-06:45:44] [I] Throughput: 12231.4 qps
[02/26/2025-06:45:44] [I] Latency: min = 0.0751953 ms, max = 0.129395 ms, mean = 0.0826511 ms, median = 0.0812988 ms, percentile(90%) = 0.0893555 ms, percentile(95%) = 0.0912476 ms, percentile(99%) = 0.101227 ms
[02/26/2025-06:45:44] [I] Enqueue Time: min = 0.0366211 ms, max = 0.0927734 ms, mean = 0.0399322 ms, median = 0.0391846 ms, percentile(90%) = 0.041748 ms, percentile(95%) = 0.0439453 ms, percentile(99%) = 0.0535889 ms
[02/26/2025-06:45:44] [I] H2D Latency: min = 0.00601196 ms, max = 0.0402832 ms, mean = 0.00832953 ms, median = 0.00805664 ms, percentile(90%) = 0.00939941 ms, percentile(95%) = 0.0100098 ms, percentile(99%) = 0.0115509 ms
[02/26/2025-06:45:44] [I] GPU Compute Time: min = 0.0568237 ms, max = 0.0999146 ms, mean = 0.0672211 ms, median = 0.06604 ms, percentile(90%) = 0.0744629 ms, percentile(95%) = 0.0760498 ms, percentile(99%) = 0.0833435 ms
[02/26/2025-06:45:44] [I] D2H Latency: min = 0.00463867 ms, max = 0.0119629 ms, mean = 0.00710001 ms, median = 0.00708008 ms, percentile(90%) = 0.00830078 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00952148 ms
[02/26/2025-06:45:44] [I] Total Host Walltime: 3.00015 s
[02/26/2025-06:45:44] [I] Total GPU Compute Time: 2.46675 s
[02/26/2025-06:45:44] [W] * GPU compute time is unstable, with coefficient of variance = 5.93523%.
[02/26/2025-06:45:44] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/26/2025-06:45:44] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/26/2025-06:45:44] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
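For WAR-3, it can be worth confirming that the three module files copied in the "Update kernel image" step actually landed under the modules directory before running depmod. The helper below is a sketch only: the function name is hypothetical, and the relative paths are the destinations used in the scp commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the module files from WAR-3 that are missing
# under the given modules directory (paths relative to that directory).
missing_module_files() {
    moddir="$1"
    for rel in kernel/net/netfilter/xt_set.ko \
               kernel/net/netfilter/ipset/ip_set.ko \
               kernel/net/netfilter/ipset/ip_set_hash_net.ko; do
        [ -f "$moddir/$rel" ] || echo "$rel"
    done
}

# Example use on the Jetson:
#   missing_module_files /usr/lib/modules/5.15.148-tegra
```

No output means all three files are in place.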
Thanks.