PCIE bus error (console information log)

The console shows the following PCIe bus error log.
What is the impact on the PCIe devices?

[ 102.779116] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 102.779118] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 102.779121] pcieport 0001:00:00.0: [12] Timeout
[ 102.779641] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 102.779643] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 102.779645] pcieport 0001:00:00.0: [12] Timeout
[ 102.780284] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 102.780285] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 102.780288] pcieport 0001:00:00.0: [12] Timeout
[ 102.786086] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 102.786088] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 102.786091] pcieport 0001:00:00.0: [12] Timeout
[ 102.934002] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 102.934005] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 102.934008] pcieport 0001:00:00.0: [12] Timeout
[ 145.776864] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 145.776868] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 145.776872] pcieport 0001:00:00.0: [12] Timeout
[ 145.786781] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 145.786784] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 145.786787] pcieport 0001:00:00.0: [12] Timeout
[ 145.874489] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 145.874492] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 145.874495] pcieport 0001:00:00.0: [12] Timeout
[ 209.767260] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 209.767263] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 209.767267] pcieport 0001:00:00.0: [12] Timeout
[ 209.816840] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 209.816843] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
[ 209.816846] pcieport 0001:00:00.0: [12] Timeout
[ 304.786046] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 304.786051] pcieport 0001:00:00.0: device [10de:229e] error status/mask=00001000/0000e000
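The log is easier to reason about once it is summarized. The following Python sketch extracts one event per `PCIe Bus Error` header line and groups events into bursts that are separated by more than `gap_s` seconds. The function names are hypothetical, the regular expression assumes the exact `pcieport` line format shown above, and the 1-second gap is an arbitrary choice.

```python
import re
from dataclasses import dataclass

# Matches the AER header line printed by the kernel's pcieport driver, e.g.
# "[ 102.779116] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)"
AER_HEADER = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+pcieport\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"PCIe Bus Error:\s+severity=(?P<severity>[^,]+),\s+type=(?P<type>[^,]+)"
)


@dataclass
class AerEvent:
    timestamp: float  # seconds since boot, as printed by dmesg
    bdf: str          # port that reported the error
    severity: str     # e.g. "Corrected"
    err_type: str     # e.g. "Data Link Layer"


def parse_aer_events(log_text):
    """Return one AerEvent per 'PCIe Bus Error' header line in the log.

    The follow-up 'device [...] error status/mask' and '[12] Timeout'
    lines do not match and are skipped."""
    events = []
    for line in log_text.splitlines():
        m = AER_HEADER.match(line.strip())
        if m:
            events.append(AerEvent(float(m["ts"]), m["bdf"],
                                   m["severity"].strip(), m["type"].strip()))
    return events


def group_bursts(events, gap_s=1.0):
    """Split events into bursts: a new burst starts whenever the time since
    the previous event is larger than gap_s seconds."""
    bursts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if bursts and ev.timestamp - bursts[-1][-1].timestamp <= gap_s:
            bursts[-1].append(ev)
        else:
            bursts.append([ev])
    return bursts
```

Applied to the full log in this post, this grouping gives bursts of 5, 3, 2 and 1 events near 102 s, 145 s, 209 s and 304 s after boot, all reported by 0001:00:00.0 and all Corrected Data Link Layer errors.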

In theory the errors were corrected (each one is reported with severity=Corrected), but it isn't really possible to know for sure based on what is presented.

For background, many PCIe devices implement the optional “Advanced Error Reporting” (AER) capability. This allows detecting and reporting various error types, and correctable errors are by definition ones the hardware recovers from without software intervention (for example, by retransmitting the affected packet). What the error means depends on which error bit is set. Although the cause is often signal quality, it is also not unusual for it to be related to a software issue, e.g., a mismatched driver or a wrong argument passed to the driver.
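The `error status/mask` field in the log can be decoded by hand. In the AER Correctable Error Status register, bit 12 is Replay Timer Timeout (printed by the kernel as `[12] Timeout`): the transmitting port did not receive an acknowledgement (ACK DLLP) for a transaction layer packet before its replay timer expired, so its Data Link Layer retransmitted the packet. The mask `0000e000` covers bits 13 to 15, which are masked and therefore not reported. Below is a minimal Python sketch of the decoding; the bit table is transcribed from the PCIe specification's Correctable Error Status register layout and is worth double-checking against the spec revision you use, and the function names are hypothetical.

```python
# Correctable Error Status/Mask register bits of the AER capability.
# Short names follow the strings the Linux AER driver prints, e.g. "[12] Timeout".
CORRECTABLE_BITS = {
    0: ("RxErr", "Receiver Error"),
    6: ("BadTLP", "Bad TLP"),
    7: ("BadDLLP", "Bad DLLP"),
    8: ("Rollover", "REPLAY_NUM Rollover"),
    12: ("Timeout", "Replay Timer Timeout"),
    13: ("NonFatalErr", "Advisory Non-Fatal Error"),
    14: ("CorrIntErr", "Corrected Internal Error"),
    15: ("HeaderOF", "Header Log Overflow"),
}


def decode_correctable(value):
    """Return (bit, short_name, spec_name) for every known bit set in value."""
    return [(bit, short, full)
            for bit, (short, full) in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_status_mask(field):
    """Decode the 'status/mask' field of a kernel AER line, e.g. '00001000/0000e000'."""
    status_hex, mask_hex = field.split("/")
    return decode_correctable(int(status_hex, 16)), decode_correctable(int(mask_hex, 16))
```

For the field in this log, the status decodes to bit 12 only and the mask to bits 13, 14 and 15.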

Note that the particular PCIe device itself defines much of this. The errors are being reported by the port at 0001:00:00.0. Normally one would use lspci to find out more about it. Some information on this:

With that you could see verbose information about the specific device, and then attach a copy of the output to the forum thread. More could probably be said then.
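Once the verbose `lspci` output is available, the AER registers can be pulled out of it. When run as root, `lspci -vvv` prints lines such as `CESta:` (correctable error status) and `CEMsk:` (correctable error mask) under `Advanced Error Reporting`, with `+` for a set flag and `-` for a clear one. The Python sketch below assumes that format; exact flag names vary between pciutils versions, and the function names are hypothetical. Note that the kernel's AER handling typically clears the status bits after logging them, so a snapshot taken long after an error may show every flag as `-`.

```python
import re

# Register lines under "Advanced Error Reporting" in `lspci -vvv`, e.g.
# "\t\tCESta:\tRxErr- BadTLP- BadDLLP- Rollover- Timeout+ AdvNonFatalErr-"
AER_REG_LINE = re.compile(r"^\s*(UESta|UEMsk|UESvrt|CESta|CEMsk):\s*(.*)$")
# One flag token: a name followed by '+' (set) or '-' (clear).
FLAG = re.compile(r"([A-Za-z0-9_]+)([+-])")


def parse_aer_registers(lspci_text):
    """Map each AER register line (e.g. 'CESta') to {flag_name: is_set}."""
    regs = {}
    for line in lspci_text.splitlines():
        m = AER_REG_LINE.match(line)
        if m:
            regs[m.group(1)] = {name: sign == "+"
                                for name, sign in FLAG.findall(m.group(2))}
    return regs


def set_flags(regs, reg):
    """Names of the flags that are set ('+') in one register, e.g. 'CESta'."""
    return [name for name, is_set in regs.get(reg, {}).items() if is_set]
```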

We would also need to know the exact Jetson model, including whether a custom or third-party carrier board is involved or whether this is purely a developer kit. I suggest adding this information:

Dear Linuxdev

1: Jetson Orin Nano
2: Our company developed its own carrier board based on the Jetson Orin Nano.

3: nv_boot_control.conf
TNSPEC 3767-300-0003-P.1-1-1-jetson-orin-nano-devkit-
COMPATIBLE_SPEC 3767--0003--1--jetson-orin-nano-devkit-
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

4: head -n 1 /etc/nv_tegra_release
# R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic, EABI: aarch64, DATE: Wed Jan 8 01:49:37 UTC 2025
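For a report, items 3 and 4 can be summarized programmatically. The Python sketch below assumes the whitespace-separated `KEY value` layout shown above for `nv_boot_control.conf` and the `# R36 (release), REVISION: 4.3` format of the first line of `/etc/nv_tegra_release`; the function names are hypothetical.

```python
import re

# First line of /etc/nv_tegra_release, e.g.
# "# R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic, ..."
RELEASE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")


def l4t_version(head_line):
    """Return the release as 'major.minor.patch' (e.g. '36.4.3'), or None."""
    m = RELEASE.search(head_line)
    return ".".join(m.groups()) if m else None


def parse_boot_control(text):
    """Parse 'KEY value' lines (as in nv_boot_control.conf) into a dict."""
    cfg = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            cfg[parts[0]] = parts[1].strip()
    return cfg
```

For the values posted here, `l4t_version` gives `36.4.3`, and `parse_boot_control` maps `TEGRA_CHIPID` to `0x23`.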

5: The PCIe settings are unchanged. The only addition is io_expansion, which controls peripheral power.

6: PCIe device list (lspci)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0001:01:00.0 Network controller: Realtek Semiconductor Co., Ltd. Device c852 (rev 01)
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0004:01:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. Device 2269 (rev 03)
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
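This listing shows which device sits on the other end of the link that reports the errors. In this output each PCI domain has a root port on bus 00 and a single device on bus 01, so the link partner of the NVIDIA root port 0001:00:00.0 is the Realtek network controller 0001:01:00.0. The Python sketch below makes that lookup explicit. It assumes exactly this simple one-port-per-domain layout (`lspci -t` shows the real tree), and the function names are hypothetical.

```python
import re

# One line of `lspci` short output, e.g.
# "0001:01:00.0 Network controller: Realtek Semiconductor Co., Ltd. Device c852 (rev 01)"
LSPCI_LINE = re.compile(
    r"^(?P<dom>[0-9a-f]{4}):(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\s+(?P<desc>.*)$"
)


def parse_lspci(text):
    """Return (bdf, domain, bus_number, description) tuples for each lspci line."""
    entries = []
    for line in text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m:
            bdf = f"{m['dom']}:{m['bus']}:{m['dev']}.{m['fn']}"
            entries.append((bdf, m["dom"], int(m["bus"], 16), m["desc"]))
    return entries


def downstream_of(port_bdf, entries):
    """Devices in the same PCI domain as port_bdf but on a higher bus number.

    Assumes one root port per domain with its link partner directly behind
    it, as in the listing above; this is not a general topology walk."""
    dom, bus_hex = port_bdf.split(":")[:2]
    bus = int(bus_hex, 16)
    return [(bdf, desc) for bdf, d, b, desc in entries if d == dom and b > bus]
```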

sudo lspci -s '0001:00:00.0' -vvv 2>&1 | tee log_lspci.txt
log_lspci_s.txt (5.1 KB)

Because the error occurs very randomly, the PCIe problem is not present right now. We have to wait for it to happen again before capturing the relevant logs.
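Since the problem has to be caught when it recurs, the capture can be automated. The Python sketch below assumes kernel messages are fed to it line by line (for example from `sudo dmesg -w`); `watch` and `capture_lspci` are hypothetical helpers, and `lspci -s <BDF> -vvv` is the same command used above.

```python
import re
import subprocess

# Same header line format the kernel printed in the log above.
AER_LINE = re.compile(
    r"pcieport\s+(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+PCIe Bus Error:"
)


def capture_lspci(bdf, path):
    """Save `lspci -s <bdf> -vvv` output (stdout and stderr) to path.

    Needs root to show extended capabilities such as AER."""
    result = subprocess.run(["lspci", "-s", bdf, "-vvv"],
                            capture_output=True, text=True)
    with open(path, "w") as f:
        f.write(result.stdout)
        f.write(result.stderr)


def watch(lines, bdf, on_error):
    """Call on_error(line) for every AER header line reported by port bdf.

    Returns the number of matching lines seen."""
    count = 0
    for line in lines:
        m = AER_LINE.search(line)
        if m and m["bdf"] == bdf:
            count += 1
            on_error(line)
    return count
```

One way to wire it up is a small script that calls `watch(sys.stdin, "0001:00:00.0", callback)`, where the callback calls `capture_lspci` with a timestamped file name. Because the errors arrive in bursts, the callback should probably skip captures that occur within a few seconds of the previous one.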