Fatal error on MSS CPU while using DepthAI ROS Noetic driver

Hello,

I am using the luxonis/depthai-ros:v2.7.4-noetic Docker image to run an OAK-D Pro PoE camera. The camera driver fails with error logs like the following:

[19443010A133731300] [x.y.z.w] [28643.589] [system] [warning] PRINT:LeonMss: Unexpected trap ( 9) at address 0x80220c2c
2023-07-20T11:13:55.692920479Z Bad trap, 0x9
PSR: 0xf3400ec6 PC: 0x80220c30 NPC: 0x80220c34 TPC: 0x80220c2c
G1:  0x00000000 G2: 0x0000ff04 G3:  0x0b010048 G4:  0x808fe5b4
2023-07-20T11:13:55.692934345Z G5:  0x00000078 G6: 0x00001a0d G7:  0x0000001e I0:  0x00000000
I1:  0x00000004 I2: 0x00000003 I3:  0x001a3803 I4:  0x8068d848
I5:  0x00000000 I6: 0x8066d378 I7:  0x80220d7c Y:   0x00000000

2023-07-20T11:13:55.692971045Z L0:  0x808d2848 L1: 0x8007c004 L2:  0x80221728 L3:  0x00000002
L4:  0x00000004 L5: 0x00000004 L6:  0x808d27f8 L7:  0x808d2818
I0:  0x7820c000 I1: 0x00000960 I2:  0x8066d46c I3:  0x001a3804
I4:  0x8066f288 I5: 0x00000000 I6:  0x8066d408 I7:  0x802692b8
SRA: 0x80555000 SA0:0x00000438 SA1: 0x00000000 SA2: 0x00000000
SA3: 0xcb9afa68 SA4:0x78209978 SA5: 0x805fbf30
Unexpected trap ( 9) at address 0x80220c2c
Bad trap, 0x9
PSR: 0xf3400ec6 PC: 0x80220c30 NPC: 0x80220c34 TPC: 0x80220c2c
G1:  0x00000000 G2: 0x0000ff04 G3:  0x0b010048 G4:  0x808fe5b4
G5:  0x00000078 G6: 0x00001a0d G7:  0x0000001e I0:  0x00000000
2023-07-20T11:13:55.693034595Z I1:  0x00000004 I2: 0x00000003 I3:  0x001a3803 I4:  0x8068d848
I5:  0x00000000 I6: 0x8066d378 I7:  0x80220d7c Y:   0x00000000

2023-07-20T11:13:55.693048742Z L0:  0x808d2848 L1: 0x8007c004 L2:  0x80221728 L3:  0x00000002
L4:  0x00000004 L5: 0x00000004 L6:  0x808d27f8 L7:  0x808d2818
2023-07-20T11:13:55.693057309Z I0:  0x7820c000 I1: 0x00000960 I2:  0x8066d46c I3:  0x001a3804
I4:  0x8066f288 I5: 0x00000000 I6:  0x8066d408 I7:  0x802692b8
SRA: 0x80555000 SA0:0x00000438 SA1: 0x00000000 SA2: 0x00000000
SA3: 0xcb9afa68 SA4:0x78209978 SA5: 0x805fbf30
[19443010A133731300] [x.y.z.w] [28647.404] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 09, address: 80220C2C' '0'
2023-07-20T11:13:59.502021720Z [19443010A133731300] [x.y.z.w] [28647.406] [system] [warning] PRINT:LeonCss: Fatal error on MSS CPU: trap: 09, address: 80220C2C

I've also opened a ticket on the depthai-ros GitHub repo with more details on this issue: luxonis/depthai-ros#352

One thing I noticed is that the MSS CPU error appears on a very busy network with 4 other PoE cameras (not OAK-Ds) and a couple of other networked devices. I was unable to replicate the issue on a different, quieter network with one OAK-D Pro PoE and one other PoE camera. Our setup requires multiple PoE cameras, sometimes from different vendors, and we have never encountered anything like this on other networked devices.

I've also found other forum users asking about the same issue, so it seems to be a common one. Considering the information here together with the GitHub ticket linked above, do you have any suggestions for solving this issue?

Thanks,

18 days later

I had a chance to test the default configuration from the depthai-ros GitHub repo on an OAK-D Pro USB camera (the one above was an OAK-D Pro PoE), and the default configuration causes the same error on both USB and PoE cameras.

[184430103153B00E00] [1.2.2] [9178.605] [system] [warning] PRINT:LeonMss: Unexpected trap ( 9) at address 0x8022fdc4
Bad trap, 0x9
PSR: 0xf3400ec7 PC: 0x8022fdc8 NPC: 0x8022fdcc TPC: 0x8022fdc4
G1:  0x00000000 G2: 0x0000ff04 G3:  0x0b010048 G4:  0x80912f14
G5:  0x00000078 G6: 0x00000859 G7:  0x0000001e I0:  0x00000000
I1:  0x00000004 I2: 0x00000001 I3:  0x00086711 I4:  0x806a1590
I5:  0x00000000 I6: 0x806810b8 I7:  0x8022ff14 Y:   0x00000000

L0:  0x88ff9a68 L1: 0x8007ce04 L2:  0x802308c0 L3:  0x00000004
L4:  0x00000008 L5: 0x00000008 L6:  0x806d9ef8 L7:  0x00000001
I0:  0x7820c070 I1: 0x00000960 I2:  0x806811ac I3:  0x00086712
I4:  0x80682fd0 I5: 0x00000070 I6:  0x80681148 I7:  0x80278488
SRA: 0x00000001 SA0:0xf3900fe1 SA1: 0x88ffb430 SA2: 0xf3400fe4
SA3: 0x0000003e SA4:0xf3900ae6 SA5: 0xf3900fc1
Unexpected trap ( 9) at address 0x8022fdc4
Bad trap, 0x9
PSR: 0xf3400ec7 PC: 0x8022fdc8 NPC: 0x8022fdcc TPC: 0x8022fdc4
G1:  0x00000000 G2: 0x0000ff04 G3:  0x0b010048 G4:  0x80912f14
G5:  0x00000078 G6: 0x00000859 G7:  0x0000001e I0:  0x00000000
I1:  0x00000004 I2: 0x00000001 I3:  0x00086711 I4:  0x806a1590
I5:  0x00000000 I6: 0x806810b8 I7:  0x8022ff14 Y:   0x00000000

L0:  0x88ff9a68 L1: 0x8007ce04 L2:  0x802308c0 L3:  0x00000004
L4:  0x00000008 L5: 0x00000008 L6:  0x806d9ef8 L7:  0x00000001
I0:  0x7820c070 I1: 0x00000960 I2:  0x806811ac I3:  0x00086712
I4:  0x80682fd0 I5: 0x00000070 I6:  0x80681148 I7:  0x80278488
SRA: 0x00000001 SA0:0xf3900fe1 SA1: 0x88ffb430 SA2: 0xf3400fe4
SA3: 0x0000003e SA4:0xf3900ae6 SA5: 0xf3900fc1
[184430103153B00E00] [1.2.2] [9179.621] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 09, address: 8022FDC4' '0'
[184430103153B00E00] [1.2.2] [1691461627.114] [host] [debug] Timesync thread exception caught: Couldn't read data from stream: '__timesync' (X_LINK_ERROR)
[2023-08-08 02:27:07.115] [depthai] [debug] DataOutputQueue (nn_nn) closed
[2023-08-08 02:27:07.115] [depthai] [debug] DataOutputQueue (stereo_stereo) closed
[2023-08-08 02:27:07.115] [depthai] [debug] DataOutputQueue (rgb_isp) closed
[2023-08-08 02:27:07.115] [depthai] [debug] DataOutputQueue (imu_imu) closed
[184430103153B00E00] [1.2.2] [1691461627.114] [host] [debug] Log thread exception caught: Couldn't read data from stream: '__log' (X_LINK_ERROR)
[184430103153B00E00] [1.2.2] [1691461627.864] [host] [debug] Watchdog thread exception caught: Couldn't write data to stream: '__watchdog' (X_LINK_ERROR)
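
To compare runs like the two above, the fatal line can be pulled out of the driver output with a short script. This is only a minimal sketch: it assumes the message always has the form "Fatal error on MSS CPU: trap: XX, address: YYYYYYYY" as seen in these logs, and the function name find_mss_traps is my own, not part of depthai.

```python
import re

# Pattern based only on the log lines shown in this thread (assumed format).
FATAL_RE = re.compile(
    r"Fatal error on MSS CPU: trap: ([0-9A-Fa-f]+), address: ([0-9A-Fa-f]+)"
)


def find_mss_traps(log_text):
    """Return the sorted, de-duplicated (trap, address) pairs found in log_text.

    Both values are parsed as hexadecimal integers.
    """
    found = {(int(trap, 16), int(addr, 16)) for trap, addr in FATAL_RE.findall(log_text)}
    return sorted(found)


# Two lines copied from the PoE and USB logs above.
sample = """
[19443010A133731300] [x.y.z.w] [28647.404] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 09, address: 80220C2C' '0'
[184430103153B00E00] [1.2.2] [9179.621] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 09, address: 8022FDC4' '0'
"""

for trap, addr in find_mss_traps(sample):
    print(f"trap 0x{trap:02x} at 0x{addr:08x}")
```

For the two logs in this thread, the trap number is the same (09) while the address differs between the PoE and USB runs.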

The solution is hidden in the documentation:

"One way to reduce CSS CPU consumption would be to reduce the 3A rate by currently reducing camera FPS."

Reducing the FPS from 60 to 10 brought the CPU usage down to 50%, and I was able to run the cameras continuously for 7 days. I believe this is the solution to the problem described above.
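
For anyone using the DepthAI Python API directly rather than the ROS driver, lowering the FPS could look roughly like the sketch below. It assumes the depthai 2.x package (ColorCamera and MonoCamera nodes with setFps); the helper names apply_fps and build_pipeline are mine, the pipeline has no outputs linked, and I have not run this exact code on the cameras above.

```python
def apply_fps(camera_nodes, fps):
    """Set the same frame rate on every camera node passed in.

    Any object with a setFps(fps) method works (hypothetical helper).
    """
    for node in camera_nodes:
        node.setFps(fps)


def build_pipeline(fps=10.0):
    """Create a pipeline whose RGB and mono cameras all run at `fps`.

    depthai is imported here so the helper above stays usable without it.
    Assumes depthai 2.x; outputs (e.g. XLinkOut) still need to be added.
    """
    import depthai as dai

    pipeline = dai.Pipeline()
    rgb = pipeline.create(dai.node.ColorCamera)
    left = pipeline.create(dai.node.MonoCamera)
    right = pipeline.create(dai.node.MonoCamera)
    left.setBoardSocket(dai.CameraBoardSocket.CAM_B)   # left mono sensor
    right.setBoardSocket(dai.CameraBoardSocket.CAM_C)  # right mono sensor

    # 10 FPS is the value that worked for me; 60 was the rate that crashed.
    apply_fps([rgb, left, right], fps)
    return pipeline
```

With the ROS driver, the same change goes through the driver's FPS parameters instead of code.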

I've also mentioned a second problem, a ROS driver disconnection, in the GitHub ticket, but that might be a separate issue.