  • OAK-FFC-3P-OG output stopped after 15 hours

I have the following pipeline:

   pipeline = std::make_unique<dai::Pipeline>();
   color_camera = pipeline->create<dai::node::ColorCamera>();
   encoder = pipeline->create<dai::node::VideoEncoder>();
   video_out = pipeline->create<dai::node::XLinkOut>();
   control = pipeline->create<dai::node::XLinkIn>();

   control->setStreamName("ctrl");
   video_out->setStreamName("h264");
   color_camera->setInterleaved(true);
   color_camera->setImageOrientation(
      dai::CameraImageOrientation::ROTATE_180_DEG);
   color_camera->setBoardSocket(dai::CameraBoardSocket::RGB);
   if (width > 1920 || height > 1080) {
      color_camera->setResolution(
         dai::ColorCameraProperties::SensorResolution::THE_4_K);
   } else {
      color_camera->setResolution(
         dai::ColorCameraProperties::SensorResolution::THE_1080_P);
   }
   color_camera->setVideoSize(width, height);
   color_camera->setFps(fps);
   color_camera->initialControl.setAutoFocusMode(
      dai::CameraControl::AutoFocusMode::OFF);
   encoder->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::H264_MAIN);
   
   gst_print("Linking depthai pipeline\n");
   color_camera->video.link(encoder->input);
   control->out.link(color_camera->inputControl);
   encoder->bitstream.link(video_out->input);
   try {
      device = std::make_unique<dai::Device>(*pipeline, info);
      device->setSystemInformationLoggingRate(0.2f);
      device->setLogOutputLevel(dai::LogLevel::TRACE);
      device->setLogLevel(dai::LogLevel::TRACE);
      queue_out = device->getOutputQueue("h264");
      control_queue_in = device->getInputQueue("ctrl");

   } catch (std::exception &e) {
      cout << "An exception occurred. Exception Nr. " << e.what() << '\n';
      return false;
   }

After a few hours, the output of queue_out stops and the device CPU usage drops to about 0.3%. No error is shown in the console. What could be the problem, and how can I debug it? Thanks.

...
[14442C1001DBAACE00] [54787.849] [system] [info] Cpu Usage - LeonOS 12.86%, LeonRT: 3.39%
[14442C1001DBAACE00] [54792.850] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54792.850] [system] [info] Temperatures - Average: 51.96 °C, CSS: 53.55 °C, MSS 51.36 °C, UPA: 50.92 °C, DSS: 52.02 °C
[14442C1001DBAACE00] [54792.850] [system] [info] Cpu Usage - LeonOS 12.97%, LeonRT: 3.72%
[14442C1001DBAACE00] [54797.851] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54797.851] [system] [info] Temperatures - Average: 52.40 °C, CSS: 53.33 °C, MSS 52.24 °C, UPA: 51.36 °C, DSS: 52.68 °C
[14442C1001DBAACE00] [54797.851] [system] [info] Cpu Usage - LeonOS 12.86%, LeonRT: 3.39%
[14442C1001DBAACE00] [54802.852] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54802.852] [system] [info] Temperatures - Average: 50.92 °C, CSS: 51.80 °C, MSS 50.25 °C, UPA: 50.47 °C, DSS: 51.14 °C
[14442C1001DBAACE00] [54802.852] [system] [info] Cpu Usage - LeonOS 0.26%, LeonRT: 0.16%
[14442C1001DBAACE00] [54807.853] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54807.853] [system] [info] Temperatures - Average: 49.97 °C, CSS: 51.14 °C, MSS 50.25 °C, UPA: 49.59 °C, DSS: 48.92 °C
[14442C1001DBAACE00] [54807.853] [system] [info] Cpu Usage - LeonOS 0.54%, LeonRT: 0.14%
[14442C1001DBAACE00] [54812.854] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54812.854] [system] [info] Temperatures - Average: 49.97 °C, CSS: 51.14 °C, MSS 49.37 °C, UPA: 49.14 °C, DSS: 50.25 °C
[14442C1001DBAACE00] [54812.854] [system] [info] Cpu Usage - LeonOS 0.22%, LeonRT: 0.15%
[14442C1001DBAACE00] [54817.855] [system] [info] Memory Usage - DDR: 106.38 / 357.38 MiB, CMX: 2.17 / 2.50 MiB, LeonOS Heap: 21.13 / 78.54 MiB, LeonRT Heap: 3.80 / 25.71 MiB
[14442C1001DBAACE00] [54817.855] [system] [info] Temperatures - Average: 49.64 °C, CSS: 50.47 °C, MSS 48.92 °C, UPA: 48.92 °C, DSS: 50.25 °C
...

Architecture: armv7l
Docker container: arm32v7/debian:bullseye-slim
DepthAI version: commit c2ccafd5be7c2b0624addaca62f6c52935c9bdc5 (main branch, 2 February 2022)
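
On the "how to debug it" part: one host-side option is to watch the periodic system log for the drop visible above, where the LeonOS load falls from about 13% to under 1% at the moment the output stops. The following is a minimal sketch, assuming the log lines keep the `Cpu Usage - LeonOS X%, LeonRT: Y%` format shown above and are available to the host as text. `CpuUsage`, `parseCpuUsage`, `looksStalled` and the 1% threshold are hypothetical names and values chosen for illustration; they are not part of depthai.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper type for this sketch; not part of the depthai API.
struct CpuUsage {
    double leonOs;  // LeonOS load in percent
    double leonRt;  // LeonRT load in percent
};

// Extracts both CPU figures from a system log line, or returns nullopt
// when the line is not a "Cpu Usage" line.
std::optional<CpuUsage> parseCpuUsage(const std::string& line) {
    static const std::regex pattern(
        R"re(Cpu Usage - LeonOS ([0-9]+(?:\.[0-9]+)?)%, LeonRT: ([0-9]+(?:\.[0-9]+)?)%)re");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return std::nullopt;
    }
    return CpuUsage{std::stod(match[1].str()), std::stod(match[2].str())};
}

// Assumption: while streaming, LeonOS sits near 13% in the log above, so a
// value below 1% is treated as "pipeline idle". The threshold is a guess.
bool looksStalled(const CpuUsage& usage, double idleThresholdPercent = 1.0) {
    return usage.leonOs < idleThresholdPercent;
}
```

Recording the wall-clock time of the first line for which `looksStalled` returns true would give a precise stall timestamp to correlate with other host events.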

    Hello oto313, sorry about the inconvenience. Could you try this PR? It will soon be mainlined (depthai 2.15) and includes a few stability fixes.
    Thanks, Erik

      erik
      Sorry, I did not notice at first that the PR you sent me is for the depthai Python repo; I am using only the C++ depthai library.

        Hi, I tried depthai 2.15, but the same issue happens.
        I retrieve the output of the queue with the following code:

        bool hasTimeout = false;
        gst_print("calling get\n");
        auto h264packet = queue_out->get<dai::ImgFrame>(std::chrono::seconds(1), hasTimeout);
        gst_print("calling get done\n");
        if (hasTimeout) {
           gst_print("Retrieve timeout\n");
           return false;
        }
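
Because a single 1 s timeout can also happen during a short hiccup, it may help to count consecutive timeouts before treating the stream as stalled. The following is a minimal sketch of such a counter; `StallWatchdog` and its limit are hypothetical and not part of depthai, and this only detects the stall, it does not address its cause.

```cpp
#include <cstddef>

// Hypothetical helper: counts consecutive timeouts reported by a polling loop.
class StallWatchdog {
public:
    explicit StallWatchdog(std::size_t maxConsecutiveTimeouts)
        : limit_(maxConsecutiveTimeouts) {}

    // Records the outcome of one timed get(). Returns true once the number
    // of consecutive timeouts has reached the configured limit.
    bool record(bool timedOut) {
        consecutive_ = timedOut ? consecutive_ + 1 : 0;
        return consecutive_ >= limit_;
    }

    std::size_t consecutiveTimeouts() const { return consecutive_; }

private:
    std::size_t limit_;
    std::size_t consecutive_ = 0;
};
```

In the loop above, `watchdog.record(hasTimeout)` could be called after every `get`; what to do when it returns true (log the event, dump state, or recreate the device) is left to the caller.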

        I also enabled trace logging by setting the environment variable DEPTHAI_LEVEL=trace.

        Here is the output:

        [2022-02-24 07:28:17.181] [trace] Log vector decoded, size: 3
        [14442C1001DBAACE00] [18187.511] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
        [14442C1001DBAACE00] [18187.511] [system] [info] Temperatures - Average: 47.24 °C, CSS: 48.03 °C, MSS 46.90 °C, UPA: 47.13 °C, DSS: 46.90 °C
        [14442C1001DBAACE00] [18187.511] [system] [info] Cpu Usage - LeonOS 0.30%, LeonRT: 0.17%
        [2022-02-24 07:28:17.707] [trace] Received message from device (h264) - parsing time: 88µs, data size: 14797, object type: 1 object data:
        0000: b9 06 b9 08 18 81 cd 39 81 d0 39 00 01 00 82 00 e1 e8 2f 82 00 e1 e8 2f 00 00 86 58 53 08 00 b9
        0020: 02 86 ec db 03 00 86 b7 b0 9a 05 b9 02 85 0c 47 86 c5 a9 47 00
        calling get done
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        [2022-02-24 07:28:22.182] [trace] Log vector decoded, size: 3
        [14442C1001DBAACE00] [18192.512] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
        [14442C1001DBAACE00] [18192.512] [system] [info] Temperatures - Average: 47.35 °C, CSS: 48.47 °C, MSS 46.23 °C, UPA: 47.58 °C, DSS: 47.13 °C
        [14442C1001DBAACE00] [18192.512] [system] [info] Cpu Usage - LeonOS 0.44%, LeonRT: 0.18%
        [2022-02-24 07:28:22.739] [trace] Received message from device (h264) - parsing time: 89µs, data size: 15647, object type: 1 object data:
        0000: b9 06 b9 08 18 81 1f 3d 81 20 3d 00 01 00 82 00 81 b8 2f 82 00 81 b8 2f 00 00 86 ef 53 08 00 b9
        0020: 02 86 f1 db 03 00 86 b3 fa 8c 07 b9 02 85 11 47 86 c1 f3 39 02
        calling get done
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
        calling get
        [2022-02-24 07:28:27.183] [trace] Log vector decoded, size: 3
        [14442C1001DBAACE00] [18197.513] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
        [14442C1001DBAACE00] [18197.513] [system] [info] Temperatures - Average: 47.41 °C, CSS: 47.80 °C, MSS 46.45 °C, UPA: 47.80 °C, DSS: 47.58 °C
        [14442C1001DBAACE00] [18197.513] [system] [info] Cpu Usage - LeonOS 0.23%, LeonRT: 0.17%
        [2022-02-24 07:28:27.772] [trace] Received message from device (h264) - parsing time: 92µs, data size: 15677, object type: 1 object data:
        0000: b9 06 b9 08 18 81 3d 3d 81 40 3d 00 01 00 82 00 21 88 2f 82 00 21 88 2f 00 00 86 86 54 08 00 b9
        0020: 02 86 f6 db 03 00 86 02 bd 7d 09 b9 02 85 16 47 86 7c 38 2c 04
        calling get done
        calling get
        calling get done
        Retrieve timeout
        calling get
        calling get done
        Retrieve timeout
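
In the trace above, h264 messages still arrive on the host, but only about once every five seconds, so each 1 s `get` times out four times in between. The gap can be measured from the host timestamps. The following is a minimal sketch, assuming the `[YYYY-MM-DD HH:MM:SS.mmm]` prefix shown in the host trace lines; `millisSinceMidnight` is a hypothetical helper, and it ignores the date, so it is only meaningful for lines from the same day.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper: parses the "[YYYY-MM-DD HH:MM:SS.mmm]" prefix of a
// host trace line and returns the time of day in milliseconds.
// Returns nullopt for lines without that prefix (for example device lines,
// which start with the device ID instead of a date).
std::optional<long> millisSinceMidnight(const std::string& line) {
    int year = 0, month = 0, day = 0;
    int hour = 0, minute = 0, second = 0, millis = 0;
    const int fields = std::sscanf(line.c_str(), "[%d-%d-%d %d:%d:%d.%d]",
                                   &year, &month, &day,
                                   &hour, &minute, &second, &millis);
    if (fields != 7) {
        return std::nullopt;
    }
    return ((hour * 60L + minute) * 60L + second) * 1000L + millis;
}
```

Applied to the first two "Received message from device (h264)" lines above (07:28:17.707 and 07:28:22.739), the difference is 5032 ms.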

          Thanks for reporting, oto313. I have forwarded this to our firmware engineers and we will take a look at it.

          6 days later

          Any progress? Or is there a time horizon for when you can investigate this?

          Thanks

            Hello oto313, because this issue is sporadic, debugging, fixing, and testing it takes quite a long time, but I will check again with the FW engineers.
            Thanks, Erik

            OK, thanks for the reply. If any help is needed, I am happy to provide it.

            20 days later

            Now I also got a different error message. I forgot to mention that I am using the Luxonis module with a Raspberry Pi HQ camera.

            [14442C1001DBAACE00] [118.886] [system] [info] Temperatures - Average: 53.06 °C, CSS: 53.99 °C, MSS 53.33 °C, UPA: 51.80 °C, DSS: 53.12 °C
            [14442C1001DBAACE00] [118.886] [system] [info] Cpu Usage - LeonOS 13.14%, LeonRT: 3.92%
            [14442C1001DBAACE00] [123.887] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
            [14442C1001DBAACE00] [123.887] [system] [info] Temperatures - Average: 53.39 °C, CSS: 54.42 °C, MSS 52.90 °C, UPA: 52.68 °C, DSS: 53.55 °C
            [14442C1001DBAACE00] [123.887] [system] [info] Cpu Usage - LeonOS 13.02%, LeonRT: 3.94%
            [14442C1001DBAACE00] [128.888] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
            [14442C1001DBAACE00] [128.888] [system] [info] Temperatures - Average: 53.00 °C, CSS: 54.42 °C, MSS 51.58 °C, UPA: 52.68 °C, DSS: 53.33 °C
            [14442C1001DBAACE00] [128.888] [system] [info] Cpu Usage - LeonOS 13.13%, LeonRT: 4.22%
            [14442C1001DBAACE00] [133.889] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
            [14442C1001DBAACE00] [133.889] [system] [info] Temperatures - Average: 53.33 °C, CSS: 54.21 °C, MSS 53.33 °C, UPA: 52.90 °C, DSS: 52.90 °C
            [14442C1001DBAACE00] [133.889] [system] [info] Cpu Usage - LeonOS 13.11%, LeonRT: 3.95%
            [14442C1001DBAACE00] [138.890] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
            [14442C1001DBAACE00] [138.890] [system] [info] Temperatures - Average: 53.39 °C, CSS: 54.42 °C, MSS 52.68 °C, UPA: 53.12 °C, DSS: 53.33 °C
            [14442C1001DBAACE00] [138.890] [system] [info] Cpu Usage - LeonOS 12.99%, LeonRT: 3.94%
            [14442C1001DBAACE00] [143.891] [system] [info] Memory Usage - DDR: 94.51 / 339.99 MiB, CMX: 2.18 / 2.50 MiB, LeonOS Heap: 21.38 / 78.29 MiB, LeonRT Heap: 3.76 / 41.54 MiB
            [14442C1001DBAACE00] [143.891] [system] [info] Temperatures - Average: 53.22 °C, CSS: 54.42 °C, MSS 52.90 °C, UPA: 52.46 °C, DSS: 53.12 °C
            [14442C1001DBAACE00] [143.891] [system] [info] Cpu Usage - LeonOS 13.18%, LeonRT: 4.01%
            Retrieve timeout
            [14442C1001DBAACE00] [149.513] [system] [critical] Fatal error. Please report to developers. Log: 'Fatal error on MSS CPU: trap: 00, address: 00000000' '0'
            Retrieve timeout
            Retrieve timeout
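
Since the stall normally produces no message at all, a line like the `[critical]` one above is easy to miss in a long capture. The following is a minimal sketch for filtering captured device log lines by level, assuming the `[device id] [uptime] [source] [level] message` layout seen in the log above; `DeviceLogLine`, `parseDeviceLogLine` and `isCritical` are hypothetical names, not part of depthai.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical record for one device log line, as laid out in the log above.
struct DeviceLogLine {
    std::string deviceId;   // e.g. "14442C1001DBAACE00"
    double uptimeSeconds;   // device-side timestamp, e.g. 149.513
    std::string source;     // e.g. "system"
    std::string level;      // e.g. "info", "critical"
    std::string message;    // rest of the line
};

// Splits a device log line into its fields. Returns nullopt for lines that
// do not follow the layout (host trace lines, "Retrieve timeout", ...).
std::optional<DeviceLogLine> parseDeviceLogLine(const std::string& line) {
    static const std::regex pattern(
        R"re(^\s*\[([0-9A-Fa-f]+)\] \[([0-9]+\.[0-9]+)\] \[(\w+)\] \[(\w+)\] (.*)$)re");
    std::smatch match;
    if (!std::regex_match(line, match, pattern)) {
        return std::nullopt;
    }
    return DeviceLogLine{match[1].str(), std::stod(match[2].str()),
                         match[3].str(), match[4].str(), match[5].str()};
}

bool isCritical(const DeviceLogLine& entry) {
    return entry.level == "critical";
}
```

Saving the uptime of the first critical entry together with the preceding lines could make a report to the developers easier to act on.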
              oto313 Oh, this is interesting; it looks like a firmware crash. Could you provide a complete minimal reproducible example so we can debug it locally?
              Thanks, Erik

              5 days later

              I will try to reproduce it. Normally it does not log any critical message.