Computer Freezes Seemingly Related to OAK Camera

Hello,

We have an intermittent problem that has been affecting some of our recent devices. After a lot of experimentation, there appears to be a relationship between capturing images from an OAK-D Pro PoE camera and the computer freezing. The failures are not very reproducible, so it's hard to create a failure case.

The computer has been consistently freezing after 1-12 hours of operation. If I disable the link to the camera (and stop any software related to the camera), the computer stays up indefinitely. However, when the computer freezes, there aren't many clues as to why it's freezing.

In one instance, I saw this stack trace:

Oct 31 12:07:04 SCN-011 kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI

Oct 31 12:07:04 SCN-011 kernel: CPU: 3 PID: 1015 Comm: EventRead00Thr Not tainted 5.15.0-73-lowlatency #80-Ubuntu

Oct 31 12:07:04 SCN-011 kernel: Hardware name: SYSTEM_MANUFACTURER SYSTEM_PRODUCT_NAME/Default string, BIOS 5.19 12/28/2023

Oct 31 12:07:04 SCN-011 kernel: RIP: 0010:__skb_datagram_iter+0x1a9/0x2f0

Oct 31 12:07:04 SCN-011 kernel: Code: c6 75 53 48 29 d6 48 8b 55 10 48 01 f7 4c 89 c6 48 01 cf 48 8b 4d b8 e8 15 fe ff ff 44 8b 5d d0 41 01 c4 44 39 f0 75 59 29 c3 <0e> 84 ee fe ff ff 48 8b 55 a8 8b 82 bc 00 00>

Oct 31 12:07:04 SCN-011 kernel: RSP: 0018:ffffbc6681197ae0 EFLAGS: 00010206

Oct 31 12:07:04 SCN-011 kernel: RAX: 0000000000000400 RBX: 0000000000001c00 RCX: 00000000000b6d43

Oct 31 12:07:04 SCN-011 kernel: RDX: 0000000000000800 RSI: ffffbc6681197d48 RDI: ffffbc6681197d48

Oct 31 12:07:04 SCN-011 kernel: RBP: ffffbc6681197b40 R08: 0000000000000400 R09: ffffffffb82e8f30

Oct 31 12:07:04 SCN-011 kernel: R10: 0000000000000000 R11: 0000000000000800 R12: 0000000000000800

Oct 31 12:07:04 SCN-011 kernel: R13: 0000000000000400 R14: 0000000000000400 R15: 0000000000000000

Oct 31 12:07:04 SCN-011 kernel: FS:  00007f20de92a640(0000) GS:ffff97d677f80000(0000) knlGS:0000000000000000

Oct 31 12:07:04 SCN-011 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Oct 31 12:07:04 SCN-011 kernel: CR2: 00007fcaf62d9c68 CR3: 0000000106c42000 CR4: 0000000000350ee0

Oct 31 12:07:04 SCN-011 kernel: Call Trace:

Oct 31 12:07:04 SCN-011 kernel:  <TASK>

Oct 31 12:07:04 SCN-011 kernel:  ? receiver_wake_function+0x30/0x30

Oct 31 12:07:04 SCN-011 kernel:  skb_copy_datagram_iter+0x38/0xa0

Oct 31 12:07:04 SCN-011 kernel:  tcp_recvmsg_locked+0x2a7/0x9e0

Oct 31 12:07:04 SCN-011 kernel:  ? __tcp_send_ack.part.0+0xcf/0x1c0

Oct 31 12:07:04 SCN-011 kernel:  tcp_recvmsg+0x79/0x1c0

Oct 31 12:07:04 SCN-011 kernel:  ? _raw_spin_unlock_bh+0x1e/0x30

Oct 31 12:07:04 SCN-011 kernel:  inet_recvmsg+0x5e/0x130

Oct 31 12:07:04 SCN-011 kernel:  ? security_socket_recvmsg+0x3a/0x60

Oct 31 12:07:04 SCN-011 kernel:  sock_recvmsg+0x71/0x80

Oct 31 12:07:04 SCN-011 kernel:  __sys_recvfrom+0x1a2/0x1d0

Oct 31 12:07:04 SCN-011 kernel:  ? rseq_get_rseq_cs.isra.0+0x1b/0x230

Oct 31 12:07:04 SCN-011 kernel:  ? rseq_ip_fixup+0x72/0x1a0

Oct 31 12:07:04 SCN-011 kernel:  ? do_futex+0x162/0x1f0

Oct 31 12:07:04 SCN-011 kernel:  __x64_sys_recvfrom+0x24/0x30

Oct 31 12:07:04 SCN-011 kernel:  do_syscall_64+0x59/0xc0

Oct 31 12:07:04 SCN-011 kernel:  ? switch_fpu_return+0x4e/0xe0

Oct 31 12:07:04 SCN-011 kernel:  ? exit_to_user_mode_prepare+0x96/0xb0

Oct 31 12:07:04 SCN-011 kernel:  ? syscall_exit_to_user_mode+0x27/0x50

Oct 31 12:07:04 SCN-011 kernel:  ? __x64_sys_recvfrom+0x24/0x30

Oct 31 12:07:04 SCN-011 kernel:  ? do_syscall_64+0x69/0xc0

Oct 31 12:07:04 SCN-011 kernel:  ? do_syscall_64+0x69/0xc0

Oct 31 12:07:04 SCN-011 kernel:  ? exit_to_user_mode_prepare+0x96/0xb0

Oct 31 12:07:04 SCN-011 kernel:  ? syscall_exit_to_user_mode+0x27/0x50

Oct 31 12:07:04 SCN-011 kernel:  ? __x64_sys_recvfrom+0x24/0x30

Oct 31 12:07:04 SCN-011 kernel:  ? do_syscall_64+0x69/0xc0

Oct 31 12:07:04 SCN-011 kernel:  ? do_syscall_64+0x69/0xc0

Oct 31 12:07:04 SCN-011 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb

Oct 31 12:07:04 SCN-011 kernel: RIP: 0033:0x7f210d2a76be

Oct 31 12:07:04 SCN-011 kernel: Code: 4c 24 1c e8 54 93 f6 ff 44 8b 54 24 1c 8b 3c 24 45 31 c9 41 89 c4 48 8b 54 24 10 48 8b 74 24 08 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 e7 48 89 04 24>

Oct 31 12:07:04 SCN-011 kernel: RSP: 002b:00007f20de929c70 EFLAGS: 00000246 ORIG_RAX: 000000000000002d

Oct 31 12:07:04 SCN-011 kernel: RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f210d2a76be

Oct 31 12:07:04 SCN-011 kernel: RDX: 00000000000b6d43 RSI: 00007f20bc0b7fc0 RDI: 000000000000000c

Oct 31 12:07:04 SCN-011 kernel: RBP: 00007f20bc0b7fc0 R08: 0000000000000000 R09: 0000000000000000

Oct 31 12:07:04 SCN-011 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000

Oct 31 12:07:04 SCN-011 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00007f20e80fff68

Oct 31 12:07:04 SCN-011 kernel:  </TASK>

Oct 31 12:07:04 SCN-011 kernel: Modules linked in: ccm binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel sou>

Oct 31 12:07:04 SCN-011 kernel:  blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i915 ttm crct10dif_pc>

Oct 31 12:07:04 SCN-011 kernel: ---[ end trace 6c67e6131f7bcf60 ]---

Oct 31 12:07:04 SCN-011 kernel: RIP: 0010:__skb_datagram_iter+0x1a9/0x2f0

Oct 31 12:07:04 SCN-011 kernel: Code: c6 75 53 48 29 d6 48 8b 55 10 48 01 f7 4c 89 c6 48 01 cf 48 8b 4d b8 e8 15 fe ff ff 44 8b 5d d0 41 01 c4 44 39 f0 75 59 29 c3 <0f> 84 ee fe ff ff 48 8b 55 a8 8b 82 bc 00 00>

Oct 31 12:07:04 SCN-011 kernel: RSP: 0018:ffffbc6681197ae0 EFLAGS: 00010206

Oct 31 12:07:04 SCN-011 kernel: RAX: 0000000000000400 RBX: 0000000000001c00 RCX: 00000000000b6d43

Oct 31 12:07:04 SCN-011 kernel: RDX: 0000000000000800 RSI: ffffbc6681197d48 RDI: ffffbc6681197d48

Oct 31 12:07:04 SCN-011 kernel: RBP: ffffbc6681197b40 R08: 0000000000000400 R09: ffffffffb82e8f30

Oct 31 12:07:04 SCN-011 kernel: R10: 0000000000000000 R11: 0000000000000800 R12: 0000000000000800

Oct 31 12:07:04 SCN-011 kernel: R13: 0000000000000400 R14: 0000000000000400 R15: 0000000000000000

Oct 31 12:07:04 SCN-011 kernel: FS:  00007f20de92a640(0000) GS:ffff97d677f00000(0000) knlGS:0000000000000000

Oct 31 12:07:04 SCN-011 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Oct 31 12:07:04 SCN-011 kernel: CR2: 00007f6a9c02d048 CR3: 0000000106c42000 CR4: 0000000000350ee0

Oct 31 12:07:13 SCN-011 kernel: igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Down
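
The two `Code:` lines in the trace above (the first dump and the one repeated after `end trace`) appear to differ only in the byte the kernel marks with `<..>`: `0e` in the first and `0f` in the second. Below is a minimal Python sketch for checking that kind of difference mechanically. The helper names `parse_code_line` and `diff_dumps` are hypothetical, and the result is only an observation about the log text, not a diagnosis.

```python
import re

# The two "Code:" dumps copied from the trace above.
# The trailing ">" is the pager's truncation mark, not part of the dump.
FIRST = (
    "Code: c6 75 53 48 29 d6 48 8b 55 10 48 01 f7 4c 89 c6 48 01 cf 48 8b 4d b8 e8 "
    "15 fe ff ff 44 8b 5d d0 41 01 c4 44 39 f0 75 59 29 c3 <0e> 84 ee fe ff ff "
    "48 8b 55 a8 8b 82 bc 00 00>"
)
SECOND = (
    "Code: c6 75 53 48 29 d6 48 8b 55 10 48 01 f7 4c 89 c6 48 01 cf 48 8b 4d b8 e8 "
    "15 fe ff ff 44 8b 5d d0 41 01 c4 44 39 f0 75 59 29 c3 <0f> 84 ee fe ff ff "
    "48 8b 55 a8 8b 82 bc 00 00>"
)


def parse_code_line(line):
    """Return (byte_values, index_of_marked_byte) for a kernel 'Code:' line."""
    tokens = line.split("Code:", 1)[-1].split()
    values, marked = [], None
    for tok in tokens:
        m = re.fullmatch(r"<?([0-9a-fA-F]{2})>?", tok)
        if not m:
            continue  # skip anything that is not a hex byte
        if tok.startswith("<"):
            marked = len(values)  # the byte at the faulting instruction pointer
        values.append(int(m.group(1), 16))
    return values, marked


def diff_dumps(a, b):
    """List (index, byte_a, byte_b, differing_bits) where the dumps disagree."""
    return [
        (i, x, y, bin(x ^ y).count("1"))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


if __name__ == "__main__":
    first, marked = parse_code_line(FIRST)
    second, _ = parse_code_line(SECOND)
    for index, x, y, bits in diff_dumps(first, second):
        print(f"byte {index} (marked byte is {marked}): "
              f"{x:02x} vs {y:02x}, {bits} bit(s) differ")
```

A difference between two dumps of the same address may be worth mentioning in a bug report. Whether it points to memory, the CPU, or something else cannot be settled from the log alone.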

OS: Ubuntu 22
Kernel: 5.15.0-73-lowlatency
DepthAI Python Package: 2.28.0.0

Camera Firmware Version: 0.0.28

  • I tried to update the factory firmware, but could not do it through the device manager.

I am not sure what other information would be helpful, so please let me know what else I can provide.
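
For readers who need to collect the same details (kernel, Python, package versions) from several units, here is a minimal Python sketch using only the standard library. The function name `environment_report` and the default package list are assumptions for illustration. It does not query the camera firmware, which has to be read from the device itself.

```python
import platform
from importlib import metadata


def environment_report(packages=("depthai", "numpy")):
    """Collect host details that are useful to attach to a bug report."""
    kernel = platform.release()
    report = {
        "kernel": kernel,
        # True on kernels such as 5.15.0-73-lowlatency
        "lowlatency_kernel": "lowlatency" in kernel,
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this on a failing unit and on a healthy one makes it easier to spot differences between the two setups.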

    AmmarKothari
    I'll forward this to the dev team so we can fix it ASAP.
    What hardware are you using as the "computer"? It seems like the network interface is crashing.

    Thanks,
    Jaka

    Hi Jaka,

    I appreciate the response. I am using this computer: https://premioinc.com/products/rco-1000-ehl-10. It is interesting that the network interface is crashing. If there is any additional logging I can enable, or other places to look for logs, please let me know.

    Thank you!

      AmmarKothari

      • Try a different ETH port
      • Check dmesg for any logs prior to the crash
      • Update the network driver
      • Try a non-lowlatency kernel

      Thanks,
      Jaka
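
After a hard freeze, `dmesg` only shows the current boot, so the messages from before the crash have to come from the journal of the previous boot. Here is a minimal Python sketch of that check. It assumes journald keeps persistent logs on the machine, and the marker list is only a starting point taken from strings that appear in this thread.

```python
import subprocess

# Strings seen in the traces posted in this thread; extend as needed.
MARKERS = (
    "invalid opcode",
    "Call Trace",
    "end trace",
    "NIC Link is Down",
    "Segmentation fault",
    "Aborted",
)


def suspicious_lines(lines, markers=MARKERS):
    """Keep only the log lines that contain one of the markers."""
    return [ln for ln in lines if any(m in ln for m in markers)]


def journal_lines(boot=-1):
    """Read the journal of a boot (-1 = previous boot) as a list of lines."""
    result = subprocess.run(
        ["journalctl", f"--boot={boot}", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # a missing previous boot yields empty output, not an exception
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in suspicious_lines(journal_lines()):
        print(line)
```

If nothing is printed, either the previous boot left no matching messages or the journal is not persistent, in which case the log from before the freeze is lost at reboot.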

        jakaskerl

        I have tried a different ETH port and had the same result. Sometimes the trace I copied into my first post appears. Are there any clues there about where else I should be looking?

        Is there a recommended or approved network driver list I can reference?

        I can try a non-lowlatency kernel. Is there a list of approved kernels that Luxonis has tested? I'd love to choose one of those to reduce the number of differences from a setup that Luxonis has tested.

        Thank you!

          AmmarKothari
          If you can, the easiest way would be to test on Ubuntu with the default kernel. Regarding the driver, I only mentioned it in case yours is outdated and a patch is already available.

          It could just be a power issue as well, though. Is there a PoE switch in the middle?

          Thanks,
          Jaka

          Yes, there is a PoE switch in the middle. We have used this PoE switch successfully in other deployments. I can experiment with another PoE device as well.

          OK, I'll try out the default kernel for 22.04.4. I'll see if there are any network driver updates, but I am pretty sure it's all up to date.

          Was there any information from the stack trace in my first post that helps indicate where the issue is?

            AmmarKothari
            Perhaps, but it is not directly visible. It seems to be related to the network interface.

            Thanks,
            Jaka

            Hi Jaka,

            I did some more debugging and reinstalled Python. I am now seeing this error, which seems to be coming from the depthai library. It has not caused the computer to freeze yet, but it seems concerning. Is this a known bug?

            Nov 12 18:28:31 SCN-009 python[2147]: Stack trace (most recent call last) in thread 2196:

            Nov 12 18:28:31 SCN-009 python[2147]: #9 Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in

            Nov 12 18:28:31 SCN-009 python[2147]: #8 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f235dcbc84f, in

            Nov 12 18:28:31 SCN-009 python[2147]: #7 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f235dc2aac2, in

            Nov 12 18:28:31 SCN-009 python[2147]: #6 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f2337bc161f, in

            Nov 12 18:28:31 SCN-009 python[2147]: #5 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f233777aae1, in

            Nov 12 18:28:31 SCN-009 python[2147]: #4 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f23377789f5, in

            Nov 12 18:28:31 SCN-009 python[2147]: #3 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f2337502e03, in

            Nov 12 18:28:31 SCN-009 python[2147]: #2 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f2337475098, in

            Nov 12 18:28:31 SCN-009 python[2147]: #1 Object "/home/scan/.local/lib/python3.10/site-packages/depthai.cpython-310-x86_64-linux-gnu.so", at 0x7f2337475084, in

            Nov 12 18:28:31 SCN-009 python[2147]: #0 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f235dfce6a8, in _PyThreadState_DeleteCurrent

            Nov 12 18:28:31 SCN-009 python[2147]: Segmentation fault (Address not mapped to object [0x7])

            Here is another suspect stack trace. This one doesn't reference the depthai library specifically, but the calls seem to be from the library based on the naming.

            Nov 13 08:51:44 SCN-009 python[2189]: #31 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #30 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f06450017ea, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #29 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #28 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f06450017ea, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #27 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064505dd03, in

            Nov 13 08:51:44 SCN-009 python[2189]: #26 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #25 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f0645004eca, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #24 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #23 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f0645007cc6, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #22 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064505dc67, in

            Nov 13 08:51:44 SCN-009 python[2189]: #21 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #20 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f0645007ed3, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #19 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064505ae8b, in _PyObject_MakeTpCall

            Nov 13 08:51:44 SCN-009 python[2189]: #18 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f06450c0c1d, in

            Nov 13 08:51:44 SCN-009 python[2189]: #17 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f06450c906d, in

            Nov 13 08:51:44 SCN-009 python[2189]: #16 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064505b33f, in _PyObject_Call_Prepend

            Nov 13 08:51:44 SCN-009 python[2189]: #15 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064505b00a, in _PyObject_FastCallDictTstate

            Nov 13 08:51:44 SCN-009 python[2189]: #14 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f064514dc43, in

            Nov 13 08:51:44 SCN-009 python[2189]: #13 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f0645007cc6, in _PyEval_EvalFrameDefault

            Nov 13 08:51:44 SCN-009 python[2189]: #12 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f06450a4b98, in

            Nov 13 08:51:44 SCN-009 python[2189]: #11 Object "/home/scan/.local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so", at 0x7f0642e64374, in

            Nov 13 08:51:44 SCN-009 python[2189]: #10 Object "/home/scan/.local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so", at 0x7f0642dc7186, in

            Nov 13 08:51:44 SCN-009 python[2189]: #9 Object "/home/scan/.local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so", at 0x7f0642dc6ba8, in

            Nov 13 08:51:44 SCN-009 python[2189]: #8 Object "/home/scan/.local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so", at 0x7f0642da8040, in

            Nov 13 08:51:44 SCN-009 python[2189]: #7 Object "/home/scan/.local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so", at 0x7f0642da6e7e, in

            Nov 13 08:51:44 SCN-009 python[2189]: #6 Object "/home/scan/.pyenv/versions/3.10.15/lib/libpython3.10.so.1.0", at 0x7f0645096220, in PyDict_GetItem

            Nov 13 08:51:44 SCN-009 python[2189]: #5 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644e8f565, in __stack_chk_fail

            Nov 13 08:51:44 SCN-009 python[2189]: #4 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644e8f599, in __fortify_fail

            Nov 13 08:51:44 SCN-009 python[2189]: #3 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644de2675, in

            Nov 13 08:51:44 SCN-009 python[2189]: #2 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644d817f2, in abort

            Nov 13 08:51:44 SCN-009 python[2189]: #1 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644d9b475, in raise

            Nov 13 08:51:44 SCN-009 python[2189]: #0 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f0644def9fc, in pthread_kill

            Nov 13 08:51:44 SCN-009 python[2189]: Aborted (Signal sent by tkill() 2189 1000)
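
Both traces above list only native frames, so they do not show which Python code was running at the time. Here is a minimal sketch, assuming the capture script can be edited, that uses the standard-library `faulthandler` module to also write the Python-level traceback of every thread when the process receives a fatal signal such as SIGSEGV or SIGABRT. The helper name `enable_crash_tracebacks` and the log path are hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(path=None):
    """Dump Python tracebacks of all threads on fatal signals.

    If path is given, tracebacks are appended to that file; otherwise
    they go to stderr. The returned stream must stay open for the
    lifetime of the process, so keep a reference to it.
    """
    stream = open(path, "a") if path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream


if __name__ == "__main__":
    # Hypothetical log location; call this once at the top of the script.
    crash_log = enable_crash_tracebacks("python_crash_tracebacks.log")
```

This does not prevent the crash. It only adds the Python call stack to what is already logged, which may help show whether the fault occurs during frame retrieval, during array handling, or at thread shutdown.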

            @AmmarKothari We may be seeing something similar here with the OAK-D Pro PoE, though we're still struggling to reproduce the issue consistently.

            Does your whole computer freeze or just the windowing server? In our case, most freezes are momentary (3-5 seconds) then the GPU resets. In rarer cases, the freeze is indefinite and we need to SSH in and reboot.

            @paulmurphy I appreciate you sharing that. It's helpful to know it's more general than just a few of our units. Hopefully we can get to the root cause!

            We are using the Pro PoE. We have cases where the computer seems to freeze completely (logs abruptly stop in the journal) and others where Python crashes but the computer is fine, and I can SSH in and restart the script. I don't think we see momentary freezes, although I am not sure we would notice them in our use case.

            The computer we are using doesn't have a GPU, if that is a helpful data point.

              Is there a short-term fix that I can try? We are seeing this on some deployed units. I tried rolling back to version 2.26, but saw a similar freezing issue.

              AmmarKothari
              Does it help if you use some simple example (like rgb_preview or even simpler)? The last error you have sent could point to a depthai issue, but I am unsure whether it is caused by the pipeline or there is something else going on.

              Thanks,
              Jaka

              I can try that. It often takes hours for the issue to appear, so the debug iteration is quite slow.
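
Since a failure can take hours to appear, one option is to leave a very small pipeline running unattended and log a periodic heartbeat, so that the last heartbeat bounds the time of the freeze. Below is a minimal sketch under stated assumptions: it targets the DepthAI 2.x Python API, `soak` and `run_on_camera` are hypothetical names, and it reads only the frame sequence number so that no image data is copied into NumPy on the host.

```python
import time


def soak(next_seq, duration_s, heartbeat_s=60.0, log=None, clock=time.monotonic):
    """Pull frames for duration_s seconds, logging a heartbeat every heartbeat_s."""
    if log is None:
        # Flush so each heartbeat reaches the journal before a possible freeze.
        log = lambda msg: print(msg, flush=True)
    start = last_beat = clock()
    frames = 0
    while clock() - start < duration_s:
        seq = next_seq()  # blocks until the next frame arrives
        frames += 1
        now = clock()
        if now - last_beat >= heartbeat_s:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log(f"{stamp} frames={frames} last_seq={seq}")
            last_beat = now
    return frames


def run_on_camera(duration_s=12 * 3600):
    """Stream a small RGB preview from the first available device."""
    import depthai as dai  # imported here so soak() works without the package

    pipeline = dai.Pipeline()
    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(300, 300)
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("rgb")
    cam.preview.link(xout.input)

    with dai.Device(pipeline) as device:
        queue = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
        return soak(lambda: queue.get().getSequenceNum(), duration_s)


if __name__ == "__main__":
    run_on_camera()
```

If the machine still freezes with this stripped-down loop, the application pipeline is less likely to be the cause. If it does not, features can be added back one at a time.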