So, we have a bit of a plumbing nightmare that consists of a body of code that probes for PoE cameras, then dispatches a thread per camera which dynamically loads the appropriate body of code for the camera in question and performs callbacks to the ultimate consumer based on queue activity.
The vast majority of the time this works just fine, but occasionally, during initialization of more than one camera, things blow up deep in XLink land. We start the new threads as fast as we can, so it is all but certain that several threads are inside the guts of the library at the same time while the cameras boot. My guess is that the code path for associating a pipeline and booting a device, in our case this little snippet:
// Connect to device and start pipeline
dai::Device device(pipeline,
drongoCameras[DRONGO_CAM_HASH(startBlock -> uid)].devInfo);
may not be entirely thread safe.
Because it works the vast majority of the time, it's been difficult to characterize beyond "every now and then it dies horribly during initialization". Some examples:
[Thread 0x7fffc2f73700 (LWP 467188) exited]
[New Thread 0x7fffc2772700 (LWP 467191)]
[New Thread 0x7fffb9b8b700 (LWP 467192)]
[2023-03-15 13:04:01.353] [warning] Monitor thread (device: 1844301031455C1200 [192.168.88.129]) - ping was missed, closing the device connection
F: [global] [ 643353] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 58
[Thread 0x7fffbab8d700 (LWP 467190) exited]
terminate called after throwing an instance of 'dai::XLinkWriteError'
what(): Couldn't write data to stream: '__bootloader' (X_LINK_ERROR)
---
[Thread 0x7fffc2772700 (LWP 468300) exited]
[New Thread 0x7fffc1f71700 (LWP 468303)]
[New Thread 0x7fffb9b8b700 (LWP 468304)]
F: [global] [ 787036] [Scheduler00Thr] dispatcherResponseServe:925 no request for this response: XLINK_WRITE_RESP 1
[2023-03-15 13:39:54.242] [warning] Monitor thread (device: 1844301031455C1200 [192.168.88.129]) - ping was missed, closing the device connection
F: [global] [ 796242] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 58
[Thread 0x7fffbab8d700 (LWP 468302) exited]
terminate called after throwing an instance of 'dai::XLinkWriteError'
what(): Couldn't write data to stream: '__bootloader' (X_LINK_ERROR)
---
[Thread 0x7fffc3774700 (LWP 468706) exited]
terminate called after throwing an instance of 'dai::XLinkWriteError'
what(): Couldn't write data to stream: '__bootloader' (X_LINK_ERROR)
Thread 28 "testharness" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffbb38e700 (LWP 468701)]
0x00007ffff75b700b in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0 0x00007ffff75b700b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7596859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7971a31 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff797d5dc in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff797d647 in std::terminate() ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff797d8e9 in __cxa_throw ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x0000555555c6ffbe in dai::XLinkStream::writeSplit (
this=0x7fffb166b7e0, d=0x7fffaa662010, size=23491155, split=5242880)
at /home/chris/luxonis/depthai-core/src/xlink/XLinkStream.cpp:143
#7 0x0000555555b11afa in dai::DeviceBootloader::bootMemory (
this=0x7fffbb36e110, embeddedFw=...)
at /home/chris/luxonis/depthai-core/src/device/DeviceBootloader.cpp:1318
#8 0x0000555555a530ec in dai::DeviceBase::init2 (this=0x7fffbb36e910,
cfg=..., pathToMvcmd=..., pipeline=...)
at /home/chris/luxonis/depthai-core/src/device/DeviceBase.cpp:596
#9 0x0000555555a515b8 in dai::DeviceBase::init (this=0x7fffbb36e910,
version=dai::OpenVINO::VERSION_2022_1,
maxUsbSpeed=dai::UsbSpeed::SUPER, pathToMvcmd=...)
at /home/chris/luxonis/depthai-core/src/device/DeviceBase.cpp:479
#10 0x0000555555a4f285 in dai::DeviceBase::DeviceBase (
this=0x7fffbb36e910, version=dai::OpenVINO::VERSION_2022_1,
devInfo=..., maxUsbSpeed=dai::UsbSpeed::SUPER)
at /home/chris/luxonis/depthai-core/src/device/DeviceBase.cpp:326
#11 0x0000555555a409ed in dai::DeviceBase::DeviceBase<bool, true> (
this=0x7fffbb36e910, version=dai::OpenVINO::VERSION_2022_1,
devInfo=..., usb2Mode=false)
at /home/chris/luxonis/depthai-core/include/depthai/device/DeviceBase.hpp:257
#12 0x0000555555a3c62d in dai::Device::Device (this=0x7fffbb36e910,
pipeline=..., devInfo=...)
at /home/chris/luxonis/depthai-core/src/device/Device.cpp:44
#13 0x00007ffff7f525b4 in rawCameraModel (
startInfo=0x7ffff7fc59b8 <drongoCameras+248>)
at /home/chris/farmwave/drongo-core/src/sensor/sensor.cpp:367
#14 0x00007ffff7ebe609 in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007ffff7693133 in clone () from /lib/x86_64-linux-gnu/libc.so.6
I suppose I can go ahead and serialize access to the constructor for dai::Device and see what happens, but I was curious whether this is a known problem.
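
For reference, this is roughly the serialization I have in mind: a minimal sketch, not what is in our tree today. The names deviceBootMutex, makeSerialized, FakeDevice and bootAllCameras are all made up for illustration, and FakeDevice is just a stand-in with a slow constructor so the sketch builds without depthai. It only tests the hypothesis that concurrent construction is the trigger; it does not show that the library is at fault.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical process-wide lock, held only while a device object is constructed.
inline std::mutex& deviceBootMutex() {
    static std::mutex m;
    return m;
}

// Construct a T while holding the boot lock, so only one constructor runs at a time.
template <typename T, typename... Args>
std::unique_ptr<T> makeSerialized(Args&&... args) {
    std::lock_guard<std::mutex> lock(deviceBootMutex());
    return std::make_unique<T>(std::forward<Args>(args)...);
}

// Stand-in for the real device class (assumption: not part of depthai).
// The constructor is deliberately slow and records how many run at once.
struct FakeDevice {
    static std::atomic<int> active;
    static std::atomic<int> maxActive;

    explicit FakeDevice(int cameraId) : id(cameraId) {
        int now = ++active;
        int prev = maxActive.load();
        // Raise maxActive to `now` if we observed a new high-water mark.
        while (now > prev && !maxActive.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        --active;
    }

    int id;
};
std::atomic<int> FakeDevice::active{0};
std::atomic<int> FakeDevice::maxActive{0};

// Start one thread per camera and return the highest number of
// constructors that were ever running at the same time.
int bootAllCameras(int cameraCount) {
    FakeDevice::maxActive = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < cameraCount; ++i) {
        threads.emplace_back([i] {
            auto device = makeSerialized<FakeDevice>(i);
            // Queue handling and callbacks would run here, outside the lock.
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return FakeDevice::maxActive.load();
}
```

In the per-camera thread the call would presumably become something like makeSerialized<dai::Device>(pipeline, devInfo), with everything after construction running unlocked. The cost is that cameras boot one after another, so startup time grows with the camera count.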