Hello,
From the documentation, I learnt that when a device is disconnected, the Watchdog steps in to set up the device for reconnection. The Watchdog takes about 10 seconds.
- The document says Watchdog is for RVC2 devices only. Can you please explain what happens when an RVC4 device is disconnected?
I want to customize the program's exit condition when a device is disconnected or is not available. I'm trying not to wait too long to establish a successful reconnection.
Since the docs left me uncertain, I assumed that the watchdog would work on RVC4 devices too. I tried to customize the timeout using environment variables and also through dai.BoardConfig, but I couldn't make it work. Regardless of the custom timeout settings, the program always takes a very long time to exit if a connection is not established. Here is the simple script I am testing with:
import os
import depthai as dai
import cv2

# Timeout-related environment variables I experimented with.
# os.environ["DEPTHAI_WATCHDOG"] = "100"
# os.environ["DEPTHAI_WATCHDOG_INITIAL_DELAY"] = "100"
os.environ["DEPTHAI_CONNECT_TIMEOUT"] = "2000"
# os.environ["DEPTHAI_BOOTUP_TIMEOUT"] = "2000"


def reconnection_callback(state: dai.Device.ReconnectionStatus):
    # Exit immediately once the single allowed reconnection attempt has failed.
    if state == dai.Device.ReconnectionStatus.RECONNECT_FAILED:
        print("Failed to reconnect. Exiting program")
        os._exit(1)


try:
    with dai.Pipeline() as pipeline:
        # Attempt to shorten the watchdog timings through the board config.
        config = pipeline.getBoardConfig()
        config.watchdogInitialDelayMs = 100
        config.watchdogTimeoutMs = 0
        pipeline.setBoardConfig(config)

        # Allow only one reconnection attempt before giving up.
        device = pipeline.getDefaultDevice()
        device.setMaxReconnectionAttempts(1, reconnection_callback)

        cam = pipeline.create(dai.node.Camera).build()
        videoQueue = cam.requestOutput((640, 400)).createOutputQueue()
        pipeline.start()

        while pipeline.isRunning():
            videoIn = videoQueue.get()
            assert isinstance(videoIn, dai.ImgFrame)
            cv2.imshow("video", videoIn.getCvFrame())
            if cv2.waitKey(1) == ord("q"):
                pipeline.stop()
                os._exit(1)
# RuntimeError must be caught before Exception; otherwise this branch is unreachable.
except RuntimeError as e:
    print(f"Exiting Program due to [RuntimeError]: {e}")
    os._exit(1)
except Exception as e:
    print(f"Exiting program due to [Exception]: {e}")
    os._exit(1)
When I tested this on an OAK4, I got the following output on screen:
[2492589325] [10.10.0.172] [1760548182.944] [host] [warning] Monitor thread (device: 2492589325 [10.10.0.172]) - ping was missed, closing the device connection
[2025-10-15 12:09:44.944] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_0_0' (X_LINK_ERROR)'
[2025-10-15 12:11:59.630] [warning] DeviceGate getState not successful - got no response
[2025-10-15 12:11:59.630] [error] DeviceGate session state is in error state - exiting
[2492589325] [10.10.0.172] [1760548319.630] [host] [warning] Closed connection
[2492589325] [10.10.0.172] [1760548319.630] [host] [warning] Attempting to reconnect. Timeout is 10000ms
[2492589325] [10.10.0.172] [1760548330.557] [host] [warning] Reconnection unsuccessful, trying again. Attempts left: 0
Failed to reconnect. Exiting program
This whole log took more than 2 minutes! I want to exit the program a lot sooner, maybe in 6-8 seconds (8 seconds counting from the time of disconnection). Is that possible on an RVC4 device?
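To show where the time goes, here is a small sketch that pulls the epoch timestamps out of the [host] lines of the OAK4 log above and prints the gaps between them. The regex and variable names are my own and not part of depthai. If I read the timestamps correctly, most of the delay falls between the missed ping and "Closed connection", before the 10000ms reconnection attempt even starts.

```python
import re

# The four [host] lines from the OAK4 log, copied verbatim.
OAK4_HOST_LINES = [
    "[2492589325] [10.10.0.172] [1760548182.944] [host] [warning] Monitor thread (device: 2492589325 [10.10.0.172]) - ping was missed, closing the device connection",
    "[2492589325] [10.10.0.172] [1760548319.630] [host] [warning] Closed connection",
    "[2492589325] [10.10.0.172] [1760548319.630] [host] [warning] Attempting to reconnect. Timeout is 10000ms",
    "[2492589325] [10.10.0.172] [1760548330.557] [host] [warning] Reconnection unsuccessful, trying again. Attempts left: 0",
]

# Assumption: the bracketed field right before "[host]" is a Unix epoch in seconds.
EPOCH_RE = re.compile(r"\[(\d+\.\d+)\] \[host\]")

stamps = [float(EPOCH_RE.search(line).group(1)) for line in OAK4_HOST_LINES]

before_close = stamps[1] - stamps[0]  # missed ping -> "Closed connection"
reconnect = stamps[3] - stamps[2]     # reconnect attempt -> "Reconnection unsuccessful"
total = stamps[3] - stamps[0]         # missed ping -> final failure

print(f"missed ping -> closed connection: {before_close:.3f} s")  # 136.686 s
print(f"reconnection attempt:             {reconnect:.3f} s")     # 10.927 s
print(f"total:                            {total:.3f} s")         # 147.613 s
```

So on the OAK4 the reconnection timeout accounts for only about 11 of the roughly 148 seconds; the remaining ~137 seconds pass before the connection is reported as closed.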
On the other hand, when I tested the same program on an OAK-FFC-4P, I got the following output:
[2025-10-15 11:54:43.162] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_0_0' (X_LINK_ERROR)'
[19443010817AA12E00] [1.2] [1760547284.265] [host] [warning] Closed connection
[19443010817AA12E00] [1.2] [1760547284.265] [host] [warning] Attempting to reconnect. Timeout is 10000ms
[19443010817AA12E00] [1.2] [1760547295.184] [host] [warning] Reconnection unsuccessful, trying again. Attempts left: 0
Failed to reconnect. Exiting program
In this case, the program took around 12 seconds to exit. But for RVC2 devices, I want to exit the program in 3 seconds from the time of disconnection.
In both logs, the reconnection timeout is shown as 10000ms. I believe this is what I need to customize, but the environment variables don't seem to change this value. Am I doing something wrong?
Why I want to do this
A few weeks ago I encountered a device disconnection error on an OAK-FFC-4P. The device reconnected, but the connection was lost again immediately. The program was stuck in this cycle of disconnection and reconnection, with each cycle taking about 10-12 seconds. I had to exit the program forcefully and restart it to make it work. Here is a portion of the output logs from that day, in case it adds anything to my question:
[2025-09-30 11:05:59.066] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_9_out' (X_LINK_ERROR)'
[2025-09-30 11:05:59.066] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_8_out' (X_LINK_ERROR)'
[14442C10A12BAFCF00] [1.2] [1759248360.105] [host] [warning] Closed connection
[14442C10A12BAFCF00] [1.2] [1759248360.105] [host] [warning] Attempting to reconnect. Timeout is 10000ms
[14442C10A12BAFCF00] [1.2] [1759248364.336] [host] [warning] Reconnection successful
[14442C10A12BAFCF00] [1.2] [1759248373.337] [host] [warning] Monitor thread (device: 14442C10A12BAFCF00 [1.2]) - ping was missed, closing the device connection
[2025-09-30 11:06:13.447] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_9_out' (X_LINK_ERROR)'
[14442C10A12BAFCF00] [1.2] [1759248373.448] [host] [warning] Closed connection
[14442C10A12BAFCF00] [1.2] [1759248373.448] [host] [warning] Attempting to reconnect. Timeout is 10000ms
[2025-09-30 11:06:13.449] [depthai] [error] Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: '__x_8_out' (X_LINK_ERROR)'
[14442C10A12BAFCF00] [1.2] [1759248380.286] [host] [warning] Reconnection successful
I've encountered a similar issue with an OAK4 device too.
I want to exit the program sooner so that a Docker container can relaunch it, and the application wouldn't be blocked.
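As a fallback I am considering a host-side deadline that does not depend on any depthai timeout setting: a background thread that terminates the process if no frame has arrived for a given number of seconds. This is only a sketch under my own assumptions. FrameDeadline is a hypothetical helper, not a depthai API, and it assumes the blocking videoQueue.get() call lets other Python threads run while it waits.

```python
import os
import threading
import time


class FrameDeadline:
    """Hypothetical helper: calls on_timeout if feed() is not called for timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._last_feed = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # Reset the clock so the deadline counts from the moment of starting.
        self._last_feed = time.monotonic()
        self._thread.start()

    def feed(self):
        # Call this each time a frame is received.
        self._last_feed = time.monotonic()

    def stop(self):
        # Disarm the deadline, e.g. on a normal shutdown.
        self._stop.set()

    def _run(self):
        poll_s = min(0.1, self._timeout_s / 4)
        # Event.wait returns False on timeout, True once stop() was called.
        while not self._stop.wait(poll_s):
            if time.monotonic() - self._last_feed > self._timeout_s:
                self._on_timeout()
                return


def exit_for_restart():
    # os._exit ends the process immediately, so Docker can relaunch it.
    print("No frame within the deadline. Exiting program")
    os._exit(1)


# Intended use inside my script (not executed here):
#   deadline = FrameDeadline(3.0, exit_for_restart)  # e.g. 3 s for RVC2
#   deadline.start()                                 # right after pipeline.start()
#   ...
#   videoIn = videoQueue.get()
#   deadline.feed()                                  # after every received frame
```

The timer runs entirely on the host, so it should behave the same for RVC2 and RVC4 devices, but it exits without attempting any reconnection at all.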
I appreciate any help or suggestions. Thank you.