• DepthAI
  • Unstable connection / connectivity issues

Hello. I am using depthai-core via C++. Following the "RGB video" code example (https://docs.luxonis.com/projects/api/en/latest/samples/ColorCamera/rgb_video/#rgb-video) I am experiencing erratic connection behavior. I am connected via ethernet. 

First, is there a way to specify the IP address of the camera rather than auto-detection?  I get the sense that the core API is searching for a camera in many places and that this is part of the problem. There is no USB, only ethernet, and no need to hunt for IPs by discovery.

Second, generally speaking, it is shockingly unstable.  Half the time when I try to connect I get device not found errors.  And it seems there are some timing problems.  If I just run the code without breaks, it errors out frequently.  If I debug the code and slowly step through things, allowing time to pass between steps, it's more stable.  Am I missing something here with the depthai-core API?  It feels like it lacks basic exception handling and the ability to retry or wait etc?  It's very frustrating. Out of the box, code examples all crash unless I slowly step through the logic. Which is a bit bizarre.

Thanks!


    Hi jt512 ,

    Yes, it's possible to specify the IP of the OAK PoE camera.

    Please provide a full MRE (minimal reproducible example) with error logs and the versions used (depthai, bootloader); saying "it doesn't work" doesn't help us debug the issue at all. Note that the latest depthai version includes some PoE connectivity fixes. Note also that exception handling/reconnecting is outside the scope of the depthai API library, it could be added either by user, and it might be added to depthai SDK.
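
    Since retrying is left to the caller, one option is a small wrapper around whichever call throws. The following is a minimal sketch, not part of depthai-core: `retryWithDelay` and its parameters are hypothetical names, and it assumes the failure surfaces as a `std::exception` thrown by the connecting call.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <thread>

// Hypothetical helper (not a depthai API): call `connect` until it returns
// without throwing, sleeping `delay` between attempts. After `maxAttempts`
// failures the last exception is rethrown to the caller.
// Exceptions not derived from std::exception propagate immediately.
template <typename Fn>
auto retryWithDelay(int maxAttempts, std::chrono::milliseconds delay, Fn&& connect)
    -> decltype(connect()) {
    assert(maxAttempts > 0);
    for(int attempt = 1;; ++attempt) {
        try {
            return connect();
        } catch(const std::exception& ex) {
            if(attempt >= maxAttempts) throw;  // out of attempts: propagate
            std::cerr << "attempt " << attempt << " failed: " << ex.what() << '\n';
            std::this_thread::sleep_for(delay);
        }
    }
}
```

    With depthai the callable could, for example, return `std::make_unique<dai::Device>(pipeline, deviceInfo)`. Whether repeated attempts actually help with a given PoE setup is an assumption to verify.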

    Also, here's a relevant discussion, and we just released depthai version 2.21.1, which improves stability as well.

    Thoughts?

    Thanks, Erik

    I didn't say "it doesn't work"; you said that. What I described is far more detailed: the demo samples consistently crash with "device not found" style errors if I execute them without breaks. If I run them in debug mode, slowly stepping through the lines of code, things usually work.

    You say, "Note also that exception handling/reconnecting is outside the scope of the depthai API library, it could be added either by user, and it might be added to depthai SDK."

    Well, that doesn't really make sense. You're saying that internal to depthai-core there is zero error handling? So any little packet error (which is commonplace) will cause connections with the device to fault? If true that would validate what I'm saying/experiencing. You're saying the connection protocols (xlink) are inherently unstable and will constantly throw exceptions?

    I am using 2.21.0. Looks like this latest version came out just hours ago. Skeptical that will address what I'm seeing but will upgrade and give it a go.

    On this point, I have a question. If depthai-core has zero error handling, why doesn't example code contain try/catch blocks, demonstrating the proper manner for consumers to implement error handling? I'm not sure what I should be placing try/catch blocks around. From what I'm witnessing, an exception could be thrown virtually anywhere, making things a bit tedious. Is there a document that describes this? What is proper cleanup procedure if a certain type of exception is thrown?

    Is there a way to prevent dai::Device from trying to connect to a device on construction? This is a very strange implementation. Typically with an API such as this involving networking, there should be a Connect() method on the class that governs this type of behavior. How do I define properties like wait time limit? How long does it look for a device before it gives up? How long does it wait for a response in the middle of a stream before it decides the connection has been lost? How do I recycle the object and tell it to ReConnect() if a connection is lost for some reason?
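
    One way to get Connect()-style control without changing the library is to defer construction behind a small holder. This is a sketch under assumptions: `DeferredConnection` and `FakeDevice` are hypothetical names, `FakeDevice` only stands in for a type such as dai::Device that connects in its constructor, and the timeout questions above are not addressed by it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a type that connects in its constructor (illustration only).
struct FakeDevice {
    std::string address;
    FakeDevice(std::string addr, bool fail) : address(std::move(addr)) {
        if(fail) throw std::runtime_error("device not found: " + address);
    }
};

// Hypothetical holder (not a depthai API): the wrapped object is only
// constructed when connect() is called, and destroyed on disconnect().
template <typename Conn>
class DeferredConnection {
   public:
    // Destroy any existing connection, then construct a new one.
    // If the constructor throws, the holder is left disconnected.
    template <typename... Args>
    Conn& connect(Args&&... args) {
        conn_.reset();
        conn_ = std::make_unique<Conn>(std::forward<Args>(args)...);
        return *conn_;
    }

    void disconnect() { conn_.reset(); }
    bool connected() const { return conn_ != nullptr; }

    // Access the live connection; calling this while disconnected is a bug.
    Conn& get() {
        assert(conn_ != nullptr);
        return *conn_;
    }

   private:
    std::unique_ptr<Conn> conn_;
};
```

    Declaring the holder outside the try block keeps the connection reachable from cleanup code after the catch, and calling connect() again acts as a reconnect because the old object is destroyed first.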

    I'm reading through the code.

    https://github.com/luxonis/depthai-core/blob/main/src/device/Device.cpp

    https://github.com/luxonis/depthai-core/blob/main/src/device/DeviceBase.cpp

    https://github.com/luxonis/depthai-core/blob/main/include/depthai/device/Device.hpp

    Adding to the discussion, what I'm seeing is that both dai::Device and dai::DeviceBase constructors always call startPipeline(), which is an expensive blocking operation. I don't mean to be rude, but this is a crime against object orientation.

    Here are some screenshots. The code executing is as follows.

    std::vector<std::uint8_t> PSCore::Test1()
    {
        using namespace std;

        dai::Device *dptr = nullptr;

        try
        {
            // Create pipeline
            dai::Pipeline pipeline;

            // Define source and output
            auto camRgb = pipeline.create<dai::node::ColorCamera>();
            auto xoutRgb = pipeline.create<dai::node::XLinkOut>();
            xoutRgb->setStreamName("rgb");
            xoutRgb->input.setBlocking(false);
            xoutRgb->input.setQueueSize(1);

            // Properties
            camRgb->setBoardSocket(dai::CameraBoardSocket::RGB);
            camRgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
            camRgb->setInterleaved(false);
            camRgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::BGR);
            //camRgb->setPreviewSize(300, 300);
            //camRgb->setVideoSize(4056, 3040);
            camRgb->setVideoSize(1920, 1080);

            // Linking
            //camRgb->preview.link(xoutRgb->input);
            camRgb->video.link(xoutRgb->input);

            // Connect to device and start pipeline
            auto deviceInfo = dai::DeviceInfo("192.168.8.116");
            auto device = dai::Device(pipeline, deviceInfo);
            dptr = &device;

            cout << "Connected cameras: " << device.getConnectedCameraFeatures() << endl;

            // Print USB speed
            // device already closed or disconnected exception
            //cout << "Usb speed: " << device.getUsbSpeed() << endl;

            // Bootloader version
            if(device.getBootloaderVersion()) {
                cout << "Bootloader version: " << device.getBootloaderVersion()->toString() << endl;
            }

            // Device name
            cout << "Device name: " << device.getDeviceName() << endl;

            // Output queue will be used to get the rgb frames from the output defined above
            //auto qRgb = device.getOutputQueue("rgb", 4, false);
            auto qRgb = device.getOutputQueue("rgb");

            while(true)
            {
                auto inRgb = qRgb->get<dai::ImgFrame>();
                auto type = inRgb->getType();
                auto data = inRgb->getData();
                device.close();
                return data;
            }
        }
        catch(const std::exception &ex)
        {
            cout << ex.what();
            int stophere = 7;
        }
        catch(...)
        {
            int stophere = 7;
        }

        //cleanup
        if(dptr != nullptr)
        {
            dptr->close();
            //delete dptr;
        }

        return {};
    }

    These screenshots are successive runs.

    run #1, exception thrown

    run #2, successful

    runs #3, #4, #5: all the same as run #1

    run #6, successful

    run #7

    It now appears that the device itself has crashed. I've run it dozens of times and always get the same result as run #7. How can the device be crashing? The device is cold to the touch, which so far is the best method I've found for checking whether it is online.

    Is this normal behavior, for the firmware on the device to crash? What is the best way to tell if the device is running or not? Is there a status light or something I could turn on?

    After power cycling the PoE/device, I started getting successful runs again. Then this. Looking back at the code, I got past the line auto device = dai::Device(pipeline, deviceInfo); but then an exception was thrown in the midst of device.getConnectedCameraFeatures(). LOL. When I say random errors, this is what I mean. This is extremely simple code; it's quite literally your sample code. Am I having a normal experience here? Is there something physically wrong with this camera?

    Oop, here we go again. This time I made it to the line cout << "Device name: " << device.getDeviceName() << endl; where it threw on device.getDeviceName().

    🫠☠️🫠☠️🫠☠️

    I updated to latest repo, v2.21.2, and don't really see much of a difference. This is it?


      Hi jt512 ,

      Please also try the latest develop version of the depthai library; the relevant GitHub issue is here.

      Thanks, Erik

      The branches main, v2.21.2, and develop are all at the same commit…

      I see, it was just released 2 hours ago. The main changes are in the firmware, see commit here.

        I am also experiencing deadlocks on dai::Device::close() quite frequently. Referring to the code above, after getting a single frame, I call device.close(), break out of the loop and return the single frame. Deadlock on device.close();
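
        To capture evidence of a hang for a bug report, a blocking call can be run under a watchdog. The sketch below is generic C++, not a depthai API: `finishesWithin` is a hypothetical helper that runs a callable on a detached thread and reports whether it returned within a time limit. It only detects the hang; it cannot cancel the call.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: run `fn` on a detached thread and wait up to `timeout`.
// Returns true if `fn` returned in time, false if it is still running.
// If `fn` threw before the deadline, the exception is rethrown here.
inline bool finishesWithin(std::chrono::milliseconds timeout, std::function<void()> fn) {
    assert(fn);
    // The promise is shared so it stays valid even if we stop waiting.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    std::thread([done, fn = std::move(fn)] {
        try {
            fn();
            done->set_value();
        } catch(...) {
            done->set_exception(std::current_exception());
        }
    }).detach();

    if(finished.wait_for(timeout) != std::future_status::ready) {
        return false;  // still blocked inside fn
    }
    finished.get();  // rethrows if fn threw
    return true;
}
```

        If it returns false, the thread is still stuck inside the callable, so anything the callable references (for example the device object) must stay alive. Logging the event and exiting the process is the safest reaction.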

        erik You're saying there is a firmware update paired with this? This is new to me so I need some explanation to understand what this means. Where are the side binaries kept?


          Hi jt512 ,

          It's best to report the deadlock issue on depthai-core with a minimal repro example. As for firmware: for C++ I believe it gets downloaded at build time from our servers, while for Python it's bundled into the wheel. Firmware is closed source (our core IP).

          Thanks, Erik

          I'm gonna send an email.

          FYI, I can confirm the 2.21.2 update does nothing. Same exact sporadic connectivity issues.


            Hi jt512 ,

            Please open an issue on depthai-core with specifications of the sporadic connectivity issues so firmware engineers can repro/debug easily.

            Thanks, Erik