Hey all, I'm looking for some advice. I’m delving into new territory in my programming career and could really use some guidance from experts in the computer vision (CV) and deep learning fields. While I have a solid foundation in Python, and in C++ for embedded devices, I'm a novice in CV and deep learning; I’ve had a look at OpenCV, but nothing in depth yet.

Project Overview:

I'm working on a CV system to be deployed on underwater drones for identifying and precisely localizing specific types of rare underwater flora (e.g., certain species of seashells, seagrass, or coral formations). Conditions are variable, with potentially low visibility and light. The end goal is not just identification but also detailed mapping, and potentially collection of invasive species, which requires exact location coordinates for ecological research and monitoring purposes.

I have several questions regarding this endeavour:

Learning Curve: How challenging is it for someone with no CV background to jump into developing a system for object detection and localization, especially in underwater settings? Are there any resources or tools particularly beneficial for beginners in CV, specifically for Python users?

Real-time Analysis: The drone would be in constant motion, necessitating real-time data processing. What are the main hurdles in integrating real-time image processing and object detection under these conditions? Of course, we can slow the drone down when it detects an area of interest to obtain more data.

Approach: Would a deep learning methodology significantly outperform traditional CV techniques in distinguishing specific underwater seashell species or coral types? How steep is the learning curve for deep learning in this specific application? We would be deploying to survey one species at a time – so we could deploy a model trained to recognise only the specific target species; we don’t need a generally trained model.

Environmental Challenges: The underwater environment might be murky, with other sea life possibly obstructing the view. How do these factors complicate the image processing and object detection tasks? Are there known strategies or standard practices to mitigate these issues?

Computational Constraints: Given the hardware limitations of the drones we use (small ROVs modified to carry a payload, total ROV weight 20 kg – something like https://bluerobotics.com/store/rov/bluerov2/ ), what are the common practices for data processing? Is it practical to process data on the drone in real time, or should it be transmitted to a more powerful external system for analysis? Ideally these drones would operate and detect in real time, autonomously and untethered – hence we have fairly limited power/space budgets (though not tiny – we could probably spare 100 W max and a moderate volume).

Precision in Localization: The project requires pinpointing the exact location of identified species for mapping purposes. What complexities come into play when combining CV with drone telemetry to achieve precise 3D localization underwater?

Development requirements: Given the above, how feasible is this project at this time in your opinion (I know this is a "how long is a piece of string" question!)? Any specific HW/SW recommendations?

Your thoughts are greatly appreciated, thanks!

    Hi ClaireMulvey

    ClaireMulvey How challenging is it for someone with no CV background to jump into developing a system for object detection and localization, especially in underwater settings? Are there any resources or tools particularly beneficial for beginners in CV, specifically for Python users?

    Not too challenging if you know at least a little about image analysis. A few YouTube videos or web tutorials should get you going; most approaches are the same at their core, just tweaked to best suit the use case. We also offer custom training notebooks for NN models: luxonis/depthai-ml-training/tree/master/colab-notebooks.

    ClaireMulvey Approach: Would a deep learning methodology significantly outperform traditional CV techniques in distinguishing specific underwater seashell species or coral types? How steep is the learning curve for deep learning in this specific application?  We would be deploying to survey one species at a time – so we could deploy a model that was trained to only recognise the specific target species, we don’t need a generally trained model.

    Deep learning takes much more time than standard CV techniques, but is more robust in the end. If your target species is a visual outlier (some colour you wouldn't find anywhere else), go with the standard approach. If the species are hard to distinguish, standard techniques will give you very poor results (many false positives).
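
    The "visual outlier" case above can be sketched with simple colour thresholding. The snippet below is a pure-Python illustration of the logic (the range values and tiny test frame are made up); on real frames you would do this with OpenCV, e.g. `cv2.cvtColor` to HSV followed by `cv2.inRange`.

```python
# Minimal sketch of classical colour thresholding for a visually
# distinctive species. Pure Python for illustration only; a real
# pipeline would use OpenCV's cv2.inRange on HSV frames.

def in_hsv_range(pixel, lo, hi):
    """True if an (h, s, v) pixel falls inside the target range."""
    return all(l <= p <= h for p, l, h in zip(pixel, lo, hi))

def threshold_mask(image, lo, hi):
    """Binary mask: 1 where a pixel matches the target colour range."""
    return [[1 if in_hsv_range(px, lo, hi) else 0 for px in row]
            for row in image]

# Hypothetical range for a bright-orange target species.
TARGET_LO = (5, 120, 80)
TARGET_HI = (25, 255, 255)

frame = [
    [(10, 200, 150), (100, 50, 50)],
    [(90, 30, 200), (15, 180, 120)],
]
print(threshold_mask(frame, TARGET_LO, TARGET_HI))  # [[1, 0], [0, 1]]
```

    The weakness Jaka mentions shows up exactly here: underwater colour shifts with depth and turbidity, so a fixed range that works in one dive may fail in the next.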

    ClaireMulvey How do these factors complicate the image processing and object detection tasks? Are there known strategies or standard practices to mitigate these issues?

    Usually, these models will work fine even with obstructions. You could, of course, train the model on data captured in those conditions to improve it. You can also raise the confidence threshold so that if the model is unsure, it won't give you false positives.
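
    The confidence-threshold idea amounts to a one-line filter over the model's outputs. The detection format below is hypothetical, but most frameworks (DepthAI, YOLO, etc.) expose a similar per-detection confidence score.

```python
# Sketch of raising the confidence threshold to suppress uncertain
# detections, and with them most false positives.

def filter_detections(detections, conf_threshold=0.6):
    """Keep only detections the model is reasonably sure about."""
    return [d for d in detections if d["confidence"] >= conf_threshold]

detections = [
    {"label": "target_shell", "confidence": 0.91},
    {"label": "target_shell", "confidence": 0.42},  # likely false positive
    {"label": "target_shell", "confidence": 0.67},
]
print(filter_detections(detections))  # keeps the 0.91 and 0.67 hits
```

    The threshold is a precision/recall trade-off: raising it rejects false positives but will also drop genuine sightings in murky frames, so it's worth tuning on held-out underwater footage.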

    ClaireMulvey Is it practical to process data on the drone in real-time, or should it be transmitted to a more powerful external system for analysis?

    OAK devices are designed to run models in real time. You can also transmit the data if you feel a heavier model needs to be used.

    ClaireMulvey The project requires pinpointing the exact location of identified species for mapping purposes. What complexities come into play when combining CV with drone telemetry to achieve precise 3D localization underwater?

    That is a difficult task underwater, since the water distorts the image. We have a SpatialCalculator node in the API that calculates the positions of detected objects, but you might need some post-processing to undistort the measurements.
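
    Under the hood, this kind of spatial calculation is pinhole back-projection: a pixel plus a depth measurement gives a 3D point in the camera frame. The sketch below uses illustrative intrinsics (fx, fy, cx, cy are made up); the distortion Jaka mentions comes in because a flat-port housing magnifies the image by roughly the refractive index of water (~1.33), so in-air calibration values no longer hold and re-calibration underwater is advisable.

```python
# Back-projecting a pixel + depth measurement to a camera-frame 3D
# point with the pinhole model. Intrinsics here are illustrative;
# in-water values differ from in-air ones because a flat port
# magnifies by roughly water's refractive index (~1.33).

def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Camera-frame (x, y, z) in metres for pixel (u, v) at given depth."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Hypothetical 640x400 camera.
FX = FY = 450.0
CX, CY = 320.0, 200.0

print(pixel_to_3d(500, 200, 2.0, FX, FY, CX, CY))  # (0.8, 0.0, 2.0)
```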

    ClaireMulvey Given the above, how feasible is this project at this time in your opinion (I know this is a "how long is a piece of string" question!)? Any specific HW/SW recommendations?

    It depends on how much time you are willing to invest in model training and tweaking. As for HW, go with a PoE device if you wish to transmit data to an outside machine. You can also use this guide to help you decide: https://shop.luxonis.com/collections/product-guide

    Adding our main AI engineer @Matija to correct me and provide additional information.

    Thanks,
    Jaka

    ClaireMulvey

    Sorry for the late reply on this.

    Learning Curve

    As Jaka mentioned, there are a lot of resources online nowadays, and the repository he linked is a good start. What online courses usually don't teach is how much compute different models require. There's usually a tradeoff between a heavier model, which is more accurate but requires more compute, and a more lightweight model, which is typically better for very specific tasks and can run on embedded devices. In our repository we focus on the latter, so that the models run efficiently on our devices. I would recommend checking out our YoloV6 notebook.

    Real-time Analysis

    This comes down to the amount of post-processing that you want to do. If localization and identification are enough, then this should be totally fine to do with our cameras. As you mentioned, slowing down to gain accuracy is not a bad idea.

    Approach

    Yes, deep learning should outperform traditional CV techniques. As Jaka said, the disadvantage is that it might require more time. But if you were to do something similar with standard CV techniques, adjusting the thresholds and fine-tuning your approach could take quite some time as well. With deep learning this should be easier, but keep in mind that you need a representative dataset when training the models.

    Environmental Challenges

    This comes down to how representative your dataset is. In other words, if you train on perfect images that are unlikely to appear underwater, the model might struggle to detect objects in the field. However, if your dataset consists of challenging underwater images, the model should learn to recognize the objects. Note that the more challenging the environment, the more data and training might be required.
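
    One common way to make a dataset more representative when you have mostly clear footage is to augment it to look murkier: lower the contrast and push colours toward a blue-green cast. The per-pixel blend below is a crude pure-Python illustration with made-up haze values; a real pipeline would use an augmentation library (e.g. albumentations) with blur, noise, and colour-shift transforms over whole images.

```python
# Crude stand-in for the contrast loss and colour cast of murky water:
# blend each RGB pixel toward a greenish "water haze" colour. Haze
# colour and strength here are illustrative, not measured values.

def simulate_turbidity(pixel, haze=(40, 90, 80), strength=0.5):
    """Blend an RGB pixel toward the haze colour by `strength` (0..1)."""
    return tuple(round((1 - strength) * c + strength * h)
                 for c, h in zip(pixel, haze))

clear = (200, 120, 60)  # a bright, in-air pixel
print(simulate_turbidity(clear))  # (120, 105, 70)
```

    Synthetic augmentation helps, but a few hundred genuinely murky frames of the actual target species will usually teach the model more than any simulated haze.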

    Computational Constraints

    Typically our customers prefer to do real-time processing on the drone itself, with data later transmitted to the cloud for further analysis or actions. The benefit is that you don't need to transmit a lot of data, your cloud costs will be lower, and you can use the processed results to make real-time decisions. The compute that you need depends on the task itself. Our RVC2 cameras can run neural networks, depth computation, and some other operations directly on the device. However, they still need an external system to execute the script and do the final post-processing of results. This means you need an external CPU, which can range from something as simple as a Raspberry Pi to a full computer. We have clients using all sorts of CPUs, and the choice depends on the processing they need to execute in real time. Our RVC3 (Rae) and upcoming RVC4 (to be released next year) do have an embedded CPU, so this dependency goes away.
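
    The "process on the drone, transmit only results" pattern can be as simple as packing each detection into a small record instead of streaming frames. Field names below are illustrative, not any fixed API.

```python
import json

# Sketch of compact result transmission: a detection record of a few
# dozen bytes instead of a multi-megabyte raw frame. This matters
# doubly underwater, where untethered links are very low-bandwidth.

def pack_detection(label, confidence, xyz_m, timestamp_s):
    """Build a compact, JSON-serializable detection record."""
    return {
        "label": label,
        "conf": round(confidence, 2),
        "pos_m": [round(c, 2) for c in xyz_m],
        "t": timestamp_s,
    }

record = pack_detection("target_shell", 0.913, (0.82, -0.11, 2.04), 1712.5)
payload = json.dumps(record)
print(len(payload), "bytes")
```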

    Precision in Localization

    This comes down to the environment itself and the requirements. Precise localization in 3D space is more challenging, as you need to track the location of the drone itself as well. Localization with respect to the camera will depend on the depth quality you can achieve underwater. To get the best quality, you might have to re-calibrate the camera yourself, but we provide the scripts and instructions for that in our docs.
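
    Combining a camera-frame detection with drone telemetry is, at its core, a rigid-body transform: rotate the point by the drone's orientation and add the drone's world position. The sketch below simplifies to yaw only; a real system needs full 3D orientation (roll and pitch too) and, harder still, a good underwater position estimate for the drone itself (e.g. from a DVL or acoustic positioning), since GPS doesn't work below the surface.

```python
import math

# Transform a detection from the drone's body frame (x forward, y left)
# into world coordinates, using yaw only for simplicity. A complete
# solution would use a full rotation matrix or quaternion from the IMU.

def camera_to_world(point_body, drone_pos, yaw_rad):
    """World-frame (x, y, z) for a body-frame point at a given pose."""
    x, y, z = point_body
    wx = drone_pos[0] + x * math.cos(yaw_rad) - y * math.sin(yaw_rad)
    wy = drone_pos[1] + x * math.sin(yaw_rad) + y * math.cos(yaw_rad)
    wz = drone_pos[2] + z
    return (wx, wy, wz)

# Drone at (10, 5, -3) m, heading 90 degrees; target 2 m ahead, 1 m below.
print(camera_to_world((2.0, 0.0, 1.0), (10.0, 5.0, -3.0), math.pi / 2))
# approx (10.0, 7.0, -2.0)
```

    Every error in the drone's pose estimate lands directly on the mapped species coordinate, which is why the telemetry side is usually the dominant error source rather than the CV side.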

    Development requirements

    I think it's totally feasible. I would approach it by splitting it into smaller steps and determining what you want to do first. In terms of the application layer, I would recommend YoloV6 for object detection, as it runs efficiently on our devices. You could try a Raspberry Pi for the CPU part if you want to keep the power draw down.

    Feel free to ask any additional questions if something is unclear. We also offer paid priority support where we can help with the development of specific parts if this is something your team would be interested in.