Creating a new machine learning model

benjs

Hi everyone - I'm working on a vehicle that will steer autonomously in an agricultural setting. I want to use an oak-d-lite to determine where a furrow is (the path between the crops, i.e. where a tractor drives) in a field and use that information to drive in the center of the furrow. After looking at some existing ML model databases, I think that I'm going to have to develop my own model to do this.

As I understand it, the steps to do this are

- use an oak-d-lite to take thousands of photos of furrows
- plug these photos into an ML framework like tensorflow to create a model
- compile the model and run it on the oak-d-lite

So my questions are -
1) Does all of the above seem correct?
2) Is there a preferred format for using depth and RGB data in a training set? Should I preprocess the data in some way?
3) Is there a preferred software stack for dealing with depth data or one that is especially easy to use for these kinds of use cases? I was going to use tensorflow just because it was the only one I had heard of before I started this project.

As you can probably tell, I'm brand new to the world of computer vision and AI so any help would be much appreciated. Thanks!

erik

Hi benjs,

1. That's more or less correct. Note that you could also use synthetic dataset generation (we use Unity for that) to develop the initial revision of the model, then continuously improve the model with real-world data later on.
2. Not really, but you might not want to encode it with too much quality loss, as that could affect the accuracy when the model is deployed.
3. There aren't that many (popular) models that use both depth and RGB data. Some customers design their own models (e.g. some version of YOLO with depth input as well), but most just use RGB data and combine the inference results with depth information afterwards. One example would be to run semantic segmentation on the crops, then use that mask together with depth to work out where the crops/furrows are.
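
On the quality-loss point above, here is a rough sketch of how much precision a lossy step can cost on depth data. It assumes depth frames are stored as 16-bit integers in millimetres. The 8-bit conversion and the 10 m range are hypothetical choices for illustration, not something recommended in this thread.

```python
import numpy as np

MAX_RANGE_MM = 10_000  # assumed working range of 10 m, purely illustrative


def quantize_depth_to_8bit(depth_mm, max_range_mm=MAX_RANGE_MM):
    """Squash 16-bit millimetre depth into 8 bits (a lossy step)."""
    clipped = np.clip(depth_mm, 0, max_range_mm).astype(np.float64)
    return np.round(clipped / max_range_mm * 255).astype(np.uint8)


def restore_depth_from_8bit(depth_u8, max_range_mm=MAX_RANGE_MM):
    """Map the 8-bit values back to millimetres."""
    return depth_u8.astype(np.float64) / 255 * max_range_mm


# Synthetic stand-in for a depth frame (assumption: uint16, millimetres).
rng = np.random.default_rng(0)
depth_mm = rng.integers(500, 5000, size=(4, 6), dtype=np.uint16)

restored_mm = restore_depth_from_8bit(quantize_depth_to_8bit(depth_mm))
max_error_mm = float(np.abs(restored_mm - depth_mm).max())

# One 8-bit step covers 10000 / 255 mm, so the worst-case error is half of that.
print(f"worst-case error in this frame: {max_error_mm:.1f} mm")
```

With these assumed numbers, each 8-bit step spans roughly 39 mm, so details smaller than a couple of centimetres can disappear. Keeping depth in a lossless 16-bit format avoids that.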

Thoughts?
Thanks, Erik

benjs

Thanks erik, that all makes sense. I'll try that two-stage approach.
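
For anyone reading later, the second stage of that approach (combining a crop segmentation mask with depth) might look roughly like the sketch below. It is only an illustration under several assumptions: the segmentation model has already produced a boolean crop mask aligned with the depth frame, depth is in millimetres with 0 meaning "no measurement", and only one furrow is visible near the bottom of the image. The function name and parameters are hypothetical.

```python
import numpy as np


def furrow_center_offset(crop_mask, depth_mm, max_depth_mm=4000, rows_from_bottom=3):
    """Horizontal offset of the furrow centre from the image centre, in pixels.

    Negative means the furrow is left of centre. Returns None when no furrow
    pixels are found in the rows that were examined.
    """
    height, width = crop_mask.shape
    # Keep pixels that are not crop, have a valid depth reading, and are close by.
    valid_depth = (depth_mm > 0) & (depth_mm <= max_depth_mm)
    furrow = ~crop_mask.astype(bool) & valid_depth
    # Look only at the rows nearest the vehicle (bottom of the image).
    band = furrow[height - rows_from_bottom:, :]
    cols = np.nonzero(band)[1]
    if cols.size == 0:
        return None
    # Mean column of the furrow pixels, relative to the image centre column.
    return float(cols.mean()) - (width - 1) / 2.0


# Toy frame: crops on the left (columns 0-2) and right (columns 7-8).
crop_mask = np.zeros((6, 9), dtype=bool)
crop_mask[:, :3] = True
crop_mask[:, 7:] = True
depth_mm = np.full((6, 9), 1500, dtype=np.uint16)

offset = furrow_center_offset(crop_mask, depth_mm)
print(offset)  # 0.5: the furrow centre is half a pixel right of the image centre
```

The offset could then feed a steering controller. Averaging columns is the simplest possible choice and would need replacing if several furrows are in view at once.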