• DepthAI-v2
  • [depthai-core c++] face detection, tracking and age/gender recognition

Hi everyone,

I'm trying to merge several Python 3 (depthai-sdk) examples into a single C++ (depthai-core) implementation:

The references I'm using are:

  1. luxonis/depthai-experiments/blob/master/gen2-age-gender/api/main_api.py
  2. luxonis/depthai-tutorials/blob/master/2-face-detection-retail/face-detection-retail-0004.py
  3. luxonis/depthai-experiments/blob/master/gen2-license-plate-recognition/main.py

The problem I have is with the age/gender recognition: the face detection and the tracker are working, but the age/gender recognition doesn't work at all.

Here are the important lines of my code; I don't understand why they are failing!

    auto ageGenderIn = ageGender->get<dai::NNData>(); // <- I understand this needs to be used to get the data

    // auto ageProb = ageGenderIn->getLayerFp16("age_conv3"); // <- this crashes the program
    // auto genderProb = ageGenderIn->getLayerFp16("prob");   // <- this crashes the program
    BOOST_LOG_TRIVIAL(debug) << ageGenderIn << " layers received"; // <- This shows '0x0 layers received' 
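
For reference, a minimal sketch to inspect what the NNData message actually carries (dai::NNData exposes getAllLayerNames()), instead of crashing on a missing layer:

    // Sketch: list the output layers present in the message before asking for
    // a specific one, so a missing layer is logged instead of crashing.
    for (const auto &layerName : ageGenderIn->getAllLayerNames())
    {
        BOOST_LOG_TRIVIAL(debug) << "available layer: " << layerName;
    }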

My code:


// let's check https://github.com/luxonis/depthai-experiments/blob/master/gen2-license-plate-recognition/main.py
// https://github.com/luxonis/depthai-experiments/blob/master/gen2-age-gender/api/main_api.py
int faceAgeGenderDetectorPipeline(std::string faceDetectionModel, std::string ageGenderModel, float confidenceThreshold, bool fullFrameTracking)
{
  dai::Pipeline pipeline;
  pipeline.setOpenVINOVersion(dai::OpenVINO::VERSION_2022_1);

  // Define sources and outputs
  // ColorCamera
  auto camRgb = pipeline.create<dai::node::ColorCamera>();
  camRgb->setPreviewSize(300, 300);
  camRgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
  camRgb->setInterleaved(false);
  camRgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::BGR);
  camRgb->setBoardSocket(dai::CameraBoardSocket::AUTO);
  camRgb->setFps(20);

  // Face NeuralNetwork
  auto faceDetNet = pipeline.create<dai::node::MobileNetDetectionNetwork>();
  faceDetNet->setConfidenceThreshold(confidenceThreshold);
  faceDetNet->setBlobPath(faceDetectionModel);
  faceDetNet->input.setQueueSize(1);
  faceDetNet->input.setBlocking(false);
  faceDetNet->setNumInferenceThreads(2);

  // AgeGender NeuralNetwork
  auto ageGenderNet = pipeline.create<dai::node::MobileNetDetectionNetwork>();
  ageGenderNet->setConfidenceThreshold(confidenceThreshold);
  ageGenderNet->setBlobPath(ageGenderModel);
  ageGenderNet->input.setBlocking(false);
  ageGenderNet->input.setQueueSize(1);
  ageGenderNet->setNumInferenceThreads(2);

  // Object tracker
  auto objectTracker = pipeline.create<dai::node::ObjectTracker>();
  objectTracker->setDetectionLabelsToTrack({1}); // track faces
  objectTracker->setTrackerType(dai::TrackerType::ZERO_TERM_COLOR_HISTOGRAM);
  objectTracker->setTrackerIdAssignmentPolicy(dai::TrackerIdAssignmentPolicy::SMALLEST_ID);

  // Manip to reduce the faceDetNet output image from 300x300 to 62x62 for ageGenderNet
  auto imgManipInputAgeGenderNet = pipeline.create<dai::node::ImageManip>();
  imgManipInputAgeGenderNet->initialConfig.setResize(62, 62);
  imgManipInputAgeGenderNet->initialConfig.setFrameType(dai::ImgFrame::Type::BGR888p);

  // Outputs
  auto camOut = pipeline.create<dai::node::XLinkOut>();
  std::string STREAM_PREVIEW = "preview";
  camOut->setStreamName(STREAM_PREVIEW);

  auto trackerOut = pipeline.create<dai::node::XLinkOut>();
  std::string STREAM_TRACKLETS = "tracklets";
  trackerOut->setStreamName(STREAM_TRACKLETS);

  auto ageGenderOut = pipeline.create<dai::node::XLinkOut>();
  std::string STREAM_AGE_GENDER = "ageGender";
  ageGenderOut->setStreamName(STREAM_AGE_GENDER);

  // Linking
  camRgb->preview.link(faceDetNet->input);

  faceDetNet->passthrough.link(objectTracker->inputDetectionFrame);
  faceDetNet->out.link(objectTracker->inputDetections);
  faceDetNet->passthrough.link(imgManipInputAgeGenderNet->inputImage);

  imgManipInputAgeGenderNet->out.link(ageGenderNet->input);

  ageGenderNet->out.link(ageGenderOut->input);

  objectTracker->passthroughTrackerFrame.link(camOut->input);
  objectTracker->out.link(trackerOut->input);

  if (fullFrameTracking)
  {
    camRgb->video.link(objectTracker->inputTrackerFrame);
  }
  else
  {
    faceDetNet->passthrough.link(objectTracker->inputTrackerFrame);
  }

  // Connect to device and start pipeline
  dai::Device device(pipeline);

  // getting the output queues
  auto preview = device.getOutputQueue(STREAM_PREVIEW, 1, false);
  auto tracklets = device.getOutputQueue(STREAM_TRACKLETS, 1, false);
  auto ageGender = device.getOutputQueue(STREAM_AGE_GENDER, 1, false);

  auto startTime = steady_clock::now();
  int counter = 0;
  float fps = 0;
  auto color = cv::Scalar(0, 255, 0);

  while (true)
  {

    // get from host queue
    auto imgFrame = preview->get<dai::ImgFrame>();
    auto track = tracklets->get<dai::Tracklets>();
    auto ageGenderIn = ageGender->get<dai::NNData>(); // <- I understand this needs to be used to get the data

    // auto ageProb = ageGenderIn->getLayerFp16("age_conv3"); // <- this crashes the program
    // auto genderProb = ageGenderIn->getLayerFp16("prob");   // <- this crashes the program
    BOOST_LOG_TRIVIAL(debug) << ageGenderIn << " layers received"; // <- This shows '0x0 layers received' 

    // calculate fps
    counter++;
    auto currentTime = steady_clock::now();
    auto elapsed = duration_cast<duration<float>>(currentTime - startTime);
    if (elapsed > seconds(1))
    {
      fps = counter / elapsed.count();
      counter = 0;
      startTime = currentTime;
    }

    // get the frame from the host queue
    cv::Mat frame = imgFrame->getCvFrame();
    auto trackletsData = track->tracklets;
    auto recognitions = ageGenderIn;

    for (auto &t : trackletsData)
    {
      auto roi = t.roi.denormalize(frame.cols, frame.rows);
      int x1 = roi.topLeft().x;
      int y1 = roi.topLeft().y;
      int x2 = roi.bottomRight().x;
      int y2 = roi.bottomRight().y;

      uint32_t labelIndex = t.label;
      std::string labelStr = to_string(labelIndex);
      if (labelIndex < labelMap.size())
      {
        labelStr = labelMap[labelIndex];
      }

      std::stringstream idStr;
      idStr << "ID: " << t.id;

      std::stringstream statusStr;
      statusStr << "Status: " << t.status;

      cv::putText(frame, labelStr, cv::Point(x1 + 10, y1 + 20), cv::FONT_HERSHEY_TRIPLEX, 0.7, color);
      cv::putText(frame, idStr.str(), cv::Point(x1 + 10, y1 + 40), cv::FONT_HERSHEY_TRIPLEX, 0.7, color);
      cv::putText(frame, statusStr.str(), cv::Point(x1 + 10, y1 + 60), cv::FONT_HERSHEY_TRIPLEX, 0.7, color);
      cv::rectangle(frame, cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)), color, cv::FONT_HERSHEY_SIMPLEX);
    }

    std::stringstream fpsStr;
    fpsStr << "NN fps:" << std::fixed << std::setprecision(2) << fps;
    cv::putText(frame, fpsStr.str(), cv::Point(2, imgFrame->getHeight() - 4), cv::FONT_HERSHEY_TRIPLEX, 0.7, color);
    cv::imshow("tracker", frame);

    int key = cv::waitKey(1);
    if (key == 'q' || key == 'Q')
    {
      return EXIT_SUCCESS;
    }
  }

  return EXIT_SUCCESS;
}

Can someone give me a clue as to what could be happening? Thanks in advance!

Hi @christiangda
You should be getting errors that something is wrong with the model input.

You need a Script node to handle custom cropping from face detection to ageGender. A standard manip node doesn't know where the face is located; it only takes the resized (center) portion of the frame and passes it onwards. Hence no detections.
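
Roughly something like this untested sketch, based on the script node in the gen2-age-gender example (stream names like 'frame', 'dets' and 'manip_cfg' are placeholders, and the real example also syncs frames and detections by sequence number):

// Sketch: replace the direct faceDetNet->passthrough -> manip link with a
// Script node that emits one crop config (and one frame) per detected face.
auto script = pipeline.create<dai::node::Script>();
script->setScript(R"(
while True:
    frame = node.io['frame'].get()
    dets = node.io['dets'].get()
    for det in dets.detections:
        cfg = ImageManipConfig()
        cfg.setCropRect(det.xmin, det.ymin, det.xmax, det.ymax)
        cfg.setResize(62, 62)
        cfg.setKeepAspectRatio(False)
        node.io['manip_cfg'].send(cfg)
        node.io['manip_img'].send(frame)
)");

faceDetNet->passthrough.link(script->inputs["frame"]);
faceDetNet->out.link(script->inputs["dets"]);

// The manip must now wait for a per-face config instead of its initialConfig
imgManipInputAgeGenderNet->inputConfig.setWaitForMessage(true);
script->outputs["manip_cfg"].link(imgManipInputAgeGenderNet->inputConfig);
script->outputs["manip_img"].link(imgManipInputAgeGenderNet->inputImage);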

Can you please recheck?

Thanks,
Jaka

    jakaskerl Thank you very much for your quick answer.

    You should be getting errors that something is wrong with the model input.

    No, I'm not. Everything runs without errors, but there is no age/gender recognition. From your answer I now understand the problem I have, thank you. So, the reason this doesn't generate an error is that:

    NN(faceDetNet) --> manip(imgManipInputAgeGenderNet) --> NN(ageGenderNet)
                                    ^
                 this generates a valid (62, 62) image of the
                 whole scene, but because of the resolution
                 no faces are recognized.

    So, now I understand what the manip Python script does (luxonis/depthai-experiments/blob/8f193848bfebfc8e142260e700f0dfbd10d17fe5/gen2-age-gender/api/main_api.py#L86), which is: from the scene (image), crop every detected face and pass each crop as input to the age/gender recognition, right?
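
    If that's right, the host side should then read the layers named in main_api.py. Note the example creates the recognition stage as a plain dai::node::NeuralNetwork (not a MobileNetDetectionNetwork), so its queue really carries dai::NNData. A rough, untested sketch of the decoding I'd expect:

        // Hedged sketch: decode one age/gender result per cropped face,
        // using the layer names from main_api.py.
        auto rec = ageGender->tryGet<dai::NNData>(); // non-blocking: no face, no message
        if (rec)
        {
            auto ageVals = rec->getLayerFp16("age_conv3"); // single value = age / 100
            auto genderVals = rec->getLayerFp16("prob");   // softmax [female, male]
            int age = static_cast<int>(ageVals[0] * 100);
            std::string gender = genderVals[1] > genderVals[0] ? "male" : "female";
            BOOST_LOG_TRIVIAL(debug) << "age: " << age << ", gender: " << gender;
        }

    Using tryGet instead of get would also avoid blocking the preview/tracklet loop when no face is in view.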