• Hardware
  • OAK camera with Mac M2 / M1 depth sensing for science exhibit

Hello,

I am making a science exhibit that uses a depth camera to detect what is on a tabletop and run a simulation using that depth information, not dissimilar to the AR Sandbox.

Up until now, the prototype has been running on an old MacBook Pro (2017 Intel model) and a Kinect v2, with the simulation running in Processing 4. However, both these pieces of hardware are old and this exhibit needs to last years, so we would like to upgrade. Neither the Kinect nor an ASUS Xtion we tried works with a MacBook Pro M1.

We are looking to run the exhibit on a new Mac Mini with an M2 chip.

So, I need a depth camera that will work with a Mac M1/M2 chip. The camera should provide a depth image that we can pull into either Processing 4 or a webpage, and then use to influence the simulation.

Does the OAK camera work fine with M2/M1 chips? Can the depth image be accessed directly in Processing 4 or in a webpage through JavaScript, like any normal USB webcam? Does the OAK camera have any web dependencies? This exhibit will likely run without an internet connection.

Many thanks for your technical advice,

David

    Hi dhunterrr,
    Yes, OAKs work well (as expected) on M1/M2 chips - I currently use an M2 Pro MacBook. You can communicate with the OAK via Python or C++, and you could create a simple webserver that then communicates with your website (JS). Note that other software (such as OpenCV, NumPy, etc.) might not work as smoothly and could cause problems - at least it did for me. Thoughts?
    Thanks, Erik
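
    A quick sanity check of the Python route described above: a minimal sketch, assuming the depthai package installs cleanly on an M-series Mac, that just lists any OAK devices visible from the machine.

    import depthai as dai

    # Print every OAK device that DepthAI can currently see from this machine
    for info in dai.Device.getAllAvailableDevices():
        print("Found OAK device:", info.getMxId())

    If nothing is printed, the camera is not being detected, and no pipeline code will change that.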

    Thanks @erik, that is really helpful to confirm it works on those machines.

    I'm not experienced with Python, but if it can easily provide the depth image to another application or webpage I make, then I'll be very happy!

    22 days later

    @erik OK, I have received the OAK-D Lite camera and have it displaying the depth image very nicely from the Python script.

    How do I send that to a local webpage? I'm a bit of a novice at Python, and after searching the forums I'm not finding something I can easily use.

    Thanks,

    David

    I have written a simple server in Python that streams the webcam, but I cannot work out how to integrate it with my Python code that displays the depth image from the command line.

    Here is the Python code that serves the webcam:

    from flask import Flask, render_template, Response
    import cv2

    app = Flask(__name__)

    camera = cv2.VideoCapture(0)  # use 0 for web camera
    # for cctv camera use rtsp://username:password@ip_address:554/user=username_password='password'_channel=channel_number_stream=0.sdp' instead of camera
    # for local webcam use cv2.VideoCapture(0)

    def gen_frames():  # generate frame by frame from camera
        while True:
            # Capture frame-by-frame
            success, frame = camera.read()  # read the camera frame
            if not success:
                break
            else:
                ret, buffer = cv2.imencode('.jpg', frame)
                frame = buffer.tobytes()
                yield (b'--frame\r\n'
                       b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')  # concat frame one by one and show result

    @app.route('/video_feed')
    def video_feed():
        # Video streaming route. Put this in the src attribute of an img tag
        return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

    @app.route('/')
    def index():
        """Video streaming home page."""
        return render_template('index.html')

    if __name__ == '__main__':
        app.run(debug=True)
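
    For reference on the browser side, the /video_feed route above is meant to be embedded in a page through an <img> tag. The index.html template itself is not shown in this thread, so here is a minimal sketch of an index() route that returns the page inline instead, with no templates/ folder needed; the <img> tag simply points at the MJPEG route.

    # Hypothetical replacement for the index() route above: serve the page inline.
    @app.route('/')
    def index():
        return '<html><body><img src="/video_feed" width="640"></body></html>'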

    .

    .

    .

    And here is the Python code to access the depth image:

    #!/usr/bin/env python3

    import cv2
    import depthai as dai
    import numpy as np

    # Closer-in minimum depth, disparity range is doubled (from 95 to 190):
    extended_disparity = False
    # Better accuracy for longer distance, fractional disparity 32-levels:
    subpixel = False
    # Better handling for occlusions:
    lr_check = True

    # Create pipeline
    pipeline = dai.Pipeline()

    # Define sources and outputs
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoRight = pipeline.create(dai.node.MonoCamera)
    depth = pipeline.create(dai.node.StereoDepth)
    xout = pipeline.create(dai.node.XLinkOut)

    xout.setStreamName("disparity")

    # Properties
    monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoLeft.setCamera("left")
    monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoRight.setCamera("right")

    # Create a node that will produce the depth map (using disparity output as it's easier to visualize depth this way)
    depth.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    # Options: MEDIAN_OFF, KERNEL_3x3, KERNEL_5x5, KERNEL_7x7 (default)
    depth.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
    depth.setLeftRightCheck(lr_check)
    depth.setExtendedDisparity(extended_disparity)
    depth.setSubpixel(subpixel)

    config = depth.initialConfig.get()
    config.postProcessing.speckleFilter.enable = False
    config.postProcessing.speckleFilter.speckleRange = 50
    config.postProcessing.temporalFilter.enable = True
    config.postProcessing.spatialFilter.enable = True
    config.postProcessing.spatialFilter.holeFillingRadius = 2
    config.postProcessing.spatialFilter.numIterations = 1
    config.postProcessing.thresholdFilter.minRange = 400
    config.postProcessing.thresholdFilter.maxRange = 15000
    config.postProcessing.decimationFilter.decimationFactor = 1
    depth.initialConfig.set(config)

    # Linking
    monoLeft.out.link(depth.left)
    monoRight.out.link(depth.right)
    depth.disparity.link(xout.input)

    # Connect to device and start pipeline
    with dai.Device(pipeline) as device:
        # Output queue will be used to get the disparity frames from the outputs defined above
        q = device.getOutputQueue(name="disparity", maxSize=4, blocking=False)

        while True:
            inDisparity = q.get()  # blocking call, will wait until a new data has arrived
            frame = inDisparity.getFrame()
            # Normalization for better visualization
            frame = (frame * (255 / depth.initialConfig.getMaxDisparity())).astype(np.uint8)

            cv2.imshow("disparity", frame)

            # Available color maps: https://docs.opencv.org/3.4/d3/d50/group__imgproc__colormap.html
            ###frame = cv2.applyColorMap(frame, cv2.COLORMAP_BONE)
            ###cv2.imshow("disparity_color", frame)

            if cv2.waitKey(1) == ord('q'):
                break
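
    To join the two scripts above, one possible approach (a sketch only, not one of the Luxonis demos, and assuming the Flask route is kept) is to replace the cv2.VideoCapture generator with frames pulled from the DepthAI disparity queue, so the same /video_feed route serves the depth image instead of the webcam.

    from flask import Flask, Response
    import cv2
    import depthai as dai
    import numpy as np

    app = Flask(__name__)

    # Minimal stereo pipeline - same idea as the depth script above, without the extra filters
    pipeline = dai.Pipeline()
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoRight = pipeline.create(dai.node.MonoCamera)
    depth = pipeline.create(dai.node.StereoDepth)
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("disparity")

    monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoLeft.setCamera("left")
    monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoRight.setCamera("right")
    depth.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    depth.setLeftRightCheck(True)

    monoLeft.out.link(depth.left)
    monoRight.out.link(depth.right)
    depth.disparity.link(xout.input)

    device = dai.Device(pipeline)
    q = device.getOutputQueue(name="disparity", maxSize=4, blocking=False)
    maxDisparity = depth.initialConfig.getMaxDisparity()

    def gen_frames():
        # Pull disparity frames from the OAK and encode them as JPEGs for the MJPEG stream
        while True:
            frame = q.get().getFrame()
            frame = (frame * (255 / maxDisparity)).astype(np.uint8)
            ok, buffer = cv2.imencode('.jpg', frame)
            if not ok:
                continue
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + buffer.tobytes() + b'\r\n')

    @app.route('/video_feed')
    def video_feed():
        return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

    if __name__ == '__main__':
        # threaded=False keeps a single consumer pulling from the DepthAI queue
        app.run(debug=False, threaded=False)

    Whatever reads the stream (a webpage or Processing 4) only ever sees ordinary JPEG frames, the same as from a normal webcam server.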

      Thank you @jakaskerl, I have the mjpeg-streaming code working!

      Now I just need to get rid of the RGB colour camera and MobileNet object detection parts and only send the depth image…

      9 days later

      Hello @jakaskerl and @erik, I have been using the mjpeg-streaming code and it is working well. However, it normalises the image it sends, so regardless of the distance from the camera to an object, the nearest object always comes out dark and the furthest light. I need the colours in the image to represent the true depth.

      What do I change so the image is coloured by absolute depth rather than normalised?

      I tried commenting out the lines below in different combinations, but it either gives a pretty much black output image or throws an error.

      depthFrame = cv2.normalize(depthFrame, None, 255, 0, cv2.NORM_INF, cv2.CV_8UC1)
      depthFrame = cv2.equalizeHist(depthFrame)
      depthFrame = cv2.applyColorMap(depthFrame, cv2.COLORMAP_BONE)

      The error I get is:

      error: (-5:Bad argument) cv::ColorMap only supports source images of type CV_8UC1 or CV_8UC3 in function 'operator()'

      The previous script I used for depth (without streaming) does not seem to have this issue, but the code is quite different, so it is difficult for a non-expert Python person to translate across. The pictures below show the original depth code (black to white), which looks fine and does not seem to normalise the colours, and the mjpeg stream script (bone colour scheme, blue to white), which has the issue.

      [Image: original depth_test script - the camera is further away, the human becomes darker; seems correct.]

      [Image: depth mjpeg script - this is closer so should be darker (bone colour scheme), but it isn't; seems incorrect.]

      I am not allowed to upload the Python scripts, so I will post them below…

      Any help is greatly appreciated!

      depth_test.py script:

      #!/usr/bin/env python3

      import cv2
      import depthai as dai
      import numpy as np

      # Closer-in minimum depth, disparity range is doubled (from 95 to 190):
      extended_disparity = False
      # Better accuracy for longer distance, fractional disparity 32-levels:
      subpixel = False
      # Better handling for occlusions:
      lr_check = True

      # Create pipeline
      pipeline = dai.Pipeline()

      # Define sources and outputs
      monoLeft = pipeline.create(dai.node.MonoCamera)
      monoRight = pipeline.create(dai.node.MonoCamera)
      depth = pipeline.create(dai.node.StereoDepth)
      xout = pipeline.create(dai.node.XLinkOut)

      xout.setStreamName("disparity")

      # Properties
      monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
      monoLeft.setCamera("left")
      monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
      monoRight.setCamera("right")

      # Create a node that will produce the depth map (using disparity output as it's easier to visualize depth this way)
      depth.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
      # Options: MEDIAN_OFF, KERNEL_3x3, KERNEL_5x5, KERNEL_7x7 (default)
      depth.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
      depth.setLeftRightCheck(lr_check)
      depth.setExtendedDisparity(extended_disparity)
      depth.setSubpixel(subpixel)

      config = depth.initialConfig.get()
      config.postProcessing.speckleFilter.enable = False
      config.postProcessing.speckleFilter.speckleRange = 50
      config.postProcessing.temporalFilter.enable = True
      config.postProcessing.spatialFilter.enable = True
      config.postProcessing.spatialFilter.holeFillingRadius = 2
      config.postProcessing.spatialFilter.numIterations = 1
      config.postProcessing.thresholdFilter.minRange = 400
      config.postProcessing.thresholdFilter.maxRange = 15000
      config.postProcessing.decimationFilter.decimationFactor = 1
      depth.initialConfig.set(config)

      # Linking
      monoLeft.out.link(depth.left)
      monoRight.out.link(depth.right)
      depth.disparity.link(xout.input)

      # Connect to device and start pipeline
      with dai.Device(pipeline) as device:
          # Output queue will be used to get the disparity frames from the outputs defined above
          q = device.getOutputQueue(name="disparity", maxSize=4, blocking=False)

          while True:
              inDisparity = q.get()  # blocking call, will wait until a new data has arrived
              frame = inDisparity.getFrame()
              # Normalization for better visualization
              frame = (frame * (255 / depth.initialConfig.getMaxDisparity())).astype(np.uint8)

              cv2.imshow("disparity", frame)

              # Available color maps: https://docs.opencv.org/3.4/d3/d50/group__imgproc__colormap.html
              ###frame = cv2.applyColorMap(frame, cv2.COLORMAP_BONE)
              ###cv2.imshow("disparity_color", frame)

              if cv2.waitKey(1) == ord('q'):
                  break

      modified mjpeg streaming script:

      import socketserver
      import threading
      import time
      from http.server import BaseHTTPRequestHandler, HTTPServer
      from io import BytesIO
      from socketserver import ThreadingMixIn
      from time import sleep

      import depthai as dai
      import numpy as np
      import cv2
      from PIL import Image
      import blobconverter

      HTTP_SERVER_PORT = 8090

      class TCPServerRequest(socketserver.BaseRequestHandler):
          def handle(self):
              # Handle is called each time a client is connected
              # When OpenDataCam connects, do not return - instead keep the connection open and keep streaming data
              # First send HTTP header
              header = 'HTTP/1.0 200 OK\r\nServer: Mozarella/2.2\r\nAccept-Range: bytes\r\nConnection: close\r\nMax-Age: 0\r\nExpires: 0\r\nCache-Control: no-cache, private\r\nPragma: no-cache\r\nContent-Type: application/json\r\n\r\n'
              self.request.send(header.encode())
              while True:
                  sleep(0.1)
                  if hasattr(self.server, 'datatosend'):
                      self.request.send(self.server.datatosend.encode() + "\r\n".encode())

      # HTTPServer MJPEG
      class VideoStreamHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              self.send_response(200)
              self.send_header('Content-type', 'multipart/x-mixed-replace; boundary=--jpgboundary')
              self.send_header('Access-Control-Allow-Origin', '*')
              self.end_headers()
              while True:
                  sleep(0.1)
                  if hasattr(self.server, 'frametosend'):
                      ok, encoded = cv2.imencode('.jpg', self.server.frametosend)
                      self.wfile.write("--jpgboundary".encode())
                      self.send_header('Content-type', 'image/jpeg')
                      self.send_header('Content-length', str(len(encoded)))
                      self.end_headers()
                      self.wfile.write(encoded)
                      self.end_headers()

      class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
          """Handle requests in a separate thread."""
          pass

      # start TCP data server
      server_TCP = socketserver.TCPServer(('localhost', 8070), TCPServerRequest)
      th = threading.Thread(target=server_TCP.serve_forever)
      th.daemon = True
      th.start()

      # start MJPEG HTTP Server
      server_HTTP = ThreadedHTTPServer(('localhost', HTTP_SERVER_PORT), VideoStreamHandler)
      th2 = threading.Thread(target=server_HTTP.serve_forever)
      th2.daemon = True
      th2.start()

      # MobilenetSSD label texts
      labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
                  "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

      syncNN = True

      def create_pipeline(depth):
          # Start defining a pipeline
          pipeline = dai.Pipeline()

          # Define a source - color camera
          colorCam = pipeline.create(dai.node.ColorCamera)

          if depth:
              mobilenet = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
              monoLeft = pipeline.create(dai.node.MonoCamera)
              monoRight = pipeline.create(dai.node.MonoCamera)
              stereo = pipeline.create(dai.node.StereoDepth)
          else:
              mobilenet = pipeline.create(dai.node.MobileNetDetectionNetwork)

          colorCam.setPreviewSize(300, 300)
          colorCam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
          colorCam.setInterleaved(False)
          colorCam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

          mobilenet.setBlobPath(blobconverter.from_zoo("mobilenet-ssd", shaves=6))
          mobilenet.setConfidenceThreshold(0.5)
          mobilenet.input.setBlocking(False)

          if depth:
              monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
              monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
              monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
              monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

              # Setting node configs
              stereo.initialConfig.setConfidenceThreshold(255)
              stereo.depth.link(mobilenet.inputDepth)
              stereo.setDepthAlign(dai.CameraBoardSocket.RGB)

              mobilenet.setBoundingBoxScaleFactor(0.5)
              mobilenet.setDepthLowerThreshold(100)
              mobilenet.setDepthUpperThreshold(5000)

              monoLeft.out.link(stereo.left)
              monoRight.out.link(stereo.right)

              xoutDepth = pipeline.create(dai.node.XLinkOut)
              xoutDepth.setStreamName("depth")
              mobilenet.passthroughDepth.link(xoutDepth.input)

          xoutRgb = pipeline.create(dai.node.XLinkOut)
          xoutRgb.setStreamName("rgb")
          colorCam.preview.link(mobilenet.input)
          if syncNN:
              mobilenet.passthrough.link(xoutRgb.input)
          else:
              colorCam.preview.link(xoutRgb.input)

          xoutNN = pipeline.create(dai.node.XLinkOut)
          xoutNN.setStreamName("detections")
          mobilenet.out.link(xoutNN.input)

          return pipeline

      # Pipeline is defined, now we can connect to the device
      with dai.Device() as device:
          cams = device.getConnectedCameras()
          depth_enabled = dai.CameraBoardSocket.LEFT in cams and dai.CameraBoardSocket.RIGHT in cams

          # Start pipeline
          device.startPipeline(create_pipeline(depth_enabled))
          print(f"DepthAI is up & running. Navigate to 'localhost:{str(HTTP_SERVER_PORT)}' with Chrome to see the mjpeg stream")

          # Output queues will be used to get the rgb frames and nn data from the outputs defined above
          previewQueue = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
          detectionNNQueue = device.getOutputQueue(name="detections", maxSize=4, blocking=False)
          if depth_enabled:
              depthQueue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)

          frame = None
          depthFrame = None
          detections = []

          startTime = time.monotonic()
          counter = 0
          fps = 0
          color = (255, 255, 255)

          while True:
              inPreview = previewQueue.get()
              frame = inPreview.getCvFrame()
              inNN = detectionNNQueue.get()
              detections = inNN.detections

              counter += 1
              current_time = time.monotonic()
              if (current_time - startTime) > 1:
                  fps = counter / (current_time - startTime)
                  counter = 0
                  startTime = current_time

              if depth_enabled:
                  depthFrame = depthQueue.get().getFrame()
                  depthFrame = cv2.normalize(depthFrame, None, 255, 0, cv2.NORM_INF, cv2.CV_8UC1)
                  depthFrame = cv2.equalizeHist(depthFrame)
                  depthFrame = cv2.applyColorMap(depthFrame, cv2.COLORMAP_BONE)

              # If the frame is available, draw bounding boxes on it and show the frame
              ###height = frame.shape[0]
              ###width = frame.shape[1]
              ###for detection in detections:
                  # Denormalize bounding box
                  ###x1 = int(detection.xmin * width)
                  ###x2 = int(detection.xmax * width)
                  ###y1 = int(detection.ymin * height)
                  ###y2 = int(detection.ymax * height)
                  ###try:
                      ###label = labelMap[detection.label]
                  ###except:
                      ###label = detection.label
                  ###cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  ###cv2.putText(frame, "{:.2f}".format(detection.confidence*100), (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  ###if depth_enabled:
                      ###cv2.putText(frame, f"X: {int(detection.spatialCoordinates.x)} mm", (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                      ###cv2.putText(frame, f"Y: {int(detection.spatialCoordinates.y)} mm", (x1 + 10, y1 + 65), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                      ###cv2.putText(frame, f"Z: {int(detection.spatialCoordinates.z)} mm", (x1 + 10, y1 + 80), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
                  ###cv2.rectangle(frame, (x1, y1), (x2, y2), color, cv2.FONT_HERSHEY_SIMPLEX)
                  ###server_TCP.datatosend = str(label) + "," + f"{int(detection.confidence * 100)}%"
                  ###if depthFrame is not None:
                      ###roi = detection.boundingBoxMapping.roi
                      ###roi = roi.denormalize(depthFrame.shape[1], depthFrame.shape[0])
                      ###topLeft = roi.topLeft()
                      ###bottomRight = roi.bottomRight()
                      ###xmin = int(topLeft.x)
                      ###ymin = int(topLeft.y)
                      ###xmax = int(bottomRight.x)
                      ###ymax = int(bottomRight.y)
                      ###cv2.rectangle(depthFrame, (xmin, ymin), (xmax, ymax), color, cv2.FONT_HERSHEY_SCRIPT_SIMPLEX)

              ###cv2.putText(frame, "NN fps: {:.2f}".format(fps), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color)

              if depth_enabled:
                  ###new_width = int(depthFrame.shape[1] * (frame.shape[0] / depthFrame.shape[0]))
                  ###stacked = np.hstack([frame, cv2.resize(depthFrame, (new_width, frame.shape[0]))])
                  ###cv2.imshow("stacked", stacked)
                  cv2.imshow("d frame", depthFrame)
                  ###server_HTTP.frametosend = stacked
                  server_HTTP.frametosend = depthFrame
              else:
                  cv2.imshow("frame", frame)
                  server_HTTP.frametosend = frame

              if cv2.waitKey(1) == ord('q'):
                  break

        Hi dhunterrr,
        Please provide minimal repro scripts - this is not minimal.

        These are Luxonis' own demo scripts. The only difference is that I've commented out the RGB camera parts in the mjpeg streaming script.

        I have highlighted the three lines in the mjpeg streaming script that I believe are causing the issue, and post them below:

        depthFrame = cv2.normalize(depthFrame, None, 255, 0, cv2.NORM_INF, cv2.CV_8UC1)
        depthFrame = cv2.equalizeHist(depthFrame)
        depthFrame = cv2.applyColorMap(depthFrame, cv2.COLORMAP_BONE)

        I will try to post minimal versions to a repo…

        I have swapped the above three lines for the two lines below and it appears to be working quite nicely:

        depthFrame = np.interp(depthFrame, (400, 800), (0, 255)).astype(np.uint8)
        depthFrame = cv2.applyColorMap(depthFrame, cv2.COLORMAP_BONE)
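
        A possible follow-up on that fix, sketched here with the helper name and the 400-800 mm window taken as placeholders from the lines above: wrapping the mapping in one function keeps the table's working range in a single place, and unlike cv2.normalize/cv2.equalizeHist, the fixed window means a given output value always corresponds to the same physical distance, which is what the simulation needs.

        import cv2
        import numpy as np

        def depth_to_colormap(depthFrame, min_mm=400, max_mm=800, colormap=cv2.COLORMAP_BONE):
            """Map a uint16 depth frame (millimetres) to an 8-bit colour image using a
            fixed depth window, so colours track absolute distance instead of being
            re-normalised every frame. min_mm/max_mm are placeholders from the post."""
            frame8 = np.interp(depthFrame, (min_mm, max_mm), (0, 255)).astype(np.uint8)
            return cv2.applyColorMap(frame8, colormap)

        # e.g. in the streaming loop:
        # server_HTTP.frametosend = depth_to_colormap(depthQueue.get().getFrame())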