Modified MJPEG streaming script: the original overlay/annotation code is left in place but commented out with ###, and when a stereo camera pair is detected the colorized depth frame is streamed instead of the annotated RGB preview.
import socketserver
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn
from time import sleep
import depthai as dai
import numpy as np
import cv2
import blobconverter
HTTP_SERVER_PORT = 8090
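# The MJPEG stream is served over HTTP on HTTP_SERVER_PORT; detection metadata
# is pushed as plain text lines on a separate raw TCP socket (port 8070 below).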
class TCPServerRequest(socketserver.BaseRequestHandler):
    def handle(self):
        # Handle is called each time a client is connected
        # When OpenDataCam connects, do not return - instead keep the connection open and keep streaming data
        # First send HTTP header
        header = 'HTTP/1.0 200 OK\r\nServer: Mozarella/2.2\r\nAccept-Range: bytes\r\nConnection: close\r\nMax-Age: 0\r\nExpires: 0\r\nCache-Control: no-cache, private\r\nPragma: no-cache\r\nContent-Type: application/json\r\n\r\n'
        self.request.send(header.encode())
        while True:
            sleep(0.1)
            if hasattr(self.server, 'datatosend'):
                self.request.send(self.server.datatosend.encode() + b"\r\n")
# HTTPServer MJPEG
class VideoStreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'multipart/x-mixed-replace; boundary=--jpgboundary')
        self.send_header('Access-Control-Allow-Origin', '*')
        self.end_headers()
        while True:
            sleep(0.1)
            if hasattr(self.server, 'frametosend'):
                ok, encoded = cv2.imencode('.jpg', self.server.frametosend)
                if not ok:
                    continue
                self.wfile.write("--jpgboundary\r\n".encode())
                self.send_header('Content-type', 'image/jpeg')
                self.send_header('Content-length', str(len(encoded)))
                self.end_headers()
                self.wfile.write(encoded.tobytes())
                self.end_headers()
class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle requests in a separate thread."""
    pass
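# ThreadingMixIn hands each connected client its own handler thread, so several
# viewers can watch the MJPEG stream concurrently without blocking one another.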
# start TCP data server
server_TCP = socketserver.TCPServer(('localhost', 8070), TCPServerRequest)
th = threading.Thread(target=server_TCP.serve_forever)
th.daemon = True
th.start()
# start MJPEG HTTP Server
server_HTTP = ThreadedHTTPServer(('localhost', HTTP_SERVER_PORT), VideoStreamHandler)
th2 = threading.Thread(target=server_HTTP.serve_forever)
th2.daemon = True
th2.start()
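# Both servers run as daemon threads, so they shut down together with the main script.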
# MobilenetSSD label texts
labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
            "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
syncNN = True
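# With syncNN enabled, RGB frames are taken from the detection network's
# passthrough output, so each frame is exactly the one the detections refer to.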
def create_pipeline(depth):
    # Start defining a pipeline
    pipeline = dai.Pipeline()

    # Define a source - color camera
    colorCam = pipeline.create(dai.node.ColorCamera)
    if depth:
        mobilenet = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
        monoLeft = pipeline.create(dai.node.MonoCamera)
        monoRight = pipeline.create(dai.node.MonoCamera)
        stereo = pipeline.create(dai.node.StereoDepth)
    else:
        mobilenet = pipeline.create(dai.node.MobileNetDetectionNetwork)

    colorCam.setPreviewSize(300, 300)
    colorCam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    colorCam.setInterleaved(False)
    colorCam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

    mobilenet.setBlobPath(blobconverter.from_zoo("mobilenet-ssd", shaves=6))
    mobilenet.setConfidenceThreshold(0.5)
    mobilenet.input.setBlocking(False)
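    # Note: blobconverter.from_zoo() downloads the compiled model blob and caches
    # it locally, so the first run needs an internet connection.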
    if depth:
        monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
        monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

        # Setting node configs
        stereo.initialConfig.setConfidenceThreshold(255)
        stereo.depth.link(mobilenet.inputDepth)
        stereo.setDepthAlign(dai.CameraBoardSocket.RGB)

        mobilenet.setBoundingBoxScaleFactor(0.5)
        mobilenet.setDepthLowerThreshold(100)
        mobilenet.setDepthUpperThreshold(5000)

        monoLeft.out.link(stereo.left)
        monoRight.out.link(stereo.right)

        xoutDepth = pipeline.create(dai.node.XLinkOut)
        xoutDepth.setStreamName("depth")
        mobilenet.passthroughDepth.link(xoutDepth.input)

    xoutRgb = pipeline.create(dai.node.XLinkOut)
    xoutRgb.setStreamName("rgb")
    colorCam.preview.link(mobilenet.input)
    if syncNN:
        mobilenet.passthrough.link(xoutRgb.input)
    else:
        colorCam.preview.link(xoutRgb.input)

    xoutNN = pipeline.create(dai.node.XLinkOut)
    xoutNN.setStreamName("detections")
    mobilenet.out.link(xoutNN.input)

    return pipeline
# Pipeline is defined, now we can connect to the device
with dai.Device() as device:
    cams = device.getConnectedCameras()
    depth_enabled = dai.CameraBoardSocket.LEFT in cams and dai.CameraBoardSocket.RIGHT in cams
    # Start pipeline
    device.startPipeline(create_pipeline(depth_enabled))
    print(f"DepthAI is up & running. Navigate to 'localhost:{HTTP_SERVER_PORT}' with Chrome to see the MJPEG stream")

    # Output queues will be used to get the rgb frames and nn data from the outputs defined above
    previewQueue = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    detectionNNQueue = device.getOutputQueue(name="detections", maxSize=4, blocking=False)
    if depth_enabled:
        depthQueue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
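    # maxSize=4 with blocking=False means the oldest packets are dropped when the
    # host falls behind, which keeps the stream close to real time.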
    frame = None
    depthFrame = None
    detections = []
    startTime = time.monotonic()
    counter = 0
    fps = 0
    color = (255, 255, 255)
    while True:
        inPreview = previewQueue.get()
        frame = inPreview.getCvFrame()
        inNN = detectionNNQueue.get()
        detections = inNN.detections

        counter += 1
        current_time = time.monotonic()
        if (current_time - startTime) > 1:
            fps = counter / (current_time - startTime)
            counter = 0
            startTime = current_time

        if depth_enabled:
            depthFrame = depthQueue.get().getFrame()
            depthFrame = cv2.normalize(depthFrame, None, 255, 0, cv2.NORM_INF, cv2.CV_8UC1)
            depthFrame = cv2.equalizeHist(depthFrame)
            depthFrame = cv2.applyColorMap(depthFrame, cv2.COLORMAP_BONE)
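        # The ### lines below are the original annotation/overlay code, left
        # disabled in this modified version.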
        # If the frame is available, draw bounding boxes on it and show the frame
        ### height = frame.shape[0]
        ### width = frame.shape[1]
        ### for detection in detections:
        ###     # Denormalize bounding box
        ###     x1 = int(detection.xmin * width)
        ###     x2 = int(detection.xmax * width)
        ###     y1 = int(detection.ymin * height)
        ###     y2 = int(detection.ymax * height)
        ###     try:
        ###         label = labelMap[detection.label]
        ###     except IndexError:
        ###         label = detection.label
        ###     cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
        ###     cv2.putText(frame, "{:.2f}".format(detection.confidence * 100), (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
        ###     if depth_enabled:
        ###         cv2.putText(frame, f"X: {int(detection.spatialCoordinates.x)} mm", (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
        ###         cv2.putText(frame, f"Y: {int(detection.spatialCoordinates.y)} mm", (x1 + 10, y1 + 65), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
        ###         cv2.putText(frame, f"Z: {int(detection.spatialCoordinates.z)} mm", (x1 + 10, y1 + 80), cv2.FONT_HERSHEY_TRIPLEX, 0.5, color)
        ###     cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        ###     server_TCP.datatosend = str(label) + "," + f"{int(detection.confidence * 100)}%"
        ###     if depthFrame is not None:
        ###         roi = detection.boundingBoxMapping.roi
        ###         roi = roi.denormalize(depthFrame.shape[1], depthFrame.shape[0])
        ###         topLeft = roi.topLeft()
        ###         bottomRight = roi.bottomRight()
        ###         xmin = int(topLeft.x)
        ###         ymin = int(topLeft.y)
        ###         xmax = int(bottomRight.x)
        ###         ymax = int(bottomRight.y)
        ###         cv2.rectangle(depthFrame, (xmin, ymin), (xmax, ymax), color, 2)
        ### cv2.putText(frame, "NN fps: {:.2f}".format(fps), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color)
        if depth_enabled:
            ### new_width = int(depthFrame.shape[1] * (frame.shape[0] / depthFrame.shape[0]))
            ### stacked = np.hstack([frame, cv2.resize(depthFrame, (new_width, frame.shape[0]))])
            ### cv2.imshow("stacked", stacked)
            ### server_HTTP.frametosend = stacked
            cv2.imshow("depth frame", depthFrame)
            server_HTTP.frametosend = depthFrame
        else:
            cv2.imshow("frame", frame)
            server_HTTP.frametosend = frame

        if cv2.waitKey(1) == ord('q'):
            break
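# To sanity-check the MJPEG stream from another process, a minimal client
# sketch like the following should work (assumes the script above is running
# on the same machine and that your OpenCV build can open HTTP streams):
#
#     import cv2
#     cap = cv2.VideoCapture("http://localhost:8090")
#     while cap.isOpened():
#         ok, frame = cap.read()
#         if not ok:
#             break
#         cv2.imshow("client", frame)
#         if cv2.waitKey(1) == ord('q'):
#             break
#     cap.release()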