I want to train a custom OCR model. Can I get help with this? I prefer to use my local system for training.

Hi @dhunjoshi, there are two parts to our OCR system:

1. Text detection. We used EAST, as it allows text to be oriented at odd angles, etc. I am pinging our engineer on this to see whether he did any retraining. In any case, [here](https://github.com/argman/EAST) is one example of retraining it in TensorFlow, which should then be compatible with our platform, as OpenVINO supports TensorFlow.
2. OCR. This takes each region found by the text detection and runs the actual OCR on it. For this network, we use the OCR model from Intel: [https://docs.openvinotoolkit.org/2019_R1/_text_recognition_0012_description_text_recognition_0012.html](https://docs.openvinotoolkit.org/2019_R1/_text_recognition_0012_description_text_recognition_0012.html)

I do not currently know whether Intel provides a reference for retraining that model. For some of their networks they do, but looking quickly, I'm not immediately seeing it for this one. According to its notes, it uses a "VGG16-like backbone and bidirectional LSTM encoder-decoder". So any similar network could likely be used instead, as long as it is built on a similar backbone (or one of [these](https://docs.luxonis.com/en/latest/pages/faq/#what-network-backbones-are-supported-on-depthai)), or uses neural operations supported by OpenVINO for the VPU (the OAK-D is the VPU in this context); see [here](https://docs.luxonis.com/en/latest/pages/faq/#if-i-train-my-own-network-which-neural-operations-are-supported-by-depthai).
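The two-stage flow described above (detect text regions, then run recognition on each cropped region) can be sketched on the host side as below. This is a minimal illustration under stated assumptions, not the DepthAI implementation: `detect` and `recognize` are hypothetical stand-ins for the EAST and text-recognition networks, and the sketch uses axis-aligned boxes on a grayscale image stored as nested lists, whereas EAST actually produces rotated boxes.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical axis-aligned box (x0, y0, x1, y1); EAST really outputs rotated boxes.
Box = Tuple[int, int, int, int]
# Grayscale image stored as a list of pixel rows.
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Cut the region inside `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def two_stage_ocr(image: Image,
                  detect: Callable[[Image], Sequence[Box]],
                  recognize: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Stage 1 finds text regions; stage 2 reads each region separately."""
    results = []
    for box in detect(image):
        # One recognition call per detected region, so the total cost
        # grows with the number of detections.
        results.append((box, recognize(crop(image, box))))
    return results


if __name__ == "__main__":
    frame = [[0] * 6 for _ in range(4)]
    fake_detect = lambda img: [(0, 0, 2, 2), (2, 1, 6, 3)]
    fake_recognize = lambda region: f"{len(region[0])}x{len(region)}"
    print(two_stage_ocr(frame, fake_detect, fake_recognize))
```

The per-box loop is the design point worth noticing: detection runs once per frame, recognition runs once per detected region.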

[Attached image](https://i.imgur.com/Lx90Rdb.jpg)

Will OCR work on the attached image, or do I have to retrain with new images?

I tried [EAST](https://github.com/argman/EAST), which @Brandon suggested, and the result is shown here: [result](https://i.imgur.com/e1u9Blp.png)

By the way @Brandon, are there any OCR examples?

Thanks @demoacct01. To your question: we do have an in-progress example, but it was crashing on some of the warp/reshape steps when taking the EAST bounding boxes and feeding them through WARP/deWARP on the Myriad X. We just fixed the crash (I got the message that it was fixed just now, actually), so the last step is the OCR output decoding. The whole pipeline is now actually running.

Here's the GitHub issue: https://github.com/luxonis/depthai/issues/124
And here's the WIP PR for the whole demo flow: https://github.com/luxonis/depthai-experiments/pull/26

We will likely update the PR shortly with the whole flow working, except for the host-side decoding of the OCR output. We're discussing the timing on that internally. Thoughts?

Thanks again,
Brandon

Hi @Brandon, I checked out the gen2_ocr branch and gave it a test run. It runs slowly... :-( I tried moving the paper (with text on it) that I was holding in front of the camera, but it seemed to never get past the first few frames. The output window just froze, with bounding boxes on some of the text on the paper I was holding.

When I ran `pip install -r requirements.txt`, I got this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. depthai-gui 1.0.7 requires depthai==0.0.2.1+d436ec6b629c09b92c58d869e80aac52367a3aa9, but you have depthai 0.0.2.1+9430403bc960388da512c6c8936c27f8d1fa8b2d which is incompatible.

Could this cause the "freeze"?

Hi @demoacct01, sorry about the trouble. The OCR is likely the slow part; the detection itself is a lot faster: https://youtu.be/sEWFQP9kdTM

This example is still a work in progress, and Luxonis-Alex is finishing it up now. We actually lost the engineer who was working on this (another one to Amazon!), so it got delayed as a result.

This PR is what will allow the flow to run on DepthAI/OAK directly: https://github.com/luxonis/depthai-shared/pull/16 It's what took a lot of the time and debugging here, so now that it is out of the way, hopefully we will have the whole flow working well soon.

Thanks, and sorry about the delay.
-Brandon

No problem @Brandon... I am working on an OCR project at the moment, so I was eager to see a working sample from your end. I hope I didn't misunderstand your response: are you saying that if I run `git checkout gen2_ocr` and then `main.py`, I should see a speedup? I tested it and it was slightly faster, but not as fast as what was shown in the YouTube video. Then I realized the YouTube video is from last year, so I am a bit confused about whether the fix is complete or not.

Hi @demoacct01, the demo in the YouTube video runs only the text detection (bounding boxes) network, whereas the Gen2 one (https://github.com/luxonis/depthai-experiments/pull/26) runs both text detection and text recognition, passing each cropped bounding box to the recognition model. So, depending on the number of detections, it may be slower.

I'm now working on this Gen2 example: moving the rotated cropping from the host (OpenCV warp) to the device (using the rotated cropping/rescaling recently added to the ImageManip node). I will add a decoder for the recognition model outputs as well: https://docs.openvinotoolkit.org/2020.1/_models_intel_text_recognition_0012_description_text_recognition_0012.html
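The decoder mentioned above turns the recognition network's per-time-step class scores into a string. A common approach for CTC-trained models like this one is greedy decoding: take the best class at each time step, collapse repeated classes, and drop the blank symbol. The sketch below assumes an alphabet of `0-9a-z` with `#` as the blank in the last position, and an output reshaped to 30 time steps by 37 classes; check both against the model documentation linked above before relying on them. The function name is hypothetical.

```python
from typing import List, Sequence

# Assumed alphabet for text-recognition-0012: digits, lowercase letters and
# '#' as the CTC blank symbol in the last position (37 classes in total).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz#"
BLANK = "#"


def ctc_greedy_decode(scores: Sequence[Sequence[float]],
                      alphabet: str = ALPHABET,
                      blank: str = BLANK) -> str:
    """Greedy CTC decoding for ONE text crop.

    `scores` has one row per time step and one column per class, e.g. the
    network output reshaped from [30, 1, 37] to [30, 37].
    """
    chars: List[str] = []
    prev_idx = -1
    for step in scores:
        # Best class at this time step
        idx = max(range(len(step)), key=step.__getitem__)
        # Collapse repeated classes and drop blanks
        if idx != prev_idx and alphabet[idx] != blank:
            chars.append(alphabet[idx])
        prev_idx = idx
    return "".join(chars)
```

The blank matters for doubled letters: "a", "#", "a" decodes to "aa", while "a", "a" with no blank in between decodes to a single "a".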

Hello! I tried to use the gen2-ocr project from GitHub (https://github.com/luxonis/depthai-experiments/tree/master/gen2-ocr) and got several errors. Using the depthai version from the requirements caused the warning "DeprecationWarning: setCamId() is deprecated, use setBoardSocket() instead." Switching between depthai versions produced other errors. Some of them:

- Creating depthai.Pipeline(): "has no Constructor"
- nn doesn't have the function nn.passthrough (line 25)
- silent crashes while compiling

I use an OAK-D with Windows 10. What am I doing wrong?

Hi @asidsunrise, sorry about the delay (we've been inundated with Brexit/pandemic shipping and logistics problems in Europe). This is quite odd. Just to confirm, when trying out that example, you ran:

`python -m pip install -r requirements.txt`

and then:

`python3 main.py`

and it is still giving those errors? If so, is the installation of the requirements throwing errors? From what you shared, it seems that the API being used is not compatible with the example. Thoughts?

Thanks,
Brandon
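Brandon's reading is that the installed depthai API does not match the example, which is also what the pip resolver message earlier in this thread reports. One way to check this is to compare the `==` pins in the example's requirements file with what is actually installed in the interpreter that runs `main.py`. The helper below is hypothetical (not part of depthai or the example) and uses only the standard library.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Return {package: version} for every 'name==version' line; other lines are skipped."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers, e.g. "pkg==1.0 ; python_version>'3'"
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Version of `name` in the current interpreter, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins: Dict[str, str],
                    lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """List every pinned package whose installed version differs from the pin."""
    problems = []
    for name, wanted in pins.items():
        have = lookup(name)
        if have != wanted:
            problems.append(f"{name}: pinned {wanted}, installed {have}")
    return problems
```

For example, `print(find_mismatches(parse_pins(open("requirements.txt").read())))`. Running it with the same command used to launch the example matters: on some Windows setups `python` and `python3` may resolve to different interpreters, so packages installed with one may not be the ones the other sees.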

CUSTOM OCR TRAINING

king

Dear Matija:

Thanks for your suggestion of a custom ROI~ Our target is not only text recognition but also setting an ROI and running inside that ROI.

We will follow your ImageManip node suggestion and combine it with text recognition first. Maybe we will get good results.

all the best~~

king

Dear Matija and erik:

We can create a 256x256-pixel ROI because of east_text_detection_256x256, but the OCR results are influenced by black areas, which sometimes causes incorrect results, as shown in the red frame in the picture below:

Could you please help us solve this issue?

all the best~~

Matija

Hey, you could manually increase the values on these two lines, so that a larger area is cropped: https://github.com/luxonis/depthai-experiments/blob/master/gen2-ocr/main.py#L241-L242.

I found out that rotated crops are slightly incorrect, but this should be fixed with the ImageManip refactor, which will be released soon. For now, you can try increasing the width and height manually in the above lines, like:

```python
rr.size.width  = int(rotated_rect[1][0] * 1.1)
rr.size.height = int(rotated_rect[1][1] * 1.1)
```

This increases the width and height by 10%.
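The same idea can be written as a small helper working on OpenCV-style rotated rectangles `((cx, cy), (w, h), angle)`; that this is the format of `rotated_rect` in the lines above is an assumption based on the indexing used there. Separate width and height factors make it easy to grow the box in one direction while keeping growth in the other small, which helps when neighboring symbols sit just beside the text. The helper name is hypothetical.

```python
from typing import Tuple

# OpenCV-style rotated rectangle: ((center_x, center_y), (width, height), angle_deg)
RotatedRect = Tuple[Tuple[float, float], Tuple[float, float], float]


def enlarge_rotated_rect(rect: RotatedRect,
                         width_factor: float = 1.1,
                         height_factor: float = 1.1) -> RotatedRect:
    """Scale a rotated rectangle about its center, keeping center and angle fixed."""
    (cx, cy), (w, h), angle = rect
    # round() rather than int() so tiny float errors cannot truncate a size down by one pixel
    new_w = round(w * width_factor)
    new_h = round(h * height_factor)
    return (cx, cy), (new_w, new_h), angle
```

For example, `enlarge_rotated_rect(((100.0, 50.0), (80.0, 20.0), -10.0))` grows an 80x20 box to 88x22 around the same center.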

Best,
Matija

king

Dear Luxonis partners:
We have an OCR case (Google Drive link and picture below):

We want to check each OCR label (SET/CAN/RES), but the results still fail because they are influenced by the + and - patterns.
We tried every method you suggested before (including rr.size.width/rr.size.height), but still can't meet our customer's needs.
In this case, our customer has a request for 10 pcs of OAK-1 if we can solve the issue. Maybe we could create a background picture with any color code in gen2-OCR and combine it with a modified EAST, but we don't know how to modify it yet~~
If you have a good solution for our case, we can discuss cooperation in more detail~

all the best~~

Matija

@king If you are limited to a set of only these 3 words, you could match the results against them with some distance metric, or just check whether the word is contained in the result. SET is in SSET, so you can tell that SSET likely refers to SET; similarly, CAN is in ICAN and RES is in RESE.
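A minimal sketch of that filtering, assuming the recognizer output is a plain string: first check whether one of the expected words is contained in the result, then fall back to a similarity score from Python's standard `difflib` (used here as one possible distance metric; an edit-distance library would work too). The function name and the 0.6 cutoff are illustrative choices, not tuned values.

```python
import difflib
from typing import Optional, Sequence

# The three labels the camera is expected to read in this use case
VOCABULARY = ("SET", "CAN", "RES")


def match_label(text: str,
                vocabulary: Sequence[str] = VOCABULARY,
                cutoff: float = 0.6) -> Optional[str]:
    """Map a raw OCR result to one of the expected words, or None."""
    # Keep only letters/digits, so stray '+' or '-' marks are ignored
    cleaned = "".join(ch for ch in text.upper() if ch.isalnum())
    # 1) Containment: "SSET" contains "SET", "ICAN" contains "CAN", ...
    for word in vocabulary:
        if word in cleaned:
            return word
    # 2) Similarity fallback for misreads such as "SFT"
    close = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return close[0] if close else None
```

Returning `None` for unmatched results lets the caller treat them as "no label found" instead of forcing a wrong match.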

If you are looking at other words as well, it's going to be hard to eliminate this + sign. You could try decreasing the factor by which you increase the width.

With the ImageManip refactor that we are working on, I think the rotated rectangles should be cropped out better, but don't hold me to it. I'll report some results once I'm able to test it. Until then, I propose you use one of the two approaches I mentioned above.

Best,
Matija

king

Dear Matija:

Thanks for your support with the ROI OCR upgrade in this case; we sincerely hope it can solve this case. We would also like to know how long it will take until we can get the upgrade.

all the best~~

king

Dear all:

We have another issue with the autofocus function of the OAK-1, as shown in the picture below:

The OAK-1 is fixed at a working distance (WD) of 150 mm and only needs to focus on the text "A123". We imported autofocus.py into gen2-ocr.
We treat "A123" as the OK condition and output OK pictures. It shows clear images with OK pictures, but sometimes it also shows blurry images with NG pictures. We tried modifying the parameters of autofocus.py, but it still shows blurry images. Could you please help us modify the parameters of autofocus.py so that it only shows clear images:

```python
import depthai as dai
import cv2
# Screen adjust for "left" "right" "up" "down"
# Step size ('W','A','S','D' controls)
STEP_SIZE = 8

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
videoEncoder = pipeline.create(dai.node.VideoEncoder)
stillEncoder = pipeline.create(dai.node.VideoEncoder)

controlIn = pipeline.create(dai.node.XLinkIn)
configIn = pipeline.create(dai.node.XLinkIn)
videoMjpegOut = pipeline.create(dai.node.XLinkOut)
stillMjpegOut = pipeline.create(dai.node.XLinkOut)
previewOut = pipeline.create(dai.node.XLinkOut)

controlIn.setStreamName('control')
configIn.setStreamName('config')
videoMjpegOut.setStreamName('video')
stillMjpegOut.setStreamName('still')
previewOut.setStreamName('preview')

# Properties
camRgb.setVideoSize(640, 360)
camRgb.setPreviewSize(300, 300)
videoEncoder.setDefaultProfilePreset(camRgb.getFps(), dai.VideoEncoderProperties.Profile.MJPEG)
stillEncoder.setDefaultProfilePreset(1, dai.VideoEncoderProperties.Profile.MJPEG)

# Linking
camRgb.video.link(videoEncoder.input)
camRgb.still.link(stillEncoder.input)
camRgb.preview.link(previewOut.input)
controlIn.out.link(camRgb.inputControl)
configIn.out.link(camRgb.inputConfig)
videoEncoder.bitstream.link(videoMjpegOut.input)
stillEncoder.bitstream.link(stillMjpegOut.input)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:

    # Get data queues
    controlQueue = device.getInputQueue('control')
    configQueue = device.getInputQueue('config')
    previewQueue = device.getOutputQueue('preview')
    videoQueue = device.getOutputQueue('video')
    stillQueue = device.getOutputQueue('still')

    # Max cropX & cropY
    maxCropX = (camRgb.getResolutionWidth() - camRgb.getVideoWidth()) / camRgb.getResolutionWidth()
    maxCropY = (camRgb.getResolutionHeight() - camRgb.getVideoHeight()) / camRgb.getResolutionHeight()

    # Default crop
    cropX = 0
    cropY = 0
    sendCamConfig = True

    while True:
        previewFrames = previewQueue.tryGetAll()
        for previewFrame in previewFrames:
            cv2.imshow('preview', previewFrame.getData().reshape(previewFrame.getHeight(), previewFrame.getWidth(), 3))

        videoFrames = videoQueue.tryGetAll()
        for videoFrame in videoFrames:
            # Decode JPEG
            frame = cv2.imdecode(videoFrame.getData(), cv2.IMREAD_UNCHANGED)
            # Display
            cv2.imshow('video', frame)

            # Send new cfg to camera
            if sendCamConfig:
                cfg = dai.ImageManipConfig()
                cfg.setCropRect(cropX, cropY, 0, 0)
                configQueue.send(cfg)
                print('Sending new crop - x: ', cropX, ' y: ', cropY)
                sendCamConfig = False

        # Update screen (1 ms polling rate)
        key = cv2.waitKey(1)
        if key == ord('q'):
            break

        elif key == ord('t'):
            print("Autofocus trigger (and disable continuous)")
            ctrl = dai.CameraControl()
            ctrl.setAutoFocusMode(dai.CameraControl.AutoFocusMode.AUTO)
            ctrl.setAutoFocusTrigger()
            controlQueue.send(ctrl)
        elif key in [ord('w'), ord('a'), ord('s'), ord('d')]:
            if key == ord('a'):
                cropX = cropX - (maxCropX / camRgb.getResolutionWidth()) * STEP_SIZE
                if cropX < 0: cropX = maxCropX
            elif key == ord('d'):
                cropX = cropX + (maxCropX / camRgb.getResolutionWidth()) * STEP_SIZE
                if cropX > maxCropX: cropX = 0
            elif key == ord('w'):
                cropY = cropY - (maxCropY / camRgb.getResolutionHeight()) * STEP_SIZE
                if cropY < 0: cropY = maxCropY
            elif key == ord('s'):
                cropY = cropY + (maxCropY / camRgb.getResolutionHeight()) * STEP_SIZE
                if cropY > maxCropY: cropY = 0
            sendCamConfig = True
```

all the best~~

erik

Hello king ,
If the object is always 150mm away from the camera, I would suggest manually specifying the lens position instead of having autofocus enabled. You can achieve that like this:

```python
camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.initialControl.setManualFocus(90)
```

You can set any value between 0 and 255. To find the best value, step through all the focus values with this example (pressing , and . changes the lens position).
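Finding that value can also be automated in a simple way: step through lens positions, score how sharp the region around the text is at each one, and keep the best. The sketch below shows only the search logic, under stated assumptions. `capture_at` is a hypothetical callback that, on a real device, would send a `CameraControl` with `setManualFocus(position)`, wait for the lens to settle, and return a grayscale crop around the text; the gradient-energy score is one common sharpness measure, swapped in here for illustration. Since the working distance is fixed, the position found this way could then be hard-coded with `initialControl.setManualFocus(...)`.

```python
from typing import Callable, Iterable, Optional, Sequence

Gray = Sequence[Sequence[float]]  # grayscale crop around the text, row by row


def sharpness(gray: Gray) -> float:
    """Gradient energy: sum of squared differences between neighbouring pixels.

    In-focus text has strong edges, so the value typically peaks near the best lens position.
    """
    total = 0.0
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            dx = gray[y][x + 1] - gray[y][x]
            dy = gray[y + 1][x] - gray[y][x]
            total += dx * dx + dy * dy
    return total


def best_lens_position(capture_at: Callable[[int], Gray],
                       positions: Iterable[int] = range(0, 256, 5)) -> Optional[int]:
    """Sweep manual lens positions (0..255) and return the sharpest one."""
    best_pos: Optional[int] = None
    best_score = float("-inf")
    for pos in positions:
        if not 0 <= pos <= 255:
            raise ValueError(f"lens position {pos} is outside 0..255")
        score = sharpness(capture_at(pos))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos
```

A coarse sweep (step 5 here) followed by a finer sweep around the best coarse position keeps the number of captures small.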

Thanks, Erik

king

Dear Erik:

Thanks for your feedback on the autofocus function~ We are trying this parameter to check whether it solves the issue.

all the best~~

DavidP

Hello dear,
I am new to Luxonis and DepthAI; my first project is based on an OAK-D PoE camera and OCR.
I am currently playing with the experimental Gen2 OCR, which is working like a charm.
But I am struggling with ROI management.
Currently, the code uses a 1024x1024 video size and a 256x256 preview on a 12 MP camera.

I am trying to integrate a crop to aim at a dedicated 1024x1024 zone.
I am in a working area where I don't want to see people (they don't want to be seen… lol).

I tried many ImageManip tests without getting a working result,
and I am just starting in this domain, so it is not easy.

Is there an easy way to modify the code to use 2048x1024?
And to add an ROI?
I already saw some limitations: I can't do 2048x2048x3, but 2048x1024x3 should work.
After that, the relocating is more of a pain, because the scale is not x4 on all axes…

If someone has made improvements on it, don't hesitate to share.

Thank you
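The "relocating" step comes down to two different scale factors plus an offset. As a worked sketch, assuming a 2048x1024 ROI taken from the bottom centre of the 4056x3040 12 MP frame and resized to the 256x256 EAST input: a point found by the detector is scaled by 2048/256 = 8 horizontally and 1024/256 = 4 vertically, then shifted by the ROI's top-left corner. The ROI placement and function names are hypothetical. Note that resizing 2048x1024 to 256x256 also squeezes text horizontally by a factor of 2 relative to its height.

```python
from typing import Tuple

SENSOR_W, SENSOR_H = 4056, 3040   # 12 MP sensor resolution
NN_W, NN_H = 256, 256             # east_text_detection_256x256 input size

Roi = Tuple[int, int, int, int]   # (x0, y0, width, height) in sensor pixels


def bottom_center_roi(roi_w: int = 2048, roi_h: int = 1024) -> Roi:
    """Hypothetical ROI placement: horizontally centred, touching the bottom edge."""
    return (SENSOR_W - roi_w) // 2, SENSOR_H - roi_h, roi_w, roi_h


def nn_to_sensor(x_nn: float, y_nn: float, roi: Roi) -> Tuple[float, float]:
    """Map a point from 256x256 NN coordinates back to full-sensor pixels."""
    x0, y0, w, h = roi
    scale_x = w / NN_W   # 8.0 for a 2048-wide ROI
    scale_y = h / NN_H   # 4.0 for a 1024-high ROI
    return x0 + x_nn * scale_x, y0 + y_nn * scale_y


def sensor_to_nn(x: float, y: float, roi: Roi) -> Tuple[float, float]:
    """Inverse mapping, e.g. to check where a sensor point lands in the NN input."""
    x0, y0, w, h = roi
    return (x - x0) * NN_W / w, (y - y0) * NN_H / h
```

Keeping the mapping in one place means each corner of a detected box goes through the same function, whatever ROI size is chosen later.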

erik

Hi @DavidP,

> I tried many imagemanip tests without reaching a working result.

Could you share a minimal example so we can look into it?

DavidP

@erik Hello Erik, yes, of course, I'm preparing that.

To explain more before the code: the project is outdoors, near a train cleaning station,
and we want to read the train number when a train arrives for a wash.
The camera is parallel to the rails, so I only need the bottom of the image, but as wide as possible to be sure that I can read the number.
Unfortunately, I can't post pictures from the camera here, but you can imagine the subject like that.

I'm preparing some code explanations to complete this.

Thank you

DavidP

This is the screenshot from my OAK-D CM4 PoE camera.
I want to crop the original image from the camera to keep only the bottom of the image.

This is from the original code:
```python
import blobconverter
import depthai as dai

pipeline = dai.Pipeline()

#version = "2021.2"
#pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2021_2)
version = "2022.1"
pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2022_1)

colorCam = pipeline.create(dai.node.ColorCamera)
colorCam.setPreviewSize(256, 256)
colorCam.setVideoSize(1024, 1024) # 4 times larger in both axes
colorCam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
colorCam.setInterleaved(False)
colorCam.setBoardSocket(dai.CameraBoardSocket.RGB)
colorCam.setFps(10)

controlIn = pipeline.create(dai.node.XLinkIn)
controlIn.setStreamName('control')
controlIn.out.link(colorCam.inputControl)

cam_xout = pipeline.create(dai.node.XLinkOut)
cam_xout.setStreamName('video')
colorCam.video.link(cam_xout.input)

# ---------------------------------------
# 1st stage NN - text-detection
# ---------------------------------------

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath(blobconverter.from_zoo(name="east_text_detection_256x256",zoo_type="depthai",shaves=6, version=version))
colorCam.preview.link(nn.input)

nn_xout = pipeline.create(dai.node.XLinkOut)
nn_xout.setStreamName('detections')
nn.out.link(nn_xout.input)

# ---------------------------------------
# 2nd stage NN - text-recognition-0012
# ---------------------------------------

manip = pipeline.create(dai.node.ImageManip)
manip.setWaitForConfigInput(True)

manip_img = pipeline.create(dai.node.XLinkIn)
manip_img.setStreamName('manip_img')
manip_img.out.link(manip.inputImage)

manip_cfg = pipeline.create(dai.node.XLinkIn)
manip_cfg.setStreamName('manip_cfg')
manip_cfg.out.link(manip.inputConfig)

manip_xout = pipeline.create(dai.node.XLinkOut)
manip_xout.setStreamName('manip_out')

nn2 = pipeline.create(dai.node.NeuralNetwork)
nn2.setBlobPath(blobconverter.from_zoo(name="text-recognition-0012", shaves=6, version=version))
nn2.setNumInferenceThreads(2)
manip.out.link(nn2.input)
manip.out.link(manip_xout.input)

nn2_xout = pipeline.create(dai.node.XLinkOut)
nn2_xout.setStreamName("recognitions")
nn2.out.link(nn2_xout.input)
```

So I am trying to adapt it with 2 ImageManip nodes, to link to another ROI:

```python
import blobconverter
import depthai as dai

pipeline = dai.Pipeline()

#version = "2021.2"
#pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2021_2)
version = "2022.1"
pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2022_1)

colorCam = pipeline.create(dai.node.ColorCamera)
colorCam.setPreviewSize(256, 256)
colorCam.setVideoSize(2048, 2048) # 8 times larger than the 256x256 preview in both axes
colorCam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)
colorCam.setInterleaved(False)
colorCam.setBoardSocket(dai.CameraBoardSocket.RGB)
colorCam.setFps(10)

#MODIFICATION -> creation of the main video Imagemanip
manip_video = pipeline.create(dai.node.ImageManip)
RrVideo = dai.RotatedRect()
RrVideo.center.x,RrVideo.center.y = 0,1024
RrVideo.size.width,RrVideo.size.height = 1024,1024
manip_video.initialConfig.setCropRotatedRect(RrVideo,False)
manip_video.setMaxOutputFrameSize(1024*1024*3)
colorCam.video.link(manip_video.inputImage)

#MODIFICATION -> creation of the preview video Imagemanip
manip_preview= pipeline.create(dai.node.ImageManip)
manip_preview.setResize(256,256)
manip_preview.setMaxOutputFrameSize(256*256*3)
manip_video.out.link(manip_preview.inputImage)

controlIn = pipeline.create(dai.node.XLinkIn)
controlIn.setStreamName('control')
controlIn.out.link(colorCam.inputControl)

cam_xout = pipeline.create(dai.node.XLinkOut)
cam_xout.setStreamName('video')

#MODIFICATION -> link to manip_video output
#colorCam.video.link(cam_xout.input)
manip_video.out.link(cam_xout.input)

# ---------------------------------------
# 1st stage NN - text-detection
# ---------------------------------------

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath(blobconverter.from_zoo(name="east_text_detection_256x256",zoo_type="depthai",shaves=6, version=version))

#MODIFICATION -> link the preview to text detection
#colorCam.preview.link(nn.input)
manip_preview.out.link(nn.input)

nn_xout = pipeline.create(dai.node.XLinkOut)
nn_xout.setStreamName('detections')
nn.out.link(nn_xout.input)

# ---------------------------------------
# 2nd stage NN - text-recognition-0012
# ---------------------------------------

manip = pipeline.create(dai.node.ImageManip)
manip.setWaitForConfigInput(True)

manip_img = pipeline.create(dai.node.XLinkIn)
manip_img.setStreamName('manip_img')
manip_img.out.link(manip.inputImage)

manip_cfg = pipeline.create(dai.node.XLinkIn)
manip_cfg.setStreamName('manip_cfg')
manip_cfg.out.link(manip.inputConfig)

manip_xout = pipeline.create(dai.node.XLinkOut)
manip_xout.setStreamName('manip_out')

nn2 = pipeline.create(dai.node.NeuralNetwork)
nn2.setBlobPath(blobconverter.from_zoo(name="text-recognition-0012", shaves=6, version=version))
nn2.setNumInferenceThreads(2)
manip.out.link(nn2.input)
manip.out.link(manip_xout.input)

nn2_xout = pipeline.create(dai.node.XLinkOut)
nn2_xout.setStreamName("recognitions")
nn2.out.link(nn2_xout.input)
```

But as I said, I am starting from scratch with DepthAI, so I may have missed some fundamental parts….

Regards

DavidP

Actually, the part marked "the part that I want" in the image does not reflect my code,
and doesn't reflect exactly what I want either.
In 12 MP mode, I would like to keep a 2048x1024 region at the bottom. That is exactly what I want to do, but first I need lights lol

erik

Hi @DavidP
You can use ImageManip to crop the bottom of the frame:

```python
#!/usr/bin/env python3

import cv2
import depthai as dai

# Create pipeline
pipeline = dai.Pipeline()

camRgb = pipeline.create(dai.node.ColorCamera)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP) # 4056x3040
# Size of the cropped output: full width, bottom half of the height
width = 4056
height = 1520

crop_manip = pipeline.create(dai.node.ImageManip)
# Normalized (xmin, ymin, xmax, ymax): keep the bottom half of the frame
crop_manip.initialConfig.setCropRect(0, 0.5, 1, 1)
# NV12 needs 1.5 bytes per pixel
crop_manip.setMaxOutputFrameSize(int(width * height * 1.5))
crop_manip.setFrameType(dai.RawImgFrame.Type.NV12)
camRgb.isp.link(crop_manip.inputImage)

xout2 = pipeline.create(dai.node.XLinkOut)
xout2.setStreamName('crop')
crop_manip.out.link(xout2.input)

with dai.Device(pipeline) as device:
    # Output queue will be used to get the cropped frames from the output defined above
    q1 = device.getOutputQueue(name="crop", maxSize=4, blocking=False)

    while True:
        if q1.has():
            cv2.imshow("Bottom Tile", q1.get().getCvFrame())
        if cv2.waitKey(1) == ord('q'):
            break
```
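`setCropRect` in the example above takes normalized `(xmin, ymin, xmax, ymax)` values, so `(0, 0.5, 1, 1)` is the bottom half of the frame, and `setMaxOutputFrameSize` has to be large enough for the output frame type (NV12 uses 1.5 bytes per pixel, planar BGR uses 3). A small hypothetical helper for converting a pixel ROI, such as a 2048x1024 region, into those values and sizing the buffer:

```python
from typing import Tuple

SENSOR_W, SENSOR_H = 4056, 3040  # 12 MP ISP output

# Bytes per pixel for the two frame types discussed in this thread
BYTES_PER_PIXEL = {"NV12": 1.5, "BGR888p": 3}


def normalized_crop(x0: int, y0: int, w: int, h: int,
                    frame_w: int = SENSOR_W,
                    frame_h: int = SENSOR_H) -> Tuple[float, float, float, float]:
    """Pixel ROI -> normalized (xmin, ymin, xmax, ymax) as passed to setCropRect."""
    if x0 < 0 or y0 < 0 or x0 + w > frame_w or y0 + h > frame_h:
        raise ValueError("ROI must lie inside the frame")
    return x0 / frame_w, y0 / frame_h, (x0 + w) / frame_w, (y0 + h) / frame_h


def max_output_frame_size(w: int, h: int, frame_type: str) -> int:
    """Bytes needed for one w x h output frame of the given type."""
    return int(w * h * BYTES_PER_PIXEL[frame_type])
```

For a bottom-centre 2048x1024 region, `normalized_crop(1004, 2016, 2048, 1024)` gives the four values for `setCropRect`, and `max_output_frame_size(2048, 1024, "BGR888p")` gives 6,291,456 bytes.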

DavidP

@erik Thank you Erik, I'll test modifying my code with your version.

Regards

DavidP

Unfortunately, the cropping works on its own, but not in my case.
The text detection NN takes 256x256 images. When I use colorCam.preview as its input, it works, but if I resize another image to 256x256 and feed it into this neural network, it crashes.

Works:

Crashes:

I'm missing the basics of link declarations. Is there documentation about dai.node objects,
and about how to instantiate them correctly?
A kind of "newbie" documentation lol
Thanks

DavidP

I took the example on this page:
Hello World (luxonis.com)
and modified it to use ImageManip as the preview:

Then I get the following issue:

which I don't clearly understand.

erik

@DavidP you could google the error and get some useful links...
ImageManip outputs NV12 format (not BGR), and the NN very likely expects BGR planar:

```python
crop_manip.setFrameType(dai.RawImgFrame.Type.BGR888p)
```
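To make "BGR planar" concrete: an interleaved (HWC) frame stores B, G, R for each pixel in turn, while a planar (CHW) frame stores the whole B plane, then the G plane, then the R plane, which is what `BGR888p` refers to. NV12 is a different layout again (a full-resolution luma plane plus half-resolution interleaved chroma), so a 256x256 NV12 frame is 98,304 bytes, versus 196,608 bytes for 256x256 BGR planar. The pure-Python sketch below only illustrates the layouts with hypothetical helper names; on the device, ImageManip does the conversion once the frame type is set.

```python
from typing import List, Sequence

Pixel = Sequence[int]  # (B, G, R) for one pixel


def interleaved_to_planar(frame_hwc: Sequence[Sequence[Pixel]]) -> List[List[List[int]]]:
    """HWC -> CHW: one full plane per channel."""
    h, w, c = len(frame_hwc), len(frame_hwc[0]), len(frame_hwc[0][0])
    return [[[frame_hwc[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]


def flatten(nested) -> List[int]:
    """Flatten nested lists/tuples into the order the values would have in a buffer."""
    if isinstance(nested, (list, tuple)):
        return [v for item in nested for v in flatten(item)]
    return [nested]


if __name__ == "__main__":
    frame = [[(1, 2, 3), (4, 5, 6)]]                # 1x2 image, BGR per pixel
    print(flatten(frame))                           # interleaved byte order
    print(flatten(interleaved_to_planar(frame)))    # planar byte order
```

The same values end up in a different order, which is why a network expecting one layout misreads frames delivered in the other.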

DavidP

@erik LOL, I used ChatGPT, which is honestly not an expert on DepthAI, and I lost my habit of getting answers from Google, but I will check there in the future because it works.
I'm going back to my OCR code to transpose this.

Thank you Erik!
