• RAE
  • No chance in getting a WLAN connection & device gets VERY hot

I can confirm the RAE becomes hot while running the ROS stack. It's "slightly warm" only in idle mode.

Do you have any bundled scripts to check the temperature, as you can do on raspi?

There are a few folders around here … this gives a number …

cat /sys/class/thermal/thermal_zone1/temp
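For reference, here is a minimal sketch of a script that prints every thermal zone rather than just `thermal_zone1`. It assumes the standard Linux thermal sysfs layout, where each `thermal_zone*` folder has a `type` file and a `temp` file in millidegrees Celsius. The function name `read_thermal_zones` is made up for illustration, and which zones actually exist on RAE is not confirmed here.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return a list of (zone_name, zone_type, temp_celsius) tuples.

    Assumes the generic Linux sysfs layout; `base` is a parameter only so
    the function can be pointed at a different directory for testing.
    """
    readings = []
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that cannot be read or return a non-numeric value.
            continue
        readings.append((zone.name, zone_type, millidegrees / 1000.0))
    return readings


if __name__ == "__main__":
    for name, zone_type, temp in read_thermal_zones():
        print(f"{name:16s} {zone_type:24s} {temp:6.1f} C")
```

The `type` column is what tells you which component a given number belongs to, which matters when comparing readings against a per-component spec.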

    Ok, I tested it, and here are the results:

    • The first experiment was with a bringup.launch. I've just started the most recent image, w/o load, idle mode. I stopped the process after ~ an hour, and one of the thermal zones reached 60 degrees. You can ignore the errors in the video, as the entire stack is not yet working.
    • The second experiment was triggered several minutes after the previous one. This time, I ran robot.launch with an additional load on the motors (teleop). As a result, the same thermal zone that ended at 60 degrees reached 65 degrees in just 6 minutes. Then I shut the nodes down to avoid hardware damage.

    One important observation: as you can see in both videos, I also printed the cooling devices' state alongside the thermal data. The device at index 0 is related to the VPU. Its state was 5 when I started the nodes, but it quickly switched to 0 and never went back. I wonder if 0 means no cooling at all. If so, it seems odd that cooling switches off as soon as we apply some load. Anyway, the second experiment clearly shows that the temperature keeps climbing under ordinary teleoperation, and that is definitely abnormal.
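On the question of what state 0 means: in the generic Linux thermal sysfs interface, each `cooling_device*` folder exposes `type`, `cur_state` and `max_state`, and the kernel documentation describes `cur_state == 0` as no cooling and `cur_state == max_state` as maximum cooling. Whether RAE's VPU driver follows that convention is an assumption, not something confirmed in this thread. A minimal sketch for printing the state together with its range (the function name `read_cooling_devices` is hypothetical):

```python
from pathlib import Path


def read_cooling_devices(base="/sys/class/thermal"):
    """Return a list of (device_name, device_type, cur_state, max_state).

    Assumes the generic Linux sysfs layout; `base` is a parameter only so
    the function can be pointed at a different directory for testing.
    """
    devices = []
    for dev in sorted(Path(base).glob("cooling_device*")):
        try:
            dev_type = (dev / "type").read_text().strip()
            cur_state = int((dev / "cur_state").read_text().strip())
            max_state = int((dev / "max_state").read_text().strip())
        except (OSError, ValueError):
            # Skip devices that cannot be read or return non-numeric values.
            continue
        devices.append((dev.name, dev_type, cur_state, max_state))
    return devices


if __name__ == "__main__":
    for name, dev_type, cur, max_ in read_cooling_devices():
        print(f"{name:18s} {dev_type:24s} state {cur}/{max_}")
```

Printing `max_state` next to `cur_state` makes it easier to tell whether a drop from 5 to 0 is a move from partial to no throttling or something driver-specific.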

    Just played a little bit with RViz, map, laser scans, cameras, etc.

    • Mike replied to this.

      sskorol … without knowing where/what the thermal zones represent it is dangerous to assume 65 degC is outside the operating temps … it is most likely the CPU cores which can go above 60 degC typically … if it was case temp then I would be concerned … but we should wait until Luxonis publishes some specs on this before we say Rae is about to go into a critical meltdown …

      @Mike yeah, I mean, I know that on the RPi, for instance, CPU temp can go up to 85 degrees. But when I see 71 (like on the second screen) w/o motor usage, and still trending upward, it becomes suspicious. And yes, the whole case is hot. Keeping in mind it’s metal, it’s not comfortable to touch or hold it at all. My guess is that the main temperature spike comes from the cameras. If you’ve ever worked with Luxonis devices, you know how hot they can get. I have several OAK-D cameras, and it’s hard to touch them when they are in use. If their heatsinks touch the case, then I’d say that’s the main reason why it’s so hot. Maybe it was intentional by design. But I don’t really want my kid to accidentally touch it while playing.

      I adjusted the script to print the type of cooling device and thermal zone (idle mode):

      I'm slightly confused because top temperature values (based on the previous experiments) come from the battery gauge and Wi-Fi zones, while VPU cooling drops to zero. Is it even safe?

      As expected… After I reached the following numbers by running the full RAE stack in Docker (idle mode) and then started the teleop node, RAE rebooted a couple of seconds after the motors were triggered.

      • Mike replied to this.

        sskorol … I did find the page below … indicating that temps can go high … it is not comparing apples with apples … but still good to know … While RAE rebooted in your test it would still be dangerous to blame that on the temps given the current state of the available software … things should stabilize over the next few months.

        https://docs.luxonis.com/projects/hardware/en/latest/pages/articles/operative_temperature_range/?highlight=temperature

        Hey, I am currently testing and trying to reproduce the issue - I am not really managing to get rae to go over 57 degrees, even after a couple of hours of running the cameras + teleop stack.

        In the past we had issues with LEDs overheating the device - that issue should be solved, but I think it is still worth checking whether the LEDs are overheating the device and then going from there. You can turn the LEDs off in the default ROS stack (robot.launch.py) by either removing the LED peripheral node from rae_hw/launch/peripherals.launch.py (assuming you disabled the agent) or, as a somewhat bootleg solution, changing this line to always be false (thus sending empty LED messages). If we can narrow the issue down to a single peripheral, that would be very helpful.

        Thanks and sorry for the inconvenience.

          DaniloPejovic, I ran the full ROS stack for about 35 minutes with the LED node disabled. The WiFi temperature holds at ~58-59 degrees, and the battery gauge at 54-55. I also put a relatively small load on the servos via teleop. So your observation regarding the LEDs seems correct: they cause overheating. So what was the fix? Is it a pure hardware issue? And if it was fixed, then how did it appear in production? Anyway, what would be the next steps?

          As there were recent LED updates pushed to the ROS repo, I decided to check the theory and executed a full stack with LED node, which led me to the following numbers in just 5 minutes:

          However, there was another observation. In the previous message, I didn't use cameras. And when I added a couple of camera views in RViz, there was a temperature spike in the WiFi thermal zone.

          I could reach 66-67 degrees. However, it never jumped above this point. So, it seems like the problem is more complicated. LEDs are still probably the main failure point. But I don't believe they cause overheating in isolation. When I shut down the ROS stack, the LEDs remained active (bug). But the temperature dropped to 57-58 degrees as well. So, it seems like cameras + LEDs in conjunction cause the overheating.

          Update: after a couple of hours of running the full stack with active cameras but w/o LEDs, I still reached the high temp in a WiFi zone (70-71 degrees). So it seems like it's just a matter of time to come to the red zone with active camera streams.

            Mike, based on the specs, the WiFi module's operating temperature is 0-80 degrees. While processors start throttling when overheated, units like WiFi usually just shut down until the temperature stabilizes, to protect themselves from thermal shock (based on feedback from people who do a lot of stuff with such hardware).

            I didn’t find it in the specs, but if I were the manufacturer, I’d probably leave a ~5% buffer for thermal protection. That’s 76 degrees in this case (and it was roughly the last value I saw before RAE rebooted). That’s why I’d treat ~72 degrees as a red zone, where the end user should get prepared for a shutdown.
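The arithmetic behind those numbers can be sketched as follows. The 80-degree limit, the ~5% buffer and the ~72-degree red zone come from the post above; the buffer is the poster's guess rather than a published spec, and `shutdown_threshold` and `classify` are hypothetical helper names.

```python
def shutdown_threshold(max_temp_c=80.0, buffer_fraction=0.05):
    """Temperature at which a protective shutdown is assumed to kick in.

    With the values from the post: 80 * (1 - 0.05) = 76 degrees.
    """
    return max_temp_c * (1.0 - buffer_fraction)


def classify(temp_c, max_temp_c=80.0, buffer_fraction=0.05, red_zone_c=72.0):
    """Label a reading using the thresholds proposed in the post."""
    if temp_c >= shutdown_threshold(max_temp_c, buffer_fraction):
        return "expected shutdown"
    if temp_c >= red_zone_c:
        return "red zone"
    return "ok"


if __name__ == "__main__":
    # Sample values taken from readings mentioned in the thread.
    for reading in (58.0, 67.0, 71.0):
        print(f"{reading:5.1f} C -> {classify(reading)}")
```

Under these assumed thresholds, the 70-71 degree readings reported earlier sit just below the proposed red zone.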

            From what I’ve seen in photos of a disassembled RAE, the WiFi module sits close to one of the camera processors. Based on the VPU specs, its operating temperature goes up to 120 degrees. So 5 active processors plus LEDs could generate a lot of heat in such a small area.

            • Mike replied to this.

              sskorol … Too early for those conclusions … An operating range is an operating range … if a device caused a reboot within the operating temp range then the device is faulty - it does happen.

                Mike, I wonder if you've already tried to measure the temperature on your robot? What are the numbers in active mode with cameras, LEDs, and teleoperation?

                • Mike replied to this.

                  sskorol … I have used the default app to drive Rae around until battery was flat … LEDs were on … WIFI on … did not measure temps … Rae worked fine … After seeing how raw things are at the moment and my available time I will shelve Rae for a month and dust off in December when I will have more time. Hopefully by then there is more software integration and updates … Meta Quest 3 integration soon ;-)