• RAE
  • No chance in getting a WLAN connection & device gets VERY hot

Got one step further after hard resetting: the QR scan method etc. works, I triple-checked that. I now get a proper "Couldn't connect" message with the red blinking as well. With the "fresh" device, I can now also see all my local WLAN APs, so the WLAN interface does seem to work:

wpa_cli v2.9
Copyright (c) 2004-2019, Jouni Malinen <j@w1.fi> and contributors

This software may be distributed under the terms of the BSD license.
See README for more details.

Selected interface 'p2p-dev-wlp1s0'

Interactive mode

<3>CTRL-EVENT-SCAN-RESULTS
scan_results
> bssid / frequency / signal level / flags / ssid
82:a7:41:ec:80:f1 5220 -41 [WPA2-PSK-CCMP][ESS] lannisport_iot
76:a7:41:ec:80:f1 5220 -77 [WPA2-PSK-CCMP][ESS] lannisport
7e:a7:41:ec:80:f1 5220 -77 [WPA2-PSK-CCMP][ESS] lannisport_restricted
7e:a7:41:ec:80:f0 2462 -83 [WPA2-PSK-CCMP][ESS] lannisport_iot
7a:a7:41:ec:80:f0 2462 -44 [WPA2-PSK-CCMP][ESS] lannisport_restricted
7a:a7:41:ec:80:f1 5220 -77 [ESS] eighteenguests
76:a7:41:ec:80:f0 2462 -43 [ESS] eighteenguests
86:a7:41:ec:80:f0 2462 -83 [WPA2-PSK-CCMP][ESS]
70:a7:41:ec:80:f0 2462 -44 [WPA2-PSK-CCMP][ESS] lannisport
82:a7:41:ec:80:f0 2462 -83 [WPA2-PSK-CCMP][ESS]
ea:63:da:a4:f3:fb 2462 -79 [WPA2-PSK-CCMP][ESS] lannisport_restricted
ee:63:da:a4:f3:fb 2462 -80 [WPA2-PSK-CCMP][ESS] lannisport_iot
e0:63:da:a4:f3:fb 2462 -82 [WPA2-PSK-CCMP][ESS] lannisport
e6:63:da:a4:f3:fb 2462 -79 [ESS] eighteenguests

I have three APs, all providing the same four SSIDs (lannisport_iot, lannisport_restricted, lannisport, and eighteenguests). Could the issue be that the rae doesn't support roaming, or fails to connect to an SSID when multiple APs offer it? It seems to "see" the nearest AP on both the 2.4 & 5 GHz bands, plus one other on 2.4 GHz only. According to the documentation, wpa_supplicant should support roaming, though.

Hah! Found it! hostapd is blocking wlp1s0! After stopping the service, the link to my WLAN comes up. This is reproducible, since hostapd is running after each reboot… I am not too familiar with systemd and not at all with hostapd: Quick hint or pointer on how to disable the automatic start on reboot?
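
On a systemd-based image, the usual way to keep a service from starting at boot is `systemctl disable`. A minimal sketch, assuming the unit really is named `hostapd` on this image (worth checking with `systemctl status hostapd` first). The `run` helper and the `DRY_RUN` switch are hypothetical, added only so the commands can be previewed before running them as root. If the RAE setup flow relies on hostapd for its own access point, disabling it may break that flow.

```shell
#!/bin/sh
# Sketch: stop hostapd now and keep it from starting at boot.
# Assumption: the systemd unit is called "hostapd" on this image.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the unit immediately and remove it from the boot sequence.
disable_unit() {
    run systemctl disable --now "$1"
}

# If something else keeps starting the unit, masking blocks every start.
mask_unit() {
    run systemctl mask "$1"
}

# Preview first, then run for real as root:
#   DRY_RUN=1 disable_unit hostapd
#   disable_unit hostapd
```

`systemctl enable --now hostapd` (or `systemctl unmask hostapd` after masking) reverses the change.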

@Mike Do you have any hint which 1.13 to pick? Running on 1.12. And which URL do I need to provide to mender: To the directory, to the *.mender file…(I guess: the latter)? Sorry if these are stupid questions, but "mender -install <link_to_firmware>" isn't that conclusive with respect to what the link should point to 😉

BTW: I just measured the surface temperature (just on the top of the rae): 54°C after ca. 30-45 min of fighting with the WLAN config, see above. Doesn't sound right to me. I am now starting to get kicked out of the ssh session again and can't reconnect…

root@keembay:~# wpa_supplicant -i wlp1s0 -c /etc/wpa_supplicant.conf -B
Successfully initialized wpa_supplicant
root@keembay:~# client_loop: send disconnect: Connection reset

Shutdown via double click stopped working, too, only hard shutdown will work then.

  • Mike replied to this.

    I didn't get any further, the device is mostly dead. When starting up, I got the logo and then the LEDs were "blinking" white, but not in a good way. It looked more like something was wrong with the process controlling the RGB LEDs. The device doesn't react to the power button at all, neither to a double-click nor to pressing it for 8s (or longer). Had to wait until the battery died…

    Sent a mail to support, let's see how they react. So far, rae is a very frustrating experience…

      Hi DiMa
      Very likely it's a hardware issue. Thanks for emailing support and sorry for the inconvenience.

      Regards,
      Jaka

      I can confirm the RAE becomes hot while running the ROS stack. It's "slightly warm" only in idle mode.

      Do you have any bundled scripts to check the temperature, as you can do on raspi?

      There are a few folders around here … this gives a number …

      cat /sys/class/thermal/thermal_zone1/temp
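
The number printed there is in millidegrees Celsius on Linux, so 54000 means 54.0 degC. A small sketch that walks all zones and prints each zone's type next to its value, assuming the standard `/sys/class/thermal` layout. The function name and the optional base-directory argument are made up for illustration; this is not a bundled RAE tool.

```shell
#!/bin/sh
# Sketch: list every thermal zone with its type and temperature.
# The kernel reports temp in millidegrees Celsius (54000 means 54.0 degC).

print_thermal_zones() {
    base="${1:-/sys/class/thermal}"   # optional override, handy for testing
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        kind=$(cat "$zone/type" 2>/dev/null || echo unknown)
        # Some zones refuse reads while their sensor is off; skip those.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        # Integer math only (assumes non-negative readings):
        # whole degrees, then one decimal place.
        printf '%s %s %d.%d C\n' "${zone##*/}" "$kind" \
            "$((milli / 1000))" "$(( (milli % 1000) / 100 ))"
    done
}

print_thermal_zones "$@"
```

Wrapping the call in `watch -n 5` (if available on the image) gives a rough live view while a launch file is running.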

        Ok, I tested it, and here are the results:

        • The first experiment was with a bringup.launch. I've just started the most recent image, w/o load, idle mode. I stopped the process after ~ an hour, and one of the thermal zones reached 60 degrees. You can ignore the errors in the video, as the entire stack is not yet working.
        • The second experiment was triggered several minutes after the previous one. But that time, I ran robot.launch with an additional load on motors (teleop). As a result, the same thermal zone that ended at 60 degrees quickly reached 65 degrees in just 6 minutes. Then, I shut the nodes down to avoid hardware damage.

        One important observation: as you can see in both videos, I also printed the cooling devices' state apart from the thermal data. The device at index zero is related to the VPU. Its initial state was 5 when I started the nodes, but it quickly switched to 0 and never recovered. I wonder if 0 means the absence of cooling. But then it seems weird that it goes off when we give some load. Anyway, the second experiment clearly shows that the temperature is constantly increasing under common teleoperation. And it's definitely abnormal.

        Just played a little bit with RViz, map, laser scans, cameras, etc.

        • Mike replied to this.

          sskorol … without knowing where/what the thermal zones represent it is dangerous to assume 65 degC is outside the operating temps … it is most likely the CPU cores which can go above 60 degC typically … if it was case temp then I would be concerned … but we should wait until Luxonis publishes some specs on this before we say Rae is about to go into a critical meltdown …

          @Mike yeah, I mean, I know that on an RPi, for instance, CPU temp can go up to 85 degrees. But when I see 71 (like on the second screen) w/o motor usage, with a tendency to grow, it becomes suspicious. And yes, the whole case is hot. Keeping in mind it’s metal, it’s not comfortable to touch or hold it in your hands at all. My guess is that the main temperature spike comes from the cameras. If you’ve ever worked with Luxonis devices, you know how hot they can become. I have several oak-d cameras, and it’s hard to touch them when they are in use. If their radiators touch the case, then I’d say that’s the main reason why it’s so hot. Maybe it was intentional by design. But I don’t really want my kid to accidentally touch it while playing.

          I adjusted the script to print the type of cooling device and thermal zone (idle mode):

          I'm slightly confused because top temperature values (based on the previous experiments) come from the battery gauge and Wi-Fi zones, while VPU cooling drops to zero. Is it even safe?
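
A sketch of what such a cooling-device printout could look like, assuming the standard Linux thermal sysfs layout (`cooling_device*/type`, `cur_state`, `max_state`). It is not the script used for the numbers in this thread, and the function name is made up. Per the kernel's thermal sysfs documentation, `cur_state` 0 means no cooling is being applied and `max_state` is the strongest cooling, so a VPU device sitting at 0 may simply mean it is not being throttled; what each state actually does depends on the driver.

```shell
#!/bin/sh
# Sketch: print each cooling device with its type and current/max state.
# cur_state 0 = no cooling applied, max_state = strongest cooling.

print_cooling_devices() {
    base="${1:-/sys/class/thermal}"   # optional override, handy for testing
    for dev in "$base"/cooling_device*; do
        [ -r "$dev/cur_state" ] || continue
        kind=$(cat "$dev/type" 2>/dev/null || echo unknown)
        cur=$(cat "$dev/cur_state" 2>/dev/null) || continue
        max=$(cat "$dev/max_state" 2>/dev/null || echo '?')
        printf '%s %s state=%s/%s\n' "${dev##*/}" "$kind" "$cur" "$max"
    done
}

print_cooling_devices "$@"
```

Reading this next to the zone temperatures shows which zones are hot while their associated cooling device stays at state 0.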

          As expected… When I reached the following numbers by running a full RAE stack in docker (idle mode), and then ran the teleop node, RAE rebooted in a couple of seconds after triggering the motors.

          • Mike replied to this.

            sskorol … I did find the page below … indicating that temps can go high … it is not comparing apples with apples … but still good to know … While RAE rebooted in your test it would still be dangerous to blame that on the temps given the current state of the available software … things should stabilize over the next few months.

            https://docs.luxonis.com/projects/hardware/en/latest/pages/articles/operative_temperature_range/?highlight=temperature

            Hey, I am currently in the process of testing and trying to reproduce the issue. So far I am not managing to get rae to go over 57 degrees, even after a couple of hours of running the cameras + teleop stack.

            In the past we had issues with LEDs overheating the device. That issue should be solved, but I think it is still worth checking whether the LEDs are overheating the device and going from there. You can turn the LEDs off in the default ROS stack (robot.launch.py) either by removing the LED peripheral node from rae_hw/launch/peripherals.launch.py (assuming you disabled the agent), or with a more bootleg solution: changing this line to always be false (thus sending empty LED messages) should suffice. If we can narrow the issue down to a single peripheral, that would be very helpful.

            Thanks and sorry for the inconvenience.

              DaniloPejovic, running a full ROS stack for about 35 minutes with a disabled LED node. WiFi temperature holds at ~58-59 degrees. Battery gauge - 54-55. Also, I gave a relatively small load on servos via teleop. So, your observation regarding LEDs seems correct, and they cause overheating. So what was the fix? Is it a pure hardware issue? And if it was fixed, then how did it appear in production? Anyway, what would be the next steps?

              As there were recent LED updates pushed to the ROS repo, I decided to check the theory and executed a full stack with LED node, which led me to the following numbers in just 5 minutes:

              However, there was another observation. In the previous message, I didn't use cameras. And when I added a couple of camera views in RViz, there was a temperature spike in the WiFi thermal zone.

              I could reach 66-67 degrees. However, it never jumped above this point. So, it seems like the problem is more complicated. LEDs are still probably the main failure point. But I don't believe they cause overheating in isolation. When I shut down the ROS stack, the LEDs remained active (bug). But the temperature dropped to 57-58 degrees as well. So, it seems like cameras + LEDs in conjunction cause the overheating.

              Update: after a couple of hours of running the full stack with active cameras but w/o LEDs, I still reached the high temp in a WiFi zone (70-71 degrees). So it seems like it's just a matter of time to come to the red zone with active camera streams.