Current deconz (2.24.3 & 2.25.1) slow in response after time

Mimiix · January 22, 2024, 4:59pm

lukicsl · January 23, 2024, 7:50pm

Currently, I am running my production home automation docker stack, which also contains a deCONZ container on a Raspberry PI 4 aarch64.

As I described in this thread I am experiencing problems with versions > 2.23.2.

Today I installed a fresh Raspberry Pi 3B armv7l, copied the mapped /opt/deCONZ directory, and plugged over the same conbee II from the production server.

I wanted to isolate the problem as much as possible and try to run 2.23.2 and 2.24.3 for comparison.
The result is the same as on the production stack.

One interesting thing is the CPU load, for 2.24.3 it is much higher, at 18:38 I switched from 2.24.3 to 2.23.2:

Mimiix · January 23, 2024, 7:56pm

@de_employees

Ar-Eh · January 23, 2024, 11:45pm

I am also having this issue after upgrading yesterday from 2.23.2 to 2.24.3. Slow response time for all devices, missed service calls, and higher-than-normal cpu usage for the docker container. The web ui is also very slow. Reverted to 2.23.2 and all is well again. I’m using a Conbee II with Docker on a Synology DS718+/

lukicsl · January 24, 2024, 6:48am

I have created an “all-in” log file with all log options checked:

debug-2.24.3-all-in.log

and again the CPU goes up

Mimiix · January 24, 2024, 11:07am

Can you share a screenshot of the deCONZ gui?

I have the suspicion something is off with the LQI’s and network buildup.

lukicsl · January 24, 2024, 11:30am

manup · January 24, 2024, 11:31am

The higher CPU usage is indeed weird, I’m gonna make some tests with a profiler.
The logs themself look mostly fine, expect the 0x86 unsupported attribute status codes. As far as I can tell these are due older Philips Hue firmware which don’t support the attributes referenced in the DDF.

Mimiix · January 24, 2024, 11:46am

The slowness, is that perhaps with devices that are connected to the devices on the left top and bottom? The connections can be better IMHO, and that might cause the slowness.

lukicsl · January 24, 2024, 12:24pm

Still, the topography does not change when just stepping up the version.
Is the Conbee III something better than the Conbee II in regards to LQI and other stuff?

manup · January 24, 2024, 12:29pm

@lukicsl can you please also enable DEV and DDF logging with v2.24.3.
From the logs I see a few things which take longer than used to be, mainly for legacy code devices here I’d like to see if more logging context gives some insights what’s happening.

lukicsl · January 24, 2024, 1:28pm

Have you looked at these?

manup · January 25, 2024, 11:12am

Ha right indeed the DEV and DDF is in there. I’ve checked various code paths yesterday to figure out what can cause the high CPU usage in you network. In my test setup there is only the normal CPU usage of average 2-5% after initial load.

A significant difference is that your network has a higher mix of legacy vs. DDF driven devices than mine, and the high CPU load depends on what devices are in the network, that’s why it can be seen in some but not all setups.

Over the past year there were many additions of new DDFs for various devices (currently ca. 330 DDFs with >600 modelid/manufacturer name pairs), and I think this causes the legacy code paths to be more stressed. It’s a bit difficult to describe, but the legacy code has a few interception points with new DDF code which in worst case can lead to large checks against all DDFs eating CPU cycles. My guess here is that this amplifies at some point, like the logs show over 3000 string comparisons after receiving a command in one case which is nuts.

Long story short I’m in the process to refactor a few things to prevent useless tasks in the legacy code and also prevent repeated requests which will always fail like the 0x86 attribute unsupported error shown in the logs.

Hope to have this ready for testing in a couple of days. Note that this is only an assumption currently we’ll only see after testing if this is indeed the root of the issue.

Erik7 · January 29, 2024, 3:25pm

Hi Manup,
I don’t want to be impatient , but have you been able to test some things already? It seems once again that this issue may cause a new delay in the HA deCONZ addon (which is already taking many months).

github.com/home-assistant/addons

Updates deconz to the latest stable version: 2.24.3

home-assistant:master ← francescopeloi:master

opened 03:22PM - 23 Jan 24 UTC

francescopeloi

+6 -2

Updates deconz to the latest stable version: [2.24.3](https://github.com/dresden…-elektronik/deconz-rest-plugin/releases/tag/v2.24.3) Note: this hasn't been tested locally by me. I am trying to help with what I can, so at least do the code change. - Phoscon/deconz says that 2.24.3 includes a fix for this deconz addon to work ([source](https://github.com/dresden-elektronik/deconz-rest-plugin/releases/tag/v2.24.3)) - Problems that were experienced before are discussed in deconz forum ([source](https://forum.phoscon.de/t/phoscon-de-pwa-login-html-does-not-work-anymore/3967)) and on the previous PR ([source](https://github.com/home-assistant/addons/pull/3323)) - I think the [other PR](https://github.com/home-assistant/addons/pull/3323) can be closed, as it's upgrading to a broken version.

Thank you for your great work!

manup · January 29, 2024, 3:43pm

Hi, yes after digging deeper in the legacy code and also checking the profiler I think we found some spots which have larger impact on keeping the CPU busy. It’s one of this rabbit holes where heaps of stuff shows up which needs to be optimized.

Here is a 3 minutes flame graph showing effects of some cleanups, prior here was a lot of GUI redrawing shown and some needless looping over devices and internal data.

The graph shows also more potential to get rid of cycle eaters in future releases (here mainly the middle and left sides do way too much work than needed).

I’m creating a test release for arm version later today to see if the changes improve another developer setup where similar issues can be seen.

lukicsl · January 29, 2024, 3:52pm

If it’s not too much work, a docker image wold be great!

Mimiix · January 29, 2024, 4:15pm

Thats something the docker guys need to do. @phdelodder could this be done?

phdelodder · January 29, 2024, 4:36pm

Create an issue and I’ll look into it

Mimiix · January 29, 2024, 4:38pm

That probably can be done after its there.

I’ll ping you on discord.

Ar-Eh · January 29, 2024, 6:10pm

Thank you to all the devs for all of the great work you do to keep deconz running smoothly.