Delayed and Missing Bulb Updates (deCONZ Struggling to Handle Large Networks?)

I’m seeing errors like the ones below all day long, and it’s always the same single bulb. What is this about? I can’t say I’ve seen these before. Should I just replace it? It does turn on and off fine.

11:35:51:029 delay sending request 27 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
11:35:51:129 delay sending request 28 dt 0 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
11:35:50:730 delay sending request 28 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
11:35:50:429 delay sending request 27 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:05:51:429 delay sending request 225 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,

Thanks in advance!

Here’s another chunk of errors/notifications that might be related?

12:29:48:006 failed to add task 377217 type: 6, too many tasks,
12:29:48:006 5 running tasks, wait,
12:29:48:006 failed to add task 377220 type: 11, too many tasks,
12:29:48:006 failed to add task 377221 type: 6, too many tasks,
12:29:48:007 5 running tasks, wait,
12:29:48:028 5 running tasks, wait,
12:29:48:129 5 running tasks, wait,
12:29:48:185 	0x001788010BBD4E8D force poll (2),
12:29:48:229 5 running tasks, wait,
12:29:48:329 5 running tasks, wait,
12:29:48:429 5 running tasks, wait,
12:29:48:529 5 running tasks, wait,
12:29:48:611 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:48:622 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:48:623 delayed group sending,
12:29:48:625 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:48:626 delayed group sending,
12:29:48:626 delayed group sending,
12:29:48:629 5 running tasks, wait,
12:29:48:729 5 running tasks, wait,
12:29:48:778 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:48:778 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:48:829 5 running tasks, wait,
12:29:48:872 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:48:929 5 running tasks, wait,
12:29:48:972 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:48:974 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:029 5 running tasks, wait,
12:29:49:129 5 running tasks, wait,
12:29:49:229 5 running tasks, wait,
12:29:49:329 5 running tasks, wait,
12:29:49:370 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:429 5 running tasks, wait,
12:29:49:440 delay sending request 168 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:529 5 running tasks, wait,
12:29:49:598 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:629 5 running tasks, wait,
12:29:49:658 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:706 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:729 5 running tasks, wait,
12:29:49:787 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:49:791 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:49:829 5 running tasks, wait,
12:29:49:929 5 running tasks, wait,
12:29:49:953 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:49:954 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:031 5 running tasks, wait,
12:29:50:070 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:129 5 running tasks, wait,
12:29:50:164 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:230 5 running tasks, wait,
12:29:50:262 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:325 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:329 5 running tasks, wait,
12:29:50:410 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:429 delay sending request 168 dt 2 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:529 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:621 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:629 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:672 0x00178801097845F4 error APSDE-DATA.confirm: 0xE9 on task,
12:29:50:673 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:730 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:829 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:50:929 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:029 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:068 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:105 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:129 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:229 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:329 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:429 delay sending request 168 dt 3 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:538 delay sending request 168 dt 4 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:629 delay sending request 168 dt 4 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:651 0x0017880106968FBD error APSDE-DATA.confirm: 0xE9 on task,
12:29:51:652 delay sending request 168 dt 4 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:729 delay sending request 168 dt 4 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:29:51:825 	0x001788010BBD4E8D force poll (2),
12:29:51:969 	0x001788010BBD4E8D force poll (2),
12:29:55:494 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:495 delayed group sending,
12:29:55:496 delayed group sending,
12:29:55:496 delayed group sending,
12:29:55:496 delayed group sending,
12:29:55:496 delayed group sending,
12:29:55:496 delayed group sending,
12:29:55:729 5 running tasks, wait,
12:29:55:829 5 running tasks, wait,
12:29:55:929 5 running tasks, wait,
12:29:56:030 5 running tasks, wait,
12:29:56:088 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:56:129 5 running tasks, wait,
12:29:56:229 5 running tasks, wait,
12:29:56:339 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:57:037 0x00178801060959B4 error APSDE-DATA.confirm: 0xE9 on task,
12:29:59:902 reuse dead link (dead link container size now 394)

And a reminder if @manup or others don’t remember - I’m the guy with the big system: 368 nodes (1 ConBee II, 17 FLS-CT strip controllers, and 350 Hue bulbs) as well as 106 groups.

Here’s another chunk of logs with some scary things:

12:59:48:346 5 running tasks, wait,
12:59:48:346 failed to add task 389217 type: 11, too many tasks,
12:59:48:347 failed to add task 389218 type: 6, too many tasks,
12:59:48:347 5 running tasks, wait,
12:59:48:347 failed to add task 389221 type: 11, too many tasks,
12:59:48:347 failed to add task 389222 type: 6, too many tasks,
12:59:48:347 5 running tasks, wait,
12:59:48:347 failed to add task 389225 type: 11, too many tasks,
12:59:48:347 failed to add task 389226 type: 6, too many tasks,
12:59:48:348 5 running tasks, wait,
12:59:48:348 5 running tasks, wait,
12:59:48:404 5 running tasks, wait,
12:59:48:769 5 running tasks, wait,
12:59:48:798 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:48:798 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:48:822 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:59:48:822 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:48:822 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:48:864 5 running tasks, wait,
12:59:48:920 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:48:921 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:48:979 5 running tasks, wait,
12:59:49:005 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:49:005 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:49:054 5 running tasks, wait,
12:59:49:098 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:49:099 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:49:124 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:49:125 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,
12:59:49:149 5 running tasks, wait,
12:59:49:210 delay sending request 213 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0008 onAir: 1,
12:59:49:210 delay sending request 214 dt 1 ms to 0x001788010BBD4E8D, ep: 0x0B cluster: 0x0300 onAir: 1,

What does it mean when it says “too many tasks”? I mean, I get it literally, but why would that ever happen?

Does it happen with automations?

The message can be normal; deCONZ is able to manage them.
But it looks like you are sending too many requests in too little time, so some of them are put into a queue. It also seems some of them are discarded…

12:29:56:339 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task,
12:29:57:037 0x00178801060959B4 error APSDE-DATA.confirm: 0xE9 on task,

It can also be the device itself having problems handling the requests.

I do have a VERY large system, and I also have an adaptive lighting scheme running that updates color and brightness every 90 seconds on all the lights that are currently on. So it’s a busy system. Why can’t deCONZ handle that?

And still - there’s the original question here about the single bulb that keeps showing up in the logs. Strange, no?

Honestly, I don’t know deCONZ’s limitations, or whether they come from the deCONZ application or from the Zigbee network itself. I think @manup knows that better than me.

And yes, that’s why I mentioned the device connection. It’s strange that this one appears so often in the logs; can you swap it out to run some tests?

I have found these values, but I don’t know why they were chosen like this:

#define MAX_GROUP_SEND_DELAY 5000 // ms between two requests to the same group
#define GROUP_SEND_DELAY 50 // default ms between two requests to the same group
#define MAX_TASKS_PER_NODE 2
#define MAX_BACKGROUND_TASKS 5
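
As a toy model (my guess at the logic, not the actual deCONZ C++ code), constants like these would produce exactly the “5 running tasks, wait” and “failed to add task … too many tasks” lines we see above:

from collections import defaultdict

MAX_TASKS_PER_NODE = 2      # mirrors the deCONZ constant above
MAX_BACKGROUND_TASKS = 5    # mirrors the deCONZ constant above

running_tasks = set()                # task ids currently in flight
per_node_tasks = defaultdict(int)    # in-flight tasks per node address

def try_add_task(task_id, node_addr):
    # Too much total work queued: the new task is rejected outright.
    if len(running_tasks) >= MAX_BACKGROUND_TASKS:
        print(f"{len(running_tasks)} running tasks, wait")
        print(f"failed to add task {task_id}, too many tasks")
        return False
    # This node already has its quota: delay instead of sending now.
    if per_node_tasks[node_addr] >= MAX_TASKS_PER_NODE:
        print(f"delay sending request to {node_addr}")
        return False
    running_tasks.add(task_id)
    per_node_tasks[node_addr] += 1
    return True

def task_done(task_id, node_addr):
    # An APSDE-DATA.confirm frees a slot and lets the queue drain.
    running_tasks.discard(task_id)
    per_node_tasks[node_addr] -= 1

Under a model like this, a burst of group commands fills the five slots instantly, and everything behind them is either delayed or dropped, which matches the log excerpts.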

Good find, @Smanar

This raises a good question, I think, for @manup - can you help us understand why these values are set the way they are? It seems… rather restrictive, especially for large systems.

I have hundreds of bulbs and dozens of deCONZ groups, all rolled up into additional lighting groups in Home Assistant. I have found, through much trial and error, that the “freezing” of the system (which then requires me to leave and re-join the Zigbee network to correct), as well as lights that deCONZ and HA report as having turned on or off when they have not done so, are almost always due to “overloading” (for lack of a better term!) deCONZ by doing “too much” at once. It can be as simple as turning too many groups on or off at the same time.

I have had to resort to turning off lights by deCONZ light group, one group at a time, which seems to give deCONZ time to accommodate. Turning more than a few groups on or off at once causes the freezing or the mis-reporting of states. It’s very frustrating. It has made it impossible for me to automate more than a few things at once, and also impossible to use circadian rhythm color and brightness updating tools, which of course change many lights at the same time.

Yes, I could create some very complex rate-limiting automations, and I could decide not to use circadian rhythm tools, but I shouldn’t have to make those compromises. deCONZ should be able to handle an API request that changes all 367 bulbs at once (i.e. turning all my lights on or off, or updating color and brightness everywhere) by doing whatever network-protecting rate limiting is needed itself.
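
To illustrate the kind of rate limiting I mean, here is a minimal Python sketch against the deCONZ REST API (the host, API key, and the one-second pace are placeholders to adapt); pacing like this is what I’d hope deCONZ could do internally instead of each user bolting it on:

import time
import requests

DECONZ = "http://127.0.0.1/api/YOUR_API_KEY"  # placeholder host and key
MIN_INTERVAL = 1.0   # seconds between commands; tune to your network
_last_sent = 0.0

def set_group_action(group_id, body):
    """Rate-limited PUT /groups/<id>/action."""
    global _last_sent
    wait = MIN_INTERVAL - (time.monotonic() - _last_sent)
    if wait > 0:
        time.sleep(wait)  # hold the command until a full interval has passed
    r = requests.put(f"{DECONZ}/groups/{group_id}/action", json=body, timeout=5)
    _last_sent = time.monotonic()
    return r.json()

# Turning groups off one by one, paced, instead of all at once:
# for gid in ("1", "2", "3"):
#     set_group_action(gid, {"on": False})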

Any thoughts here, @manup? cc: @Mimiix (maybe you can tag “de_employees” for me?)

Thanks for your help, as always!


As you wish :)!

@de_employees


Hi,

The problem is actually already on the table. As you can see in the logs, there are massive delays and dropouts because too many commands are simply sent at the same time. We will not be able to do much about this; it is a limitation of the standard. Of course, there are ways to work around it.

The easiest would be to reduce the number of groups.
As an example:
If there are 10 lamps in one group and all of them should be set to 80% warm white, one command goes out. No problem.
If those 10 lamps are in 10 separate groups, 10 commands go out, and there could be a delay, or some commands are not executed at all.
The more groups that are switched simultaneously, the greater the delay.
Note that a lamp can of course belong to several groups, which probably makes the reduction easier. See the sketch below.
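
In REST API terms the difference looks roughly like this (a rough sketch; host and key are placeholders): one group command becomes a single groupcast on air, while addressing the same ten lamps individually queues ten unicast tasks:

import requests

DECONZ = "http://127.0.0.1/api/YOUR_API_KEY"  # placeholder host and key

def set_group(group_id, body):
    # One API call -> one groupcast reaching every member lamp.
    requests.put(f"{DECONZ}/groups/{group_id}/action", json=body, timeout=5)

def set_lights(light_ids, body):
    # One API call per lamp -> one unicast task per lamp on the queue.
    for lid in light_ids:
        requests.put(f"{DECONZ}/lights/{lid}/state", json=body, timeout=5)

# 80% warm white (bri 204 of 254, ct 370 mireds), once vs. ten times:
# set_group("7", {"bri": 204, "ct": 370})
# set_lights([str(i) for i in range(1, 11)], {"bri": 204, "ct": 370})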

Another way would be to bring in a second coordinator in the form of another gateway and divide the devices to be controlled between the two. This of course reduces the load. Controlling both via your existing Home Assistant system is no problem.

Best regards.

I’m not sure how Home Assistant handles it. I can well imagine that you have created groups in Home Assistant but not in our app. The assumption is that HA then sends each device an individual command, which leads to a complete overload.
If you create groups in the Phoscon app, they are also shown in HA, but only one group command is sent.

Home Assistant exposes both lights and groups from deCONZ. It also retries its action (max 3 retries) when receiving bridge-busy errors from deCONZ. Doesn’t deCONZ retry if the network is saturated?


@TimFisher did you resolve this somehow? I have hit this several times but have never seen a post about it. My network is large also, though not as big as yours, with only 100 lights and about 60 sensors/buttons.
I also found that adaptive lighting seems to be a big culprit.

I may have a big system, but I’d call 160 devices a big one, too!

It doesn’t sound like there is (or can be?) a solution on the table on the deCONZ side, so I’ve made the following changes:

First, I stopped using the HACS “Adaptive Lighting” add-on in HA and instead went with this method based on Node-RED (which I was using for all my automations anyway). It works really well.

Next, I made sure that I did a “check for being off” before sending any HA command to turn a light on, and the opposite check before any command to turn a light off. This removed completely unnecessary traffic from the Zigbee network.
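
In REST API terms the check looks roughly like this (a sketch of the idea, not my literal Node-RED flow; host and key are placeholders). The deCONZ group state exposes an any_on flag, so a redundant command can be skipped before it ever reaches the radio:

import requests

DECONZ = "http://127.0.0.1/api/YOUR_API_KEY"  # placeholder host and key

def turn_group(group_id, on):
    group = requests.get(f"{DECONZ}/groups/{group_id}", timeout=5).json()
    # 'any_on' is true if at least one member light is currently on.
    if group.get("state", {}).get("any_on") == on:
        return  # already in the desired state: send no Zigbee traffic
    requests.put(f"{DECONZ}/groups/{group_id}/action",
                 json={"on": on}, timeout=5)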

Then, I routed everything through a rate limiter in Node-RED, ensuring that no more than 1 command per second would ever go out. Now, in the worst-case scenario, with all 50 of my light groups on, the color/brightness updates (which I have running every 5 minutes) take a maximum of 50 seconds to reach all the lights. The thing I’m trying to figure out now is how to make some messages skip to the front of the line, so to speak. If I’m 10 seconds into a 35-second string of color updates but walk into a room with a sensor that should trigger a light-on command, I want that to take precedence. I’m sure I’ll figure it out.
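
Conceptually, what I’m after looks like this (a Python sketch of the idea, not my actual Node-RED flow): a single worker drains a priority queue at a fixed pace, so a sensor-triggered command jumps ahead of any queued color updates:

import threading
import time
from queue import PriorityQueue

PACE = 1.0                 # seconds between commands on the Zigbee side
SENSOR, CIRCADIAN = 0, 1   # lower number = higher priority

q = PriorityQueue()
_seq = 0                   # tie-breaker so equal priorities stay FIFO

def enqueue(priority, command):
    global _seq
    _seq += 1
    q.put((priority, _seq, command))

def worker(send):
    # 'send' is whatever issues the API call, e.g. a rate-limited PUT.
    while True:
        _, _, command = q.get()
        send(command)
        time.sleep(PACE)

# threading.Thread(target=worker, args=(print,), daemon=True).start()
# enqueue(CIRCADIAN, ("group 3", {"ct": 370}))
# enqueue(SENSOR, ("group 5", {"on": True}))  # processed first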

Happy to share a portion of my Node-RED setup if it might be helpful.

EDIT: The only issue now is that I do use the HA app to turn lights on and off sometimes, and that’s not being routed through Node-RED. I have some workaround ideas for this (e.g. creating helpers that pretend to be lights and can send on/off signals to Node-RED where I can then ensure rate limiting occurs) but haven’t fully decided the path forward there.

Thanks, @TimFisher, for taking the time to explain.
I have had a check for the correct state before sending on or off in my automations from the beginning, but it is nice that that is part of your “solution” - a little reassurance is always nice.
If I ever embark on the journey to Node-RED I might take you up on that, but for now that is way too big of a task with over 200 automations (ballpark figure).

It sounds really complex doing QoS in Node-RED, but hopefully you manage to get there; I can definitely see your pain points.

Out of curiosity (it hit me at the time of writing this): do you do power monitoring with your Zigbee network?
Because, now that I think about it, that spams a lot of messages!
Hmm, this might have all started when I added more plugs with power measurement.
I will perhaps move those to a separate network to see if there is a difference. I have 9 at the moment.
That will probably take me a while because most of the family is at home with an ongoing cold :stuck_out_tongue:

Will try to post my findings in case someone else finds this thread :slight_smile:

I do NOT do any power monitoring with Zigbee, so if that’s a busy-making task, I’m avoiding that (thank goodness).

And an update on Node-RED:

I ended up doing a double rate limit, sending the sensor-triggered stuff to the limiter on the right and giving it a bit of a head start over the color updates, which go through the one on the left. It works well: a sensor trigger has no more than a 2-second delay at the very worst, and color updates are “deprioritized,” so to speak. There’s probably a more elegant way to pull it off, but it works!

Hi Tim
I don’t think we have to worry about the power monitoring. I don’t know if you were planning on doing that; it seems fine. It’s only related to color and brightness commands :stuck_out_tongue: I moved all of the power-monitoring plugs off the ConBee II stick, and also a motion sensor that was chatty, but nothing changed :slight_smile:
I have stopped the Adaptive Lighting integration and am now doing Home Assistant automations instead. Not as cool as your setup, but it’s what I have time for at the moment :slight_smile: Thanks for the update on Node-RED; perhaps I will get there one day :slight_smile:


Hi, all (replying directly to you, Gautama, only so you’d get pinged and could take a peek)! Wanted to resurrect this conversation for everyone with some new thoughts/perspectives.

My large system (351 Hue bulbs and 17 FLS-CT lp strips) with 65 Phoscon groups has been working well for the past 18 months in this new house, aside from a few quirky things here and there, like the occasional Hue bulb that needs to be reset and re-added and a pesky dresden elektronik FLS-CT lp issue I’m dealing with as of late. But generally speaking, especially on the deCONZ/API side (all managed via HA and automated via Node-RED), things have been great.

The way I’ve managed my previously discussed “overloading” issues, for lack of a more technically accurate phrase, is via the 1-second rate limiting monstrosity in Node-RED you can see in my replies from December. This method is completely effective at preventing the error messages, and consequences thereof, you see in my original posts here. One unfortunate side effect, however, of routing a system this big through my home-grown Node-RED rate limiter is that one second between requests means that lights are noticeably, sometimes very noticeably, delayed in their actions.

Between the every-90-second color temperature and brightness updates, motion triggers (via non-Zigbee sensors), and other timers, it can become very apparent that something is “slow” (as my family puts it) although I know I’m doing this purposefully to prevent issues.

After seeing all the amazing bugfixing work that has gone on here since I last attempted to run this system without my own rate limiting, I thought there may have been some improvements that better managed the quantity of requests for my system. So I turned off my rate limiting and spun up the Adaptive Lighting integration in HA, which is my ideal solution to managing color temperature automatically. I was careful to assign only the Phoscon groups that are surfaced in HA as controllable entities so that at absolute maximum, with all lights in the house on, deCONZ would only receive 65 simultaneous requests, maybe one or two others if some triggers happened to fire off at those same times.

Unfortunately, similar errors quickly appeared in the logs and I was suddenly no longer able to control lights via HA or Phoscon. I had to stop the Adaptive Lighting routines in HA, and to recover I had to restart the entire Docker container and then leave and re-join the network via the top menu in the deCONZ visual interface.

I don’t know what the ultimate solution is, but it doesn’t seem like my own hacky rate limiting is the right way forward. I might be an early adopter of the “all Zigbee bulbs” setup at home (especially in a larger home, and with my circadian rhythm color temp obsession), but I suspect more people like me come along every day. Bulbs get cheaper, sleep quality and work-from-home focus are increasing priorities, and more folks will need these “busier” Zigbee systems to work well.

I’m also not sure if this is an inherent challenge with the Zigbee standard (i.e. networks this big “shouldn’t exist” with the current implementation) or if it’s the way in which deCONZ manages messages or network traffic or whatever deCONZ actually has control of (sorry, not a Zigbee expert!). I’m also not asking if deCONZ can build in API rate limiting or a compensating factor (that wouldn’t get me further than what I have now with my Node-RED rate-limiting solution).

Bottom line is: deCONZ is not doing what I ask it to do (make changes to these x groups). If (and I mean IF) I’m asking it something I shouldn’t be asking it (e.g. too many things at once), it shouldn’t let me ask that but should instead provide some kind of “you can’t do that” error/feedback instead of crashing the system in addition to failing at the specific requests. Otherwise, this feels like a bug or improvement opportunity with deCONZ to support larger/busier systems.

I always appreciate the help from the community here. And I greatly appreciate the work from dresden elektronik and the Phoscon dev team in making this free software available. If there’s no solution, I’ll stick to my hacky rate-limiting and get by, but it feels to me that there’s a better path here that future-proofs this software and I’m happy to be the guinea pig!

If it’s helpful: I am currently using a ConBee II with firmware 0x26780700. I’m running deCONZ v2.22.2, Qt 5.9.5, GCC 7.5.0 (C++ 201402). I’m using source routing with a max of 5 hops and a minimum LQI of 150.

Thank you!

Any thoughts on this, deCONZ team?