Delayed and Missing Bulb Updates (deCONZ Struggling to Handle Large Networks?)

And still - there’s the original question here about the single bulb that keeps showing up in the logs. Strange, no?

Honestly, I don’t know deCONZ’s limitations, or whether they come from the deCONZ application or from the Zigbee network itself. I think @manup knows that better than I do.

And yes, that’s why I brought up the device connection. It’s strange that this one appears so often in the logs. Can you swap it out to test?

I found these values, but I don’t know why they were chosen:

#define MAX_GROUP_SEND_DELAY 5000 // ms between to requests to the same group
#define GROUP_SEND_DELAY 50 // default ms between to requests to the same group

Good find, @Smanar

This raises a good question, I think, for @manup - can you help us understand why these values are set the way they are? It seems… rather restrictive, especially for large systems.

I have hundreds of bulbs, and dozens of deCONZ groups, all rolled up into additional lighting groups in Home Assistant. I have found, through much trial and error, that the “freezing” of the system (which then requires me to leave and re-join the Zigbee network to correct) as well as lights that deCONZ and HA report as having turned on and off when they have not done so, are almost always due to “overloading” (for lack of a better term!) deCONZ by doing “too much” at once. It can be as simple as turning on or off too many groups at once. I have had to resort to turning off lights by deCONZ light groups one by one, which seems to give deCONZ time to accommodate. Turning on or off more than a few groups at once causes the freezing or the mis-reporting of states. It’s very frustrating. It’s made it impossible for me to automate more than a few things at once and also impossible to use circadian rhythm color and brightness updating tools, which of course make changes to many lights at the same time.

Yes, I could create some very complex rate limiting automations, and could decide not to use circadian rhythm tools, but I shouldn’t have to make those compromises. deCONZ should be able to handle an API request to make a change to all 367 bulbs at once (i.e. turning all my lights on or off or updating color and brightness to all lights at once) by doing whatever network-protecting rate limiting is needed itself.

Any thoughts here, @manup? cc: @Mimiix (maybe you can tag “de_employees” for me?)

Thanks for your help, as always!

As you wish :)!


The problem is actually already on the table. As you can see in the logs, there are massive delays and dropouts simply because too many commands are sent at the same time. We will not be able to do much about this; it is a limitation of the standard. Of course, there are ways to work around it.

The easiest would be to reduce the number of groups.
As an example:
If there are 10 lamps in one group and all of them should be set to 80% warm white, one command goes out. No problem.
If there are 10 lamps in 10 groups, 10 commands go out, and there could be a delay or the commands might not be carried out at all.
The more groups are switched simultaneously, the greater the delay.
Note that a lamp can of course belong to several groups, which should make the reduction easier.
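
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes, for illustration only, that every targeted group costs exactly one group command regardless of how many lamps it contains. The group names, lamp names, and the `commands_for` helper are hypothetical and not part of deCONZ or Phoscon.

```python
# Hypothetical layout: group name -> lamps in that group (one lamp per group).
groups = {f"room_{i}": {f"lamp_{i}"} for i in range(10)}


def commands_for(targets, groups):
    """Count group commands: one per targeted group, regardless of its size."""
    return len([g for g in targets if g in groups])


# Switching 10 single-lamp groups at once costs 10 commands.
many = commands_for(list(groups), groups)

# A lamp may belong to several groups, so one combined group can cover
# the same 10 lamps with a single command.
groups["all_rooms"] = set().union(*groups.values())
one = commands_for(["all_rooms"], groups)

print(many, one)
```

The existing per-room groups stay usable; the combined group is only an additional, cheaper way to reach the same lamps.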

Another way would be to bring in a second coordinator in the form of another gateway and to split the devices to be controlled between the two. This of course reduces the load. Controlling both through your existing Home Assistant system is no problem.

Best regards.

I’m not sure how Home Assistant handles it. I suspect you have created the groups in Home Assistant, but not in our app. In that case, HA presumably sends a command to each device individually, which leads to a complete overload.
If you create groups in the Phoscon app, they also show up in HA, and only one group command is sent.

Home Assistant exposes both lights and groups from deCONZ. It also retries its action (max 3 retries) when it receives bridge busy errors from deCONZ. Doesn’t deCONZ retry if the network is saturated?
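
For illustration only, a client-side retry of that kind could look roughly like the Python sketch below. `BridgeBusy`, `send_with_retry`, and the growing delays between attempts are hypothetical names and values chosen for the example; this is not the actual Home Assistant code, and only the limit of 3 retries comes from the behavior described above.

```python
import time


class BridgeBusy(Exception):
    """Hypothetical error raised when the gateway reports that it is busy."""


def send_with_retry(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send(); on BridgeBusy, retry up to max_retries times.

    The delay doubles after each failed attempt (an assumption made here
    to avoid hammering an already saturated gateway).
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except BridgeBusy:
            if attempt == max_retries:
                raise  # all retries used up, surface the error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter makes the waiting easy to replace in tests or in an event-loop based client.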

@TimFisher did you resolve this somehow? I have run into this several times but have never seen a post about it. My network is large too, though not as big as yours, with only 100 lights and about 60 sensors/buttons.
I also found that adaptive lighting seems to be a big culprit.

I may have a big system, but I’d call 160 devices a big one, too!

It doesn’t sound like there is (or can be?) a solution on the table on the deCONZ side, so I’ve made the following changes:

First, I stopped using the HACS “Adaptive Lighting” add-on in HA and instead went with this method based in Node-RED (which I was using for all my automations anyway). It works really well.

Next, I made sure that I did a “check for being off” before I sent any HA commands to turn a light on. I did the opposite for any commands to turn a light off. This removed any completely unnecessary traffic going out on the Zigbee network.
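
A minimal Python sketch of that kind of check, for anyone not using Node-RED. It assumes the last known on/off state of each light is available in a dictionary; `filter_redundant` and the light names are hypothetical, and lights with an unknown state are passed through rather than dropped.

```python
def filter_redundant(states, requests):
    """Keep only requests that would change the last known on/off state.

    states:   dict mapping light name -> True (on) / False (off)
    requests: list of (light name, desired on/off) tuples
    Lights missing from `states` are kept, since their state is unknown.
    """
    return [(light, want_on) for light, want_on in requests
            if states.get(light) != want_on]


states = {"hall": True, "kitchen": False}
requests = [("hall", True), ("kitchen", True), ("porch", False)]

# "hall" is already on, so only the other two requests are worth sending.
to_send = filter_redundant(states, requests)
print(to_send)
```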

Then, I routed everything through a rate limiter in Node-RED, ensuring that no more than 1 command per second would ever go out. Now, in the worst-case scenario, with all 50 of my light groups on, the color/brightness (which I have updating every 5 minutes) will take a maximum of 50 seconds to update all the lights. The thing I’m trying to figure out now is how to make some messages skip to the front of the line, so to speak. If I’m 10 seconds into a 35 second string of color updates but walk into a room with a sensor that I want to trigger a light-on command, I want that to take precedence. I’m sure I’ll figure it out.
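
One way to think about “skipping to the front of the line” is a rate limiter backed by a priority queue. The Python sketch below is only an illustration of that idea, not my Node-RED flow: `PriorityRateLimiter`, the priority numbers, and the command strings are all hypothetical, and the caller supplies the current time (for example from `time.monotonic()`).

```python
import heapq
import itertools


class PriorityRateLimiter:
    """Release at most one command per `interval` seconds, lowest priority number first."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority
        self._next_allowed = 0.0

    def submit(self, command, priority=1):
        """Queue a command; priority 0 (e.g. a motion trigger) beats priority 1."""
        heapq.heappush(self._heap, (priority, next(self._order), command))

    def poll(self, now):
        """Return the next command if the interval has elapsed, else None."""
        if not self._heap or now < self._next_allowed:
            return None
        _, _, command = heapq.heappop(self._heap)
        self._next_allowed = now + self.interval
        return command


limiter = PriorityRateLimiter(interval=1.0)
limiter.submit("color update: group 1", priority=1)
limiter.submit("color update: group 2", priority=1)
first = limiter.poll(0.0)                   # first color update goes out
limiter.submit("motion: hall on", priority=0)
second = limiter.poll(1.0)                  # the motion command jumps the queue
```

With this design an urgent command waits at most one interval, while the color updates keep the same overall pace of one command per second.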

Happy to share a portion of my Node-RED setup if it might be helpful.

EDIT: The only issue now is that I do use the HA app to turn lights on and off sometimes, and that’s not being routed through Node-RED. I have some workaround ideas for this (e.g. creating helpers that pretend to be lights and can send on/off signals to Node-RED where I can then ensure rate limiting occurs) but haven’t fully decided the path forward there.

Thanks, @TimFisher, for taking the time to explain.
I have checked for the correct state before sending on or off in my automations from the beginning, but it is nice that this is part of your “solution”. A little reassurance is always nice.
If I ever embark on the journey to Node-RED I might take you up on that. But for now, that is way too big of a task with over 200 automations (ballpark figure).

It sounds really complex to do QoS in Node-RED, but hopefully you manage to get there. I can definitely see your pain points.

Out of curiosity, and this only hit me while writing: do you do power monitoring with your Zigbee network?
Because that spams a lot of messages, now that I think about it.
Hmm, this might have all started when I added more plugs with power measurement.
I will perhaps move those to a separate network to see if there is a difference. I have 9 at the moment.
That will probably take me a while because most of the family is at home with an ongoing cold :stuck_out_tongue:

I will try to post my findings in case someone else finds this thread :slight_smile:

I do NOT do any power monitoring with Zigbee, so if that’s a busy-making task, I’m avoiding that (thank goodness).

And an update on Node-RED:

I ended up doing a double rate limit, sending the sensor-triggered stuff to the one on the right, giving it a bit of a head start against the color updates, which are on the left. Works well: a sensor trigger has no more than a 2 second delay at the very worst and color updates are “deprioritized”, so to speak. There’s probably a more elegant way to pull it off, but it works!

Hi Tim
I don’t think we have to worry about the power monitoring, in case you were planning on doing that. It seems fine. The problem is only related to color and brightness commands :stuck_out_tongue: I moved all of the power monitoring plugs off the ConBee II stick, and also some motion sensors that were chatty, but nothing changed :slight_smile:
I have stopped the integration and am now doing Home Assistant automations instead. Not as cool as your setup, but it’s what I have time for at the moment :slight_smile: Thanks for the update on Node-RED. Perhaps I will get there one day :slight_smile:

Hi, all (replying directly to you, Gautama, only so you’d get pinged and could take a peek)! Wanted to resurrect this conversation for everyone with some new thoughts/perspectives.

My large system (351 Hue bulbs and 17 FLS-CT lp strips) with 65 Phoscon groups has been working well for the past 18 months in this new house, aside from a few quirky things here and there, like the occasional Hue bulb that needs to be reset and re-added and a pesky dresden elektronik FLS-CT lp issue I’ve been dealing with as of late. But generally speaking, especially on the deCONZ/API side (all managed via HA and automated via Node-RED), things have been great.

The way I’ve managed my previously discussed “overloading” issues, for lack of a more technically accurate phrase, is via the 1-second rate limiting monstrosity in Node-RED you can see in my replies from December. This method is completely effective at preventing the error messages, and consequences thereof, you see in my original posts here. One unfortunate side effect, however, of routing a system this big through my home-grown Node-RED rate limiter is that one second between requests means that lights are noticeably, sometimes very noticeably, delayed in their actions.

Between the every-90-second color temperature and brightness updates, motion triggers (via non-Zigbee sensors), and other timers, it can become very apparent that something is “slow” (as my family puts it) although I know I’m doing this purposefully to prevent issues.

After seeing all the amazing bugfixing work that has gone on here since I last attempted to run this system without my own rate limiting, I thought there may have been some improvements that better managed the quantity of requests for my system. So I turned off my rate limiting and spun up the Adaptive Lighting integration in HA, which is my ideal solution to managing color temperature automatically. I was careful to assign only the Phoscon groups that are surfaced in HA as controllable entities so that at absolute maximum, with all lights in the house on, deCONZ would only receive 65 simultaneous requests, maybe one or two others if some triggers happened to fire off at those same times.

Unfortunately, similar errors quickly appeared in the logs and I was suddenly no longer able to control lights via HA or Phoscon. I had to stop the Adaptive Lighting routines in HA, and to recover I had to restart the entire Docker container and then leave and re-join the network via the top menu in the deCONZ visual interface.

I don’t know what the ultimate solution is, but it doesn’t seem like my own hacky rate limiting is the right way forward. I might be an early adopter of the “all Zigbee bulbs” setup at home, especially in my larger home, as well as of my circadian rhythm color temp obsession, but I suspect there are more people like me every day. The bulbs get cheaper, sleep quality and work-from-home focus are increasing priorities, and more folks will need these “busier” Zigbee systems to work well.

I’m also not sure if this is an inherent challenge with the Zigbee standard (i.e. networks this big “shouldn’t exist” with the current implementation) or if it’s the way in which deCONZ manages messages or network traffic or whatever deCONZ actually has control of (sorry, not a Zigbee expert!). I’m also not asking if deCONZ can build in API rate limiting or a compensating factor (that wouldn’t get me further than what I have now with my Node-RED rate-limiting solution).

Bottom line is: deCONZ is not doing what I ask it to do (make changes to these x groups). If (and I mean IF) I’m asking it something I shouldn’t be asking it (e.g. too many things at once), it shouldn’t let me ask that but should instead provide some kind of “you can’t do that” error/feedback instead of crashing the system in addition to failing at the specific requests. Otherwise, this feels like a bug or improvement opportunity with deCONZ to support larger/busier systems.

I always appreciate the help from the community here. And I greatly appreciate the work from dresden elektronik and the Phoscon dev team in making this free software available. If there’s no solution, I’ll stick to my hacky rate-limiting and get by, but it feels to me that there’s a better path here that future-proofs this software and I’m happy to be the guinea pig!

If it’s helpful: I am currently using a ConBee II with firmware 0x26780700. I’m running v2.22.2, Qt: 5.9.5, GCC: 7.5.0 (C++ 201402). I’m using Source Routing with a max hops of 5 and a minimum LQI of 150.

Thank you!

Any thoughts on this, deCONZ team?


Hi Tim, for now your rate limiting is still needed. However, deCONZ itself is going through a major refactoring so that in the future no limiting is needed on the client API side.

The main problem is that commands that are issued through the REST-API, e.g. set color temperature of lights, for the most part are enqueued (if there is enough room in the queue), carried out and hopefully work. This is also what all other systems do, but it’s not the best approach and doesn’t scale well.

Parts of the REST-API plug-in that were rewritten for DDF support already scale better, since they always “suspect” things may fail. For example, when a device joins, the process of querying information from it or configuring bindings isn’t carried out in a fire-and-forget fashion. Every step is actually verified, and if something fails it is simply retried later on. The configuration of bindings is also periodically verified and reconfigured/repaired if needed.

This approach is also the current and future direction for how control and configuration commands sent by an API client are handled. The gist of it is that when an API client sends “turn 300 lights on”, this “target state” is recorded and processed as a transaction, with verification and recovery retries if something fails.

The mechanism behind this is already used for some sensor devices. There it’s a bit simpler, since we address them only via unicast and not in a group.

For controlling single lights the mechanism will be the same, but I’d also like to extend this approach so it works with groups and scenes. Here it gets trickier: we need to extend the DDFs so that they also specify whether a light supports group casts and Zigbee scenes, as some don’t or are totally buggy. For a group call to turn on 300 lights, a group/scene command is sent to the network in the first stage; then the state is verified, either from the lights reporting the new state on their own or by querying it. For lights that didn’t apply the new state, e.g. because they didn’t get the group command or don’t support it, the state gets “repaired” by sending unicast commands.
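
As a rough illustration of these stages (group command, verification, unicast repair), here is a small Python sketch. It is not the deCONZ implementation: `apply_group_state` and the three callables it receives are hypothetical stand-ins for real network operations, and the limit of two repair rounds is an arbitrary assumption.

```python
def apply_group_state(members, target, send_group, read_state, send_unicast,
                      max_repairs=2):
    """Drive all members of a group towards `target`.

    send_group(target)          -- hypothetical: one group/scene command
    read_state(light)           -- hypothetical: reported or queried state
    send_unicast(light, target) -- hypothetical: command to a single light
    Returns the set of lights that still do not match the target.
    """
    send_group(target)                                          # stage 1
    pending = {m for m in members if read_state(m) != target}   # stage 2: verify
    for _ in range(max_repairs):                                # stage 3: repair
        if not pending:
            break
        for light in sorted(pending):
            send_unicast(light, target)
        pending = {m for m in pending if read_state(m) != target}
    return pending


# Tiny simulated network in which light "c" misses the group cast.
state = {"a": "off", "b": "off", "c": "off"}
unicasts = []

def fake_group(target):
    for light in ("a", "b"):
        state[light] = target

def fake_unicast(light, target):
    unicasts.append(light)
    state[light] = target

left = apply_group_state(["a", "b", "c"], "on", fake_group, state.get, fake_unicast)
```

In the simulated run only the light that missed the group cast receives a unicast, so the common case stays at one command for the whole group.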

This will be very robust: the API client only sends the new target state, and deCONZ/the plug-in will carry out the transaction by all means. But there is one caveat, especially for large networks: it can take a while. Querying 300 lights through a mesh is not fast (Zigbee is hard limited to 250 kbit/s, and in practice it is even slower), so it’s not suitable for changing state every few seconds.

I hope this gives a bit of an overview where we are heading.

This is an excellent overview of where you’re headed, and completely answers my question, and after implementation: sounds like it will completely solve my problem as well. Thank you for the time to explain all this!

FWIW: In my large system, the absolute maximum pace I’d be sending messages is to 65 groups at once (super rare) and every 3 to 5 minutes. Sounds like your refactoring will be more than sufficient to handle this appropriately.

Thanks again!

Also, @manup, I’d be happy to beta or alpha test the refactoring. You can DM me if there’s interest in that. I suspect I have a system size that will be more common into the future. Even if not, might be a good pressure test. :slight_smile:

Hi! Just checking in on the progress here. Any updates on when this refactoring may be complete? Thanks!