Delayed and Missing Bulb Updates (deCONZ Struggling to Handle Large Networks?)

Hi, all (replying directly to you, Gautama, only so you’d get pinged and could take a peek)! Wanted to resurrect this conversation for everyone with some new thoughts/perspectives.

My large system (351 Hue bulbs and 17 FLS-CT lp strips) with 65 Phoscon groups has been working well for the past 18 months in this new house, aside from a few quirks here and there, like the occasional Hue bulb that needs to be reset and re-added and a pesky dresden elektronik FLS-CT lp issue I’ve been dealing with lately. But generally speaking, especially on the deCONZ/API side (all managed via HA and automated via Node-RED), things have been great.

The way I’ve managed my previously discussed “overloading” issues, for lack of a more technically accurate phrase, is via the 1-second rate limiting monstrosity in Node-RED you can see in my replies from December. This method is completely effective at preventing the error messages, and consequences thereof, you see in my original posts here. One unfortunate side effect, however, of routing a system this big through my home-grown Node-RED rate limiter is that one second between requests means that lights are noticeably, sometimes very noticeably, delayed in their actions.
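To make the delay concrete, here is a minimal Python sketch of the idea (not the actual Node-RED flow). It assumes a simple first-in, first-out limiter with a fixed one-second gap between sends; the names `schedule` and `Scheduled`, and the burst of 65 simultaneous requests, are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scheduled:
    request_id: str
    arrived_at: float  # seconds, when the request reached the limiter
    sent_at: float     # seconds, when the limiter lets it through

    @property
    def delay(self) -> float:
        return self.sent_at - self.arrived_at


def schedule(arrivals, min_interval=1.0):
    """Assign a send time to each (request_id, arrival_time) pair.

    Requests go out in arrival order with at least `min_interval`
    seconds between consecutive sends.
    """
    scheduled = []
    next_free = float("-inf")  # earliest time the next send is allowed
    for request_id, arrived_at in sorted(arrivals, key=lambda a: a[1]):
        sent_at = max(arrived_at, next_free)
        scheduled.append(Scheduled(request_id, arrived_at, sent_at))
        next_free = sent_at + min_interval
    return scheduled


# Hypothetical burst: every one of 65 groups asks for an update at t=0.
burst = [(f"group-{n}", 0.0) for n in range(1, 66)]
plan = schedule(burst)
worst = max(item.delay for item in plan)
print(f"last of {len(plan)} requests waits {worst:.0f} s")
```

In that assumed worst case the last request goes out 64 seconds after it arrived, which is most of a 90-second update cycle.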

Between the every-90-second color temperature and brightness updates, motion triggers (via non-Zigbee sensors), and other timers, it can become very apparent that something is “slow” (as my family puts it) although I know I’m doing this purposefully to prevent issues.

After seeing all the amazing bugfixing work that has gone on here since I last attempted to run this system without my own rate limiting, I thought there might have been some improvements in how the quantity of requests from a system like mine is handled. So I turned off my rate limiting and spun up the Adaptive Lighting integration in HA, which is my ideal solution for managing color temperature automatically. I was careful to assign only the Phoscon groups that are surfaced in HA as controllable entities so that at absolute maximum, with all lights in the house on, deCONZ would only receive 65 simultaneous requests, maybe one or two more if some triggers happened to fire at the same time.

Unfortunately, similar errors quickly appeared in the logs and I was suddenly no longer able to control lights via HA or Phoscon. I had to stop the Adaptive Lighting routines in HA, and to recover I had to restart the entire Docker container and then leave and re-join the network via the top menu in the deCONZ visual interface.

I don’t know what the ultimate solution is, but it doesn’t seem like my own hacky rate limiting is the right way forward. I might be an early adopter of the “all Zigbee bulbs” setup at home, especially in a larger home like mine, and of the circadian rhythm color temp obsession, but I suspect there are more people like me every day. The bulbs get cheaper, sleep quality and work-from-home focus are increasing priorities, and more folks will need these “busier” Zigbee systems to work well.

I’m also not sure if this is an inherent challenge with the Zigbee standard (i.e. networks this big “shouldn’t exist” with the current implementation) or if it’s the way in which deCONZ manages messages or network traffic or whatever deCONZ actually has control of (sorry, not a Zigbee expert!). I’m also not asking if deCONZ can build in API rate limiting or a compensating factor (that wouldn’t get me further than what I have now with my Node-RED rate-limiting solution).

Bottom line is: deCONZ is not doing what I ask it to do (make changes to these x groups). If (and I mean IF) I’m asking it something I shouldn’t be asking it (e.g. too many things at once), it shouldn’t let me ask that; it should return some kind of “you can’t do that” error/feedback rather than failing the specific requests and crashing the system on top of that. Otherwise, this feels like a bug or improvement opportunity with deCONZ to support larger/busier systems.
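As a rough illustration of the kind of feedback I mean, here is a minimal Python sketch of a bounded intake queue that refuses extra requests with an explicit error instead of accepting them and falling over. This is purely hypothetical: `BoundedIntake`, `TooManyRequests`, and the capacity of 3 are made up for the example and say nothing about how deCONZ works internally.

```python
from collections import deque


class TooManyRequests(Exception):
    """Raised when the caller asks for more than can be queued."""


class BoundedIntake:
    """Accepts requests up to a fixed capacity and rejects the rest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._pending = deque()

    def submit(self, request):
        # Refuse loudly instead of accepting work that cannot be handled.
        if len(self._pending) >= self.capacity:
            raise TooManyRequests(
                f"queue full ({self.capacity} pending); retry later"
            )
        self._pending.append(request)

    def next_request(self):
        # Hand the oldest pending request to whatever sends it on.
        return self._pending.popleft() if self._pending else None


# Hypothetical usage: capacity 3, five group updates arrive at once.
intake = BoundedIntake(capacity=3)
accepted, rejected = [], []
for n in range(1, 6):
    try:
        intake.submit(f"group-{n}")
        accepted.append(n)
    except TooManyRequests:
        rejected.append(n)

print("accepted:", accepted)
print("rejected:", rejected)
```

The point is only that the caller finds out right away which requests were refused and can retry them, while the ones already accepted still go through.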

I always appreciate the help from the community here. And I greatly appreciate the work from dresden elektronik and the Phoscon dev team in making this free software available. If there’s no solution, I’ll stick to my hacky rate-limiting and get by, but it feels to me that there’s a better path here that future-proofs this software and I’m happy to be the guinea pig!

If it’s helpful: I am currently using a ConBee II with firmware 0x26780700. I’m running v2.22.2, Qt: 5.9.5, GCC: 7.5.0 (C++ 201402). I’m using Source Routing with a max hops of 5 and a minimum LQI of 150.

Thank you!