Delayed and Missing Bulb Updates (deCONZ Struggling to Handle Large Networks?)

Mimiix · August 3, 2023, 5:26pm

manup · August 4, 2023, 1:10am

Hi Tim, for now your rate limiting is still needed. However, deCONZ itself is going through a major refactoring so that in the future no rate limiting will be needed on the client API side.

The main problem is that most commands issued through the REST-API, e.g. setting the color temperature of lights, are simply enqueued (if there is enough room in the queue), sent, and hopefully applied. This is also what other systems do, but it’s not the best approach and doesn’t scale well.
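To make the "enqueue, send, hope" pattern concrete, here is a hypothetical Python sketch (not taken from deCONZ, which is not written this way). `FireAndForgetQueue`, `Command` and the queue capacity are invented for illustration. Commands are accepted only while the bounded queue has room, each is sent exactly once, and nothing checks whether the light actually applied it.

```python
# Hypothetical sketch of "fire and forget" command handling. Names and the
# capacity are illustrative assumptions, not deCONZ internals.
from collections import deque
from dataclasses import dataclass


@dataclass
class Command:
    light_id: int
    attribute: str
    value: int


class FireAndForgetQueue:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending: deque[Command] = deque()
        self.dropped = 0

    def enqueue(self, cmd: Command) -> bool:
        """Accept the command only if the queue has room; otherwise drop it."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(cmd)
        return True

    def flush(self, send) -> int:
        """Send every queued command once; nothing verifies the result."""
        sent = 0
        while self.pending:
            send(self.pending.popleft())  # no verification, no retry
            sent += 1
        return sent


if __name__ == "__main__":
    q = FireAndForgetQueue(capacity=50)
    # 300 "set color temperature" requests (370 mired is an arbitrary value)
    accepted = sum(q.enqueue(Command(i, "color_temperature", 370)) for i in range(300))
    print(accepted, q.dropped)  # 50 250
```

With 300 lights and a small queue, most requests are silently dropped, and even the accepted ones are never confirmed. That is why client-side rate limiting is still needed for now.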

Parts of the REST-API plug-in that were rewritten for DDF (Device Description File) support already scale better, since they always “suspect” that things may fail. For example, when a device joins, the process of querying information from it or configuring bindings isn’t carried out in a fire-and-forget fashion; instead, every step is verified, and if something fails it is simply retried later. Binding configuration is also verified periodically and reconfigured/repaired if needed.
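The "verify every step, retry later" idea can be sketched as follows. This is a hypothetical Python illustration, not deCONZ's DDF code. The step names, the `run_verified_steps` helper and the `max_rounds` limit are assumptions. Each step returns `True` only when its effect has been confirmed, and unconfirmed steps are retried in a later round instead of being forgotten.

```python
# Hypothetical sketch of verified, retried setup steps (e.g. during device
# joining). Step names and the retry policy are illustrative assumptions.
from typing import Callable

# (name, action that returns True only when its effect was verified)
Step = tuple[str, Callable[[], bool]]


def run_verified_steps(steps: list[Step], max_rounds: int = 5) -> list[str]:
    """Run each step until verified; failed steps are retried in a later round.

    Returns the names of steps that still failed after max_rounds.
    """
    remaining = list(steps)
    for _ in range(max_rounds):
        still_failing = []
        for name, action in remaining:
            if not action():  # not confirmed by the device, try again later
                still_failing.append((name, action))
        remaining = still_failing
        if not remaining:
            break
    return [name for name, _ in remaining]


if __name__ == "__main__":
    attempts = {"bind": 0}

    def read_basic_cluster() -> bool:
        return True

    def configure_binding() -> bool:
        attempts["bind"] += 1
        return attempts["bind"] >= 3  # simulated: succeeds on the third try

    failed = run_verified_steps([("read basic cluster", read_basic_cluster),
                                 ("configure binding", configure_binding)])
    print(failed, attempts["bind"])  # [] 3
```

The periodic binding check mentioned above would amount to re-running the same kind of verification loop on a timer rather than only once at join time.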

This approach is also the current and future direction for how control and configuration commands sent by an API client will be handled. The gist of it is that when an API client sends “turn on 300 lights”, this “target state” is recorded and processed as a transaction, with verification and recovery retries if something fails.
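A target-state transaction for single lights (unicast) might look roughly like the following hypothetical Python sketch. The `Transaction` class, the `send`/`query` callbacks and `max_attempts` are illustrative assumptions, not deCONZ APIs. The desired state is stored once. Each pass verifies every light and re-sends only where the actual state differs. A light that never converges is eventually reported as failed, so the loop always terminates.

```python
# Hypothetical sketch of a "target state" transaction with verification and
# recovery retries. Callbacks and limits are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Transaction:
    target: dict[int, bool]                      # light id -> desired on/off
    attempts: dict[int, int] = field(default_factory=dict)
    failed: set[int] = field(default_factory=set)

    def step(self, send: Callable[[int, bool], None],
             query: Callable[[int], bool], max_attempts: int = 3) -> set[int]:
        """One pass: verify each light and (re)send where it differs.

        Returns the light ids that were just (re)sent and still need checking.
        """
        pending = set()
        for light, wanted in self.target.items():
            if query(light) == wanted:
                continue                          # verified, nothing to do
            if self.attempts.get(light, 0) >= max_attempts:
                self.failed.add(light)            # give up, report to caller
                continue
            send(light, wanted)                   # initial send or recovery retry
            self.attempts[light] = self.attempts.get(light, 0) + 1
            pending.add(light)
        return pending


if __name__ == "__main__":
    actual = {i: False for i in range(300)}
    flaky = {7, 42}                               # simulated: ignore the first command

    def send(light: int, on: bool) -> None:
        if light in flaky:
            flaky.discard(light)                  # drop this one, accept the next
            return
        actual[light] = on

    tx = Transaction(target={i: True for i in range(300)})
    rounds = 0
    while tx.step(send, actual.__getitem__):
        rounds += 1
    print(rounds, all(actual.values()), tx.failed)  # 2 True set()
```

The API client only supplies `target`. Everything after that, including the retries for lights 7 and 42, happens without further client involvement.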

The mechanism behind this is already used for some sensor devices; there it’s a bit simpler, since they are only addressed via unicast, not as a group.

For controlling single lights the mechanism will be the same, but I’d also like to extend this approach so it works with groups and scenes. Here it gets trickier: we need to extend the DDFs so that they also specify whether a light supports group casts and Zigbee scenes, since some lights don’t, or their support is totally buggy. Then, for a group call to turn on 300 lights, a group/scene command is first sent to the network, and afterwards the state is verified, either because the lights report the new state on their own or by querying it. For lights which didn’t apply the new state, e.g. because they didn’t receive the group command or don’t support it, the state gets “repaired” by sending unicast commands.
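The three-stage group flow (group cast, verify, unicast repair) is sketched below in hypothetical Python. The `supports_groupcast` flag stands in for whatever capability the extended DDFs would declare; it is not a real DDF key. The callbacks are also placeholders. `query` can be backed either by a state report the light sent on its own or by an explicit read, matching the two verification paths described above.

```python
# Hypothetical sketch of group cast + verification + unicast repair.
# supports_groupcast and the callbacks are illustrative, not real DDF keys.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Light:
    light_id: int
    supports_groupcast: bool  # assumed capability flag from an extended DDF


def apply_group_state(lights: list[Light], wanted: bool,
                      groupcast: Callable[[bool], None],
                      unicast: Callable[[int, bool], None],
                      query: Callable[[int], bool]) -> list[int]:
    """Apply one on/off state to a group; return ids that needed unicast repair."""
    if any(light.supports_groupcast for light in lights):
        groupcast(wanted)                        # stage 1: one group-addressed command
    repaired = []
    for light in lights:                         # stage 2: verify every member
        if query(light.light_id) != wanted:
            unicast(light.light_id, wanted)      # stage 3: repair via unicast
            repaired.append(light.light_id)
    return repaired


if __name__ == "__main__":
    # every 10th light is assumed not to support group casts
    lights = [Light(i, supports_groupcast=(i % 10 != 0)) for i in range(300)]
    state = {light.light_id: False for light in lights}
    missed = {5, 17}                             # simulated: group command not received

    def groupcast(on: bool) -> None:
        for light in lights:
            if light.supports_groupcast and light.light_id not in missed:
                state[light.light_id] = on

    def unicast(light_id: int, on: bool) -> None:
        state[light_id] = on

    repaired = apply_group_state(lights, True, groupcast, unicast, state.__getitem__)
    print(len(repaired), all(state.values()))    # 32 True
```

In this simulation, one group command covers 268 of the 300 lights. Only the 30 lights without group support and the 2 that missed the frame cost a unicast each. That is the advantage over sending 300 unicasts up front.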

This will be very robust: the API client only sends the new target state, and deCONZ/the plug-in will carry out the transaction by all means. But there is one caveat, especially for large networks: it can take a while. Querying 300 lights through a mesh is not fast (Zigbee is hard-limited to 250 kbit/s and in practice is even slower), so it’s not suitable for changing state every few seconds.
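A rough airtime estimate shows why polling 300 lights takes seconds even in the best case. This hypothetical Python calculation assumes one request frame and one response frame per light, each at the 127-byte IEEE 802.15.4 maximum frame size. It also assumes that each mesh hop retransmits the frame on the same channel, and it ignores ACKs, backoff, retries and the PHY preamble. Real traffic will be slower, as the post notes.

```python
# Back-of-the-envelope airtime estimate for querying lights over Zigbee.
# Assumptions (not from the post): 2 frames per light, every frame at the
# 127-byte IEEE 802.15.4 maximum, each hop retransmits on the same channel,
# and no time is spent on ACKs, backoff, retries or PHY preamble.

ZIGBEE_BITRATE_BPS = 250_000  # 2.4 GHz PHY raw data rate
MAX_FRAME_BYTES = 127         # IEEE 802.15.4 maximum frame (PSDU) size


def frame_airtime_s(frame_bytes: int = MAX_FRAME_BYTES) -> float:
    """Seconds needed to put one frame on air at the raw bit rate."""
    return frame_bytes * 8 / ZIGBEE_BITRATE_BPS


def poll_airtime_s(num_lights: int, hops: int = 1, frames_per_light: int = 2) -> float:
    """Minimum channel time to query every light once.

    Each hop retransmits the frame, so airtime scales with the hop count.
    """
    return num_lights * frames_per_light * hops * frame_airtime_s()


if __name__ == "__main__":
    for hops in (1, 3):
        print(f"300 lights, {hops} hop(s): {poll_airtime_s(300, hops):.2f} s")
```

This prints 2.44 s for a single hop and 7.32 s for three hops. These figures are a floor on channel occupancy, not a latency prediction, and they already rule out re-verifying a large network every few seconds.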

I hope this gives a bit of an overview where we are heading.

TimFisher · August 5, 2023, 1:36am

This is an excellent overview of where you’re headed, and it completely answers my question. Once implemented, it sounds like it will completely solve my problem as well. Thank you for taking the time to explain all this!

FWIW: in my large system, the absolute maximum pace I’d be sending messages at is to 65 groups at once (super rare), every 3 to 5 minutes. Sounds like your refactoring will be more than sufficient to handle this appropriately.

Thanks again!

TimFisher · August 22, 2023, 10:33pm

Also, @manup, I’d be happy to alpha or beta test the refactoring. You can DM me if there’s interest. I suspect my system size will become more common in the future. Even if not, it might be a good stress test.

TimFisher · November 20, 2023, 1:35am

Hi! Just checking in on the progress here. Any updates on when this refactoring may be complete? Thanks!