Lost 80% of my 342-bulb Setup After Power Outage

TimFisher · June 30, 2022, 4:48pm

I will give that a try after @manup has thoughts on the config.ini. Thank you!

TimFisher · July 3, 2022, 1:50pm

Hey, @manup! Any thoughts on the config.ini or other next-steps? I have the next few days mostly free for troubleshooting.

manup · July 4, 2022, 10:27am

Thanks for the config.ini. The counter value of 41.205.450 is still low: the spec says the counter can go up to 0x80000000 (2.147.483.648) and may then be set back to 0 on request; otherwise it can go up to twice that value.
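As a rough sanity check of the numbers above, here is a minimal sketch of the headroom arithmetic. The variable and function names are illustrative only, and the 0xFFFFFFFF upper bound is an assumption (an unsigned 32-bit counter, which matches "twice as much" only approximately); none of this is taken from deCONZ itself.

```python
# Values quoted in the post; names are hypothetical, not deCONZ identifiers.
current_counter = 41_205_450     # counter value read from config.ini
reset_threshold = 0x80000000     # 2,147,483,648: point where a reset may be requested
counter_max = 0xFFFFFFFF         # ASSUMPTION: counter is an unsigned 32-bit value


def headroom(counter: int, limit: int) -> tuple[int, float]:
    """Return (increments left before `limit`, fraction of `limit` already used)."""
    return limit - counter, counter / limit


remaining, used = headroom(current_counter, reset_threshold)
print(f"used {used:.2%} of the reset threshold, {remaining:,} increments left")

remaining_max, used_max = headroom(current_counter, counter_max)
print(f"used {used_max:.2%} of the assumed 32-bit maximum")
```

With the quoted value, the counter has used less than 2% of the reset threshold, which is consistent with an overflow being unlikely here.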

I don’t think this is the counter overflow problem mentioned above, but I need to read more of the spec to fully understand how the counter mechanism works across the network.

Would you be open to a live debug session? I’d like to try a few things to see if the unreachable nodes can be brought back into the mix.

Another observation: the zll.db shows that your FLS-CTs are on firmware version 0214.201000EB, which is quite interesting for large networks. For the FLS-PP lp, which is based on the same firmware, we recently made some improvements to better deal with large networks and source routing. I’ll compile an FLS-CT version based on that. It would be cool if we could update them and then try the source route approach… but first things first. (Having the FLS in the network is pretty nice, since they can repeat over larger distances, up to 150 meters per hop.)

TimFisher · July 4, 2022, 6:11pm

Let’s do it! I’ll DM you.

Emil · July 6, 2022, 3:35pm

Any findings? =)

TimFisher · July 17, 2022, 3:37pm

Well… the problem went away. And stupid me, I adjusted more than one variable at a time so I’m not sure what happened:

Without testing in between, I updated to 2.17.1 (from the previous version available), cleaned out all my local files and reloaded from a recent backup, and reset power to all 342 bulbs (a few breakers). Then, immediately upon deCONZ coming online, I connected via VNC and as quickly as I could went after every bulb I could find that was struggling to connect (easy, since it was 80+%!). It was my “I will MAKE you work, damn it!” plan. I doubt it was my aggressive bulb pinging that did the trick; it was probably more all the rest of the stuff and the order in which I went about it. But it DID work.

Every. Single. Bulb. came back online.

So this was a deCONZ or docker issue/bug/weirdness/whatever and not an actual bulb issue. So take comfort, those of you that run into similar things - there’s a possibility that you won’t actually have to start from scratch.

TimFisher · July 17, 2022, 3:42pm

For several days after that, I had to leave and re-join the network (via the buttons at the top of deCONZ) at least once a day after noticing that, while none of the bulbs were disappearing like before, the system just wasn’t actually sending the messages to the bulbs (none would turn on or off or change color or brightness even though HA and the deCONZ web interface thought they were doing their job).

Then after several days of that everything just kept working. I’m now 5 days in without having to reset or leave/join. Very weird.

Clearly some large-system stability issues still remain here and/or I need to reset my expectations around how long it takes for the system to get stable with all these connections.

I’m open to providing more feedback or data to @manup or anyone else who thinks it might be helpful for continued development, especially around stability!

manup · July 20, 2022, 12:54pm

I think we should further check the logs and do some live tests when the system is in that state.

There can be many different reasons why some devices can’t be reached, from interference to missing or not-yet-established routes. Looking forward, once we have more insights we can try the source routing approach (but we also need to update the FLS-CTs to get the best routing).

TimFisher · August 7, 2022, 2:17pm

Hey, @manup, I would LOVE to get those FLS-CTs updated. How can I go about doing that?