Deconz: is it stable?

After this bug discussion about a bug that I’ve been having for some months now (it was working fine when I started using deconz), I wrote down some thoughts that I want to share with this community.

I’ve been using deconz for a few years now, and it’s clear to me that these problems (stuff breaking from one version to the next) are going to stay. I don’t think it’s an open source / closed source problem, I think it’s a quality problem: there are no automated tests. Every code change can break anything, and to me this happens often here, and will continue to happen in the future. In software, automated tests are a big investment, and like any investment they require money (= time, here), but also provide quality in return. I don’t think that building a testing infrastructure is something that a random user can come and contribute, but you never know.

There are clearly experienced developers here, but sometimes I find code changes scary. I can see changes to core parts of the system without any automated tests. Even experienced developers cannot predict all of the consequences of core code changes in complex software systems. That’s why automated tests exist and help. I think the philosophy here is “make a code change, check if it works locally, throw a version out there and see if people complain”. To me this does not scale because you upset your users in the long run.

Take these as superficial comments: I don’t know the deconz code base well enough to know if a testing infrastructure would be the best way to increase quality, but I am experienced enough as a developer to know that this is usually the case in software.

As an example, if the problem manup is describing here is confirmed, to me that seems like a common type of issue that a simple unit test would have caught. Of course the devil is in the details, and I don’t know how many changes you’d need to make to get DDFs unit testable (…but I would have built them testable from the start).

So, since my home automation broke badly after a deconz update a few times in the past, I changed my strategy from “update deconz all the time” to “update only if I really need to”. Better to have a working home automation than to be on the latest version for no reason. I personally don’t like it since, as an anxious person :smiley:, I feel I am missing out if I am not on the latest version, and it also means I cannot help test new changes/features.

There are probably other things that can improve the situation here, but I am concentrating on the one that I believe has the best cost/benefit ratio.

Rant finished :slight_smile:


I give the stability rant to myself every time I work on the code :slight_smile: While automatic tests can’t catch everything, they do play an increasingly important role in improving the code and DDFs. I’d like to highlight a few things that were introduced and are being worked on in the last few years to address the “stuff breaks” issues.

Before DDFs, the whole device integration part was a big ball of C++ spaghetti code which was only ever extended at random places, with zero chance to test anything even if one wanted to. With the introduction of DDFs, the C++ parts which need to be touched to support a device went down to mostly zero. Many don’t remember, but before that, crashes and segmentation faults after changes were much more common than they are nowadays.

Since the C++ code is also being cleaned up from old device specific parts, which is still an ongoing process, there is less surface for bugs here. The new way devices are queried based on DDFs was written in a rather paranoid mode, with most focus on what could go wrong and how to deal with that rather than the usual happy path. For example, every step in querying and configuring a device is monitored and verified, and recovers from errors. At the time this was written, the related state machine code was tested automatically to verify that this also works in error cases. If you’re interested in such things, there are some docs talking more about the details and reasoning behind this code.
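To make the "monitor, verify, recover" idea more concrete, here is a minimal sketch in Python. It is not the actual deCONZ state machine (that one is C++); `ConfigStep`, `FlakyDevice` and the retry count are hypothetical names and values picked only for illustration. The point is that the error path is as easy to test automatically as the happy path.

```python
from enum import Enum, auto


class State(Enum):
    WRITE = auto()
    VERIFY = auto()
    DONE = auto()
    FAILED = auto()


class ConfigStep:
    """Hypothetical step: write one setting, read it back, retry on failure."""

    def __init__(self, device, key, value, max_retries=3):
        self.device = device
        self.key = key
        self.value = value
        self.retries_left = max_retries
        self.state = State.WRITE

    def _retry(self):
        # Any failure goes back to WRITE until the retry budget is used up.
        self.retries_left -= 1
        self.state = State.FAILED if self.retries_left < 0 else State.WRITE

    def tick(self):
        if self.state is State.WRITE:
            try:
                self.device.write(self.key, self.value)
                self.state = State.VERIFY
            except TimeoutError:
                self._retry()
        elif self.state is State.VERIFY:
            # Never trust the write alone: read the value back and compare.
            if self.device.read(self.key) == self.value:
                self.state = State.DONE
            else:
                self._retry()

    def run(self):
        while self.state not in (State.DONE, State.FAILED):
            self.tick()
        return self.state


class FlakyDevice:
    """Test double (an assumption): the first `failures` writes time out."""

    def __init__(self, failures):
        self.failures = failures
        self.store = {}

    def write(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("no response")
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)
```

A test can then feed in a device that fails twice and check that the step still ends in `DONE`, or one that never answers and check that it ends in `FAILED` instead of looping forever.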

Since this was done 3 years ago, the system has been extended to also improve verification of the device specific DDFs with the help of the DDF validator, which automatically runs on each GitHub pull request via a GitHub Action. It already catches many errors during the PR which before were only detected after a release was running in a user’s setup. But it’s a work in progress, and new tests and verifications are going to be added over time.
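For readers who haven't seen this kind of check, a rough sketch of what a validator run on each PR could look like is below. This is not the real DDF validator: the required keys and the rule about item names are assumptions made up for the example, and `validate_ddf` is a hypothetical function.

```python
import json

# Assumed for illustration only; the real validator has its own rules.
REQUIRED_KEYS = ("manufacturername", "modelid", "subdevices")


def validate_ddf(text):
    """Return a list of problems found; an empty list means the file passed."""
    try:
        ddf = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(ddf, dict):
        return ["top level must be an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in ddf]

    subdevices = ddf.get("subdevices", [])
    if not isinstance(subdevices, list):
        problems.append("subdevices must be a list")
        return problems

    for i, sub in enumerate(subdevices):
        if not isinstance(sub, dict):
            problems.append(f"subdevices[{i}]: must be an object")
            continue
        for j, item in enumerate(sub.get("items", [])):
            if not isinstance(item, dict) or "name" not in item:
                problems.append(f"subdevices[{i}].items[{j}]: missing name")
    return problems
```

In a CI job the script would run over every changed file and fail the build if any list of problems is non-empty, so the mistake shows up in the PR rather than in someone's living room.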

Automatic testing of the device specific JavaScript in DDFs is unfortunately very tricky and often unrealistic. We can do this for some parts which are streamlined, e.g. we know that when sending turn light on, the on attribute will likely be true afterwards. But most stuff which breaks doesn’t fall into such categories.
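The streamlined case could be checked roughly like this. It is a sketch in Python with a made-up `SimulatedLight`, not the DDF JavaScript itself and not an existing deCONZ test.

```python
class SimulatedLight:
    """Hypothetical stand-in for a real light; only knows on/off."""

    def __init__(self):
        self.attributes = {"on": False}

    def handle_command(self, command):
        if command == "on":
            self.attributes["on"] = True
        elif command == "off":
            self.attributes["on"] = False
        else:
            raise ValueError(f"unsupported command: {command}")


def check_on_command(light):
    """Send 'on' and report whether the on attribute is true afterwards."""
    light.handle_command("on")
    return light.attributes["on"] is True
```

Note that such a check only proves something about the simulated device. It says nothing about how a real device with a different firmware behaves, which is where most of the painful bugs come from.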

For example, we had a perfectly working DDF for a device, but then a new firmware version of the same device made it all go south. The fix here was cumbersome to figure out and is still questionable. Needless to say, for users such things suck.

The recently introduced DDF bundles provide a countermeasure (see v2.27.0-beta Release Notes) for the case that an update to a stable DDF introduces a bug: for such an update a new beta DDF bundle is generated automatically. The stable version is still there and can be used or downgraded to, and this is not tied to a deCONZ version. Users can also specify to only use stable DDF bundles. It’s also possible to pin a specific version of a DDF bundle to a device so that it won’t be upgraded automatically, aka never change a running system (e.g. device).

Only after a while, when bugs have been discovered, fixed and tested by users, can a beta DDF bundle be marked as actually stable. Internally, the stable/beta markings are digital signatures embedded in a bundle.
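The selection rules described above (stable only, beta allowed, or pinned) can be sketched as follows. The `Bundle` type, the integer versions and `select_bundle` are hypothetical and only illustrate the policy. Real bundles carry their stable/beta marking as signatures, which this sketch replaces with a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    version: int   # assumed: a higher number means a newer bundle
    channel: str   # "stable" or "beta"; a plain label instead of a signature


def select_bundle(bundles, allow_beta=False, pinned_version=None):
    """Pick the bundle a device should use, or None if nothing matches."""
    if pinned_version is not None:
        # A pinned device never moves, whatever newer bundles exist.
        for bundle in bundles:
            if bundle.version == pinned_version:
                return bundle
        return None

    candidates = [b for b in bundles if allow_beta or b.channel == "stable"]
    return max(candidates, key=lambda b: b.version, default=None)
```

With bundles 1 (stable), 2 (stable) and 3 (beta), a stable-only user gets version 2, a beta user gets version 3, and a device pinned to version 1 stays on 1.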

This tech is quite new and needs a much better UI, since some features are currently only reachable via the REST API, but I think it will solve many issues which we can’t easily catch with just code checks and automatic testing. The DDF bundles will play a very important role in making device integrations really hard to break.

The biggest elephant in the room is the C++ part, which, while improving and being rewritten to be more robust and testable, is huge, and in the process it’s not really possible to catch every bug before a release. So the current reasoning here is to catch bugs ideally during the beta phases, before the stable releases are made.

I hope this summary shows that stability is a very important factor while developing deCONZ, not only to fix bugs after the fact but to prevent them from happening in the first place. Automatic tests are only one of the tools in the toolbox to tame our beast. But I have to be honest: unfortunately, on the path to more stability there will be bugs, and some of them are gonna be truly annoying, especially if they don’t occur in developer setups.