Suggestions about deCONZ db structure and backup

Using deCONZ with an integration interface for Domoticz (https://github.com/Smanar/Domoticz-deCONZ) for some time now, I have run into various types of problems common to this kind of system.

This has also led me, by the same token, to deal with backing up and restoring my environment (RPi with SD card).

From there I had to look at how certain inconsistencies might (or might not) be linked to the way the deCONZ database (zll.db) is managed.

My “research” has led me to some suggestions/findings that I want to share with you, because they could turn into improvements, or at least a better understanding on my side:

  1. Commit/Rollback method

The many accesses made to the zll.db database are likely to generate "table is locked" or "database is locked" errors with SQLite's default rollback/commit journal. Most other systems that use SQLite with many concurrent accesses, or that store their DB on an SD card, have moved to a newer mode called "Write-Ahead Logging", whose advantages (and disadvantages) are explained here: https://www.sqlite.org/wal.html
I have personally been running my zll.db database in this mode for several weeks without side effects, after switching with:

$ sqlite3 zll.db
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> PRAGMA journal_mode=WAL;
wal
sqlite> .quit
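
To check at any time which journal mode is currently active (a generic SQLite query, nothing deCONZ-specific):

$ sqlite3 zll.db "PRAGMA journal_mode;"
wal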

Going back is done just as simply by:

$ sqlite3 zll.db
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> PRAGMA journal_mode=DELETE;
delete
sqlite> .quit

I therefore recommend using this rollback/commit mode rather than the SQLite default, as I have encountered fewer "database is locked" errors since I changed the PRAGMA.

  2. Backup/Restore

But this new mode leads to the creation of two additional files used by SQLite, zll.db-shm and zll.db-wal, which must be part of the backup process to ensure the integrity of the DB. I went to look at the backup process available through the REST-API to see whether it actually uses SQLite's Backup API, in which case this change would not matter.
It turns out this is not the case: backup.cpp only seems to make a plain copy of the zll.db file, which means the file could be copied while a transaction is still in progress and therefore end up corrupted (explained in paragraph 1.2 of https://www.sqlite.org/howtocorrupt.html). That risk exists today whether WAL is enabled or not. I understand that this way of doing things makes it possible to have the same procedure on Linux and on Windows, but it comes with a risk.

I therefore suggest aligning the database backup principle currently used with what SQLite recommends, i.e. using the Backup API. This would also make it possible to have the same process regardless of the rollback/commit mode in use, since it is SQLite that takes care of the integrity of the backup file.
Using the VACUUM INTO command (e.g. sqlite3 zll.db "VACUUM main INTO 'zll-bck.db';") would be even better because the backup would also be optimized, but that would be the icing on the cake!
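
As an illustration (the target file name zll-bck.db is only an example), both approaches can already be tried from the command line; the .backup dot-command of the sqlite3 shell uses the Backup API internally, and VACUUM INTO needs SQLite 3.27 or later:

$ sqlite3 zll.db ".backup zll-bck.db"
$ sqlite3 zll.db "VACUUM main INTO 'zll-bck.db';"

Either way the result is a consistent single-file snapshot even in WAL mode, so the -wal and -shm files do not need to be copied separately (note that VACUUM INTO refuses to overwrite an existing target file).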

  3. Sensor/device deletion
    Speaking about the DB, I also looked at what is done when deleting a sensor through the REST-API and through the deCONZ GUI. The GUI causes the deletion of the device and its associated sensors, while the REST-API [documentation](https://dresden-elektronik.github.io/deconz-rest-doc/endpoints/sensors/#delete-sensor) only mentions the deletion of the sensor, but in practice it also seems to cause the device to be erased!?

As @Smanar describes on Discord, the function deleteSensor(const ApiRequest &req, ApiResponse &rsp) just sets sensor->setDeletedState(Sensor::StateDeleted); then later it is the function DeRestPluginPrivate::saveDb() that removes the entry from the "devices" table and from the "sensors" table, but only for this specific sensor. So it seems there is another part of the code that deletes all the other sensor entries associated with this device?

Discussion on Discord about this subject also showed that it must remain possible to erase a complete device (with deletion of its associated sensors), as well as to delete only a single sensor (a case mentioned by @Swoop, because some Xiaomi devices change endpoints upon firmware update and people then complain about non-functioning sensors, for instance).

I therefore suggest correcting the REST-API so that by default only the selected sensor is erased, and allowing the complete device to be erased by adding an option in the body of the command (in addition to the option {"reset": true}, which should be the default), such as {"deletedevice": true}, or alternatively adding a device deletion function to the REST-API.
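
To make the proposal concrete, the call could look like this (purely a sketch: the {"deletedevice": true} flag does not exist today, and <gateway>, <apikey> and <sensor-id> are placeholders):

$ curl -X DELETE -d '{"deletedevice": true}' http://<gateway>/api/<apikey>/sensors/<sensor-id>

Without that body option, the request would keep its currently documented meaning of deleting only the sensor itself.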

And rather than managing the deletion of the sensors in code, it seems preferable to me to use (as is already the case for other tables in the DB) an ON DELETE CASCADE constraint between the "devices" table and the "sensors" table. There are methods to add this constraint even on existing tables in SQLite (SQLite: Foreign Keys with Cascade Delete). That way it is the database that manages the integrity, without having to handle the results of several successive SQL commands and the potential inconsistencies. But that would need some corrections to the current code, I think.
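
As SQLite does not allow adding a foreign key to an existing table with ALTER TABLE, the usual recipe is to recreate the table; here is a minimal sketch, with placeholder column names since I am not describing the real zll.db schema:

-- Placeholder schema, for illustration only (not the real zll.db columns)
PRAGMA foreign_keys = OFF;
BEGIN TRANSACTION;
CREATE TABLE sensors_new (
    sid      TEXT PRIMARY KEY,
    name     TEXT,
    deviceid TEXT REFERENCES devices(mac) ON DELETE CASCADE
);
INSERT INTO sensors_new (sid, name, deviceid)
    SELECT sid, name, deviceid FROM sensors;
DROP TABLE sensors;
ALTER TABLE sensors_new RENAME TO sensors;
COMMIT;
PRAGMA foreign_keys = ON;

Note that foreign key enforcement is off by default in SQLite, so PRAGMA foreign_keys = ON has to be issued on every connection for the cascade to actually fire.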

Those are just suggestions/comments I wanted to share. Unfortunately I don't have sufficient coding knowledge to propose the modifications in the code myself, but I can propose scripts or do some testing.

And about that I have a question. In an old version it was possible to remove only one sensor on a device, for example.
Does someone remember why this feature was removed in the new system?

One remark regarding WAL mode: when it is set, you no longer see the related "errors" in the strace output.

Previously, it was like this quite often:

fcntl(23, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
stat("/home/swoop/.local/share/dresden-elektronik/deCONZ/zll.db-journal", 0x7ffd804b2bb0) = -1 ENOENT (No such file or directory)
pread64(23, "\0%\221\356\0\0\0\215\0\0\0\211\0\0\0\7", 16, 24) = 16
stat("/home/swoop/.local/share/dresden-elektronik/deCONZ/zll.db-wal", 0x7ffd804b2bb0) = -1 ENOENT (No such file or directory)
fstat(23, {st_mode=S_IFREG|0644, st_size=577536, ...}) = 0

Maybe it's indeed worth making it the standard?

@BabaIsYou It’s great that you’ve put some time and effort into that. Really like it :+1:

Thx :wink:

I haven't had any lock errors on the database since I switched to WAL mode: neither in the logs (though I'm not always in debug mode!), nor when I run a summary check on the DB twice a day, nor while browsing it with DB Browser for SQLite.
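
For reference, that kind of quick check can be done with a standard SQLite pragma (a generic example, not necessarily the exact script I run):

$ sqlite3 zll.db "PRAGMA quick_check;"
ok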

It’s quite empirical, I know, but before I often came across lock errors.
FYI, I don’t have a zll.db-journal file, just zll.db-wal and zll.db-shm