Lights not responding sporadically (failed-status: APP_BUSY (0x02))

I’m having some trouble using Home Assistant scenes together with deCONZ lights and light groups. I often end up with one or more lights not adjusting correctly when activating a scene. The issue appears to be isolated to activating scenes with multiple lights turning either on or off at exactly the same time. Controlling each light or light group individually works much more reliably.

Below are logs for one occurence where a group of Philips Hue downlights did not turn off when they should have:

In Home Assistant those lights change state to “off” for a few seconds, even though the lights actually stay on. It changes back to “on” a little bit after that and only then can I actually turn them off in a second attempt.

Below is a screenshot of my network in deCONZ.

The lights in question have strong direct links to the ConBee II and generally there is heaps of routers in my network, so I doubt that this issue is related to poor connections.

Is it possible that the deCONZ REST API struggles with multiple requests being sent by HA at the same time when a scene is activated?

Would appreciate any help in debugging / resolving this issue!

Seeing as you write the lights change state momentarily in HA after a call we can deduce that deCONZ does expect the lights to receive the signal, so something is wrong on the zigbee side of things I think. I can’t help more than that, deCONZ devs needs to jump in for further debugging.

22:54:36:170 Erase task req-id: 70, type: 11 zcl seqno: 176 send time 0, profileId: 0x0104, clusterId: 0x0008
22:54:36:170 delay sending request 54 dt 0 ms to 0x001788010BF80DB9, ep: 0x0B cluster: 0x0006 onAir: 1
22:54:36:170 delay sending request 54 ep: 0x0B cluster 0x0006 to 0x001788010bf80db9 onAir 1
22:54:36:171 delay sending request 56 dt 0 ms to 0x001788010BF80DB9, ep: 0x0B cluster: 0x0008 onAir: 1
22:54:36:171 delay sending request 56 ep: 0x0B cluster 0x0008 to 0x001788010bf80db9 onAir 1
22:54:36:171 delay sending request 57 dt 0 ms to 0x001788010BF80DB9, ep: 0x0B cluster: 0x0300 onAir: 1
22:54:36:171 delay sending request 57 ep: 0x0B cluster 0x0300 to 0x001788010bf80db9 onAir 1
22:54:36:171 delay sending request 59 dt 0 ms to 0x60A423FFFED20D03, ep: 0x0B cluster: 0x0006 onAir: 1
22:54:36:171 delay sending request 59 ep: 0x0B cluster 0x0006 to 0x60a423fffed20d03 onAir 1
22:54:36:172 delay sending request 61 dt 0 ms to 0x60A423FFFED20D03, ep: 0x0B cluster: 0x0008 onAir: 1
22:54:36:172 delay sending request 61 ep: 0x0B cluster 0x0008 to 0x60a423fffed20d03 onAir 1
22:54:36:172 delay sending request 62 dt 0 ms to 0x60A423FFFED20D03, ep: 0x0B cluster: 0x0300 onAir: 1
22:54:36:172 delay sending request 62 ep: 0x0B cluster 0x0300 to 0x60a423fffed20d03 onAir 1
22:54:36:172 delay sending request 66 dt 0 ms to 0x60A423FFFE8A38EA, ep: 0x0B cluster: 0x0008 onAir: 1
22:54:36:173 delay sending request 66 ep: 0x0B cluster 0x0008 to 0x60a423fffe8a38ea onAir 1
22:54:36:173 delay sending request 67 dt 0 ms to 0x60A423FFFE8A38EA, ep: 0x0B cluster: 0x0300 onAir: 1

Hello, can you show more details of the scene you are using ? It’s not zigbee scenes ? all lights are in the same group ? Its like all requests are unicast …

The scene is a Home Assistant scene, not a Zigbee scene. It controls 4 deCONZ light groups (containing a total of 17 downlights), 4 individual deCONZ lights and a few other non-Zigbee HA devices / entities.

4 Group + 4 devices don’t seem too much for me.

You haven’t a way to decrease request ?
Because on the sample I have put (Have take it randomly, so can be realy unlucky ^^), we can see 3 zigbee requests on 3 different devices. If you just need to turn it on, don’t need so much requests (there is color and level too)

Using zigbee group is a good way to decrease requests.

But I can be wrong but don’t look so terrific for me, you can’t just add a delay between somes requests ?

When a scene gets activated in Home Assistant all state changes are requested at the same time. That behaviour can’t be controlled by the user, not using scenes anyway.
Would be interesting to find out why this happens with the aim to fix the underlying issue rather than work around it I suppose.

Just by curiosity (I don’t use HA), if you set a scene in phoscon, this one become visible and usable on HA ?
I have see an issue with an user that asking if phoscon scene are zigbee scene or deconz scene, so not sure it’s better.

Can you show “how” is the scene in HA ? to see what is customisable ?

In home assistant a scene is just a bunch of calls to individual lights

I’ve just captured logs for another occurence of what I think may be a related, but not necessarily the same problem. We have one Philips Hue downlight and a Hue motion sensor in our toilet room, and in the below instance the light failed to turn on when the motion sensor detected motion. The motion sensor is not directly linked to the light in deCONZ, but I’m using Node-RED and Home Assistant for additional logic.

I know that the logic in HA and Node-RED is not the problem, as the light changed state to “on” in Home Assistant (as you can see in the screenshot below), but the actual light stayed off - similar to my scene problem described above.

It looks like deCONZ actually reports the light turned on via the websocket, but then there is also an error, which I don’t know how to interpret:

16:39:37:222 Websocket 172.30.32.1:45774 send message: {"e":"changed","id":"47","r":"lights","state":{"alert":null,"bri":255,"colormode":"ct","ct":274,"on":true,"reachable":true},"t":"event","uniqueid":"00:17:88:01:09:b6:17:66-0b"} (ret = 176)
16:39:37:223 Websocket 172.30.32.1:45774 send message: {"e":"changed","id":"38","r":"groups","state":{"all_on":true,"any_on":true},"t":"event"} (ret = 88)
16:39:37:224 0x0000000000000000 error APSDE-DATA.confirm: 0xAE on task

That uniqueid 00:17:88:01:09:b6:17:66-0b is in fact the light that didn’t turn on.

Again, any clues about what might be causing this sporadic problem are highly appreciated.

We can see some “delayed” but I don’t see the same amount of request than on your previous logs

16:39:37:224 0x0000000000000000 error APSDE-DATA.confirm: 0xAE on task

This error mean table full (zigbee table)

An APSME-BIND.request or APSME.ADD-GROUP.request issued when the binding or group tables, respectively, were full.

But I don’t see spamming request on your logs …

If someone with more knowledge have an idea …

I realised that I didn’t capture the APS and APS_L2 logs above. Below is another capture with those included and I hope that can shed a bit more light on what is going on.

In this case a downlight should’ve turned on at 21:51:36:270 but it didn’t.

Below are a few handpicked log entries from the above that look suspicious to me. It seems to be retrying an APS-DATA.request 5 times without success, and we’re seeing status: BUSY and failed-status: APP_BUSY.

21:51:36:272 APS-DATA.request id: 178, status: BUSY (counter: 2)
21:51:36:272 APS-DATA.request id: 178, set state: 0x05
21:51:36:272 0x0000000000000000 error APSDE-DATA.confirm: 0xAE on task
21:51:36:273 APS-DATA.request id: 178, failed-status: APP_BUSY (0x02)
21:51:36:276 aps request id: 175 failed, erase from queue
21:51:36:334 APS-DATA.request id: 183, status: BUSY (counter: 3)
21:51:36:335 APS-DATA.request id: 183, set state: 0x05
21:51:36:335 emit artificial APSDE-DATA.confirm id: 183
21:51:36:335 0x0000000000000000 error APSDE-DATA.confirm: 0xAE on task
21:51:36:337 APS-DATA.request id: 183, failed-status: APP_BUSY (0x02)
21:51:36:356 aps request id: 178 failed, erase from queue
...

Any idea what’s happening here and why? Would the “busy” refer to deCONZ being busy or the light?

@de_employees

Just bumping this. Is there anything in the logs that would indicate where the problem lies? Is it something in my setup or configuration, is it potentially a bug in deCONZ or even faulty lights?

Also, seeing that there is a request being retried and then given up on after 5x, would it be possible to increase the number of retries to better the chances of eventual success?

@Mimiix Sorry to be a pest, but I really need to get to the bottom of this. Is there anything I can do or provide that would help/entice your staff or members of the community to have a look at this and help me debug?

There’s nothing I can do If @de_employees fail to reply.

Small update: The upcoming release v2.16.1 contains a fix which addresses this very issue to prevent the queue being to stressed.

1 Like

Great news, thanks for the update @manup!

I updated to v2.16.1 today and unfortunately it just happen again, lights not turning on when they should but reporting to be on.

Too bad I didn’t have all the debug logs enabled when it happened. However my logs are full of error messages anyway.

I’m seeing a lof of these:

10:49:19:354 0x0000000000000000 error APSDE-DATA.confirm: 0xD2 on task
10:49:19:355 delay sending request 26 dt 1 ms to 0x60A423FFFE8A38EA, ep: 0x0B cluster: 0x0008 onAir: 1
10:49:19:355 delay sending request 27 dt 1 ms to 0x60A423FFFE8A38EA, ep: 0x0B cluster: 0x0300 onAir: 1
10:49:19:355 delay sending request 35 dt 0 ms to 0x001788010BF80DB9, ep: 0x0B cluster: 0x0300 onAir: 1
...
10:52:18:559 0x0000000000000000 error APSDE-DATA.confirm: 0xE1 on task
10:52:18:560 delay sending request 132 dt 0 ms to 0x5C0272FFFE227008, ep: 0x0B cluster: 0x0008 onAir: 1
10:52:18:560 delay sending request 133 dt 0 ms to 0x5C0272FFFE227008, ep: 0x0B cluster: 0x0300 onAir: 1
10:52:18:562 delay sending request 132 dt 0 ms to 0x5C0272FFFE227008, ep: 0x0B cluster: 0x0008 onAir: 1
10:52:18:562 delay sending request 133 dt 0 ms to 0x5C0272FFFE227008, ep: 0x0B cluster: 0x0300 onAir: 1
...

And also a few like this:

17:25:18:567 5 running tasks, wait
17:25:18:572 5 running tasks, wait
17:25:18:578 failed to add task 156560 type: 11, too many tasks
17:25:18:578 failed to add task 156561 type: 6, too many tasks
17:25:18:578 5 running tasks, wait
17:25:18:612 5 running tasks, wait

All hinting at there being too much going on in the Zigbee network. I guess I have a few downlights (~50) and motion sensors (~15), but I need them all :smiley:

Do you guys have any advice on how to address this?

The APP_BUSY message doesn’t show up anymore?

The 0xD2 and 0xE1 are two particular mean errors.

The 0xE1 (MAC channel access failure) if seen multiple times hints on radio interference, this usually happens when no USB extension cable is used or other nearby devices like USB3/SSD or Bluetooth / WiFi jamming the 2.4 GHz channel. For example audio/video streaming or a close by WiFi router can lead to this. In a WiFi scanner app for Android you can check if the occupied WiFi channels collide with Zigbee, but note that Zigbee and Wifi uses very different channel numbers).

The 0xD2 means “broadcast table full” this happens if too many broadcasts/groupcasts are in flight in a short interval.

Having a lot of 0xE1 is never good, as it means not all requests are comming trough.