Lost all devices. Needed to power off and on again

Hi,

Just had an issue with all connections lost. I’m running raspbee1 on a Raspberry Pi 3 with deconz 2.11.05 and firmware 26390500.
After a shutdown, disconnecting power, reconnecting again and waiting a gooooood coffee sip long everything works again.

Log is here: 23:15:16:153 APS-DATA.request id: 204, addrmode: 0x03, addr: 0x00212effff01ba49, - Pastebin.com

Screenshot from gui

Took me some hours of sweating and bad looks from the wife.

Can I just add another gateway to the zigbee network for resilience?

Regards,
Michael

Hello, were the “error” and “error_L2” flags also enabled during your log?

Yes, they were.
Also the two green lines in the screenshot point to devices which were listed as unavailable in deCONZ webgui.

So I think it was an antenna crash.

I didn’t get any error in the apps, though. Restarting the Raspberry didn’t help.

I did restore the last backup though, before I turned it off completely. May or may not have been necessary.

In your log I only see communication from the ConBee itself, so I’m not sure what’s going on here.

Can you add a recent log? Maybe we can spot some signs here?

Sure, here’s a log I just fetched. Remember, it’s working again :wink:

10:33:20:962 APS-DATA.confirm id: 208, status: 0xE9 MAC_NO_ACK
10:32:35:783 0x14B457FFFE827EB5 error APSDE-DATA.confirm: 0xA7 on task

Now I see these.

Can you post a screenshot of the GUI?

deCONZ GUI looks fine, too:

Did you see what you needed in the screenshot?
Should I provide some more info? Feel free to ask.

I’m always up for improving my smart home’s reliability.

Regards,
Michael

I was wondering about the connections in between. However, it seems to be fine.

The devices in the log that report error codes (0xE9, 0xA7): what’s their brand/type and how are they placed in the house?

Hmm, I’ll try my best to answer - my deconz log reading skills are lacking here.

I see:
10:33:20:896 APS-DATA.request id: 208, addrmode: 0x03, addr: 0x000d6f0010658d10, profile: 0x0000, cluster: 0x0033, ep: 0x00 → 0x00 queue: 2 len: 2 tx.options 0x00
10:33:20:962 APS-DATA.confirm id: 208, status: 0xE9 MAC_NO_ACK

So, I think the device with IEEE 0x000d6f0010658d10 in deCONZ is associated with that error.
That would be the Sunricher light mentioned here:

The specific one is mounted 2m in the same room as the rpi.
Just for info: I’ve got three lights of that type, and the others don’t seem to produce these errors.
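The request/confirm matching done by hand above can be sketched in Python. This is only a rough illustration: the regular expressions are modelled on the log lines quoted in this thread, not on any documented deCONZ log format, and the function name `failed_requests` is made up for the example. It assumes a confirm with status 0x00 means success.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed format).
REQUEST_RE = re.compile(
    r"APS-DATA\.request id: (\d+), addrmode: 0x[0-9a-fA-F]+, addr: (0x[0-9a-fA-F]+)"
)
CONFIRM_RE = re.compile(
    r"APS-DATA\.confirm id: (\d+), status: (0x[0-9a-fA-F]+)(?: (\S+))?"
)


def failed_requests(lines):
    """Return (addr, status, name) for each confirm with a non-zero status."""
    pending = {}  # request id -> destination address of the open request
    failures = []
    for line in lines:
        req = REQUEST_RE.search(line)
        if req:
            pending[req.group(1)] = req.group(2)
            continue
        conf = CONFIRM_RE.search(line)
        if conf:
            req_id, status, name = conf.groups()
            # pop() so a reused request id is matched against its latest request
            addr = pending.pop(req_id, None)
            if addr is not None and int(status, 16) != 0:
                failures.append((addr, status, name))
    return failures


log = [
    "10:33:20:896 APS-DATA.request id: 208, addrmode: 0x03, addr: 0x000d6f0010658d10, "
    "profile: 0x0000, cluster: 0x0033, ep: 0x00 → 0x00 queue: 2 len: 2 tx.options 0x00",
    "10:33:20:962 APS-DATA.confirm id: 208, status: 0xE9 MAC_NO_ACK",
]
print(failed_requests(log))
# → [('0x000d6f0010658d10', '0xE9', 'MAC_NO_ACK')]
```

Feeding a whole log file through it (for example `failed_requests(open("deconz.log"))`, where the file name is a placeholder) would list every address that had an unacknowledged request.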

The 0xA7 is mentioned in these lines:

10:32:35:783 0x14B457FFFE827EB5 error APSDE-DATA.confirm: 0xA7 on task
10:32:35:784 APS-DATA.confirm id: 198, status: 0xA7 NO_ACK

So I think it’s 14B457FFFE827EB5, which is an iluminize 511.202. This would be the device farthest away from the RPi, but it’s a wall-mounted switch in the living room with 3 Zigbee lamps. Signal Meter says:

Regards,
Michael

Ah okay so that’s odd.

There are 3 ways for a device to end up with a “shitty” signal:

  • The device is too far away.
  • The way the device is placed in the house.
  • The router that this device is connected to is faulty or has connection issues, which cause it to drop off the network. That breaks the path to the device in question.

With the second one, it could be that it’s in the wall and behind a metal plate. Metal is bad for signals. That’s why I’m wondering.

Hmm - I can’t completely rule out a faulty router, of course, but the other two I can. There’s always a router/lamp close by and the rest should be Zigbee mesh magic.

And - even if these devices just have bad signal, why does that crash the whole system?

Regards,
Michael

Right, but there are some issues with a “freezing device” that can break the whole network.

If you can, remove a suspect device as a test. It’s quick to do, and it’s only a test; I’m not sure that device is the culprit.

It wouldn’t hurt to rebuild the network on channel 25. :+1:

Why? My two WLAN APs run on Wi-Fi channels 1 and 11, and according to ZigBee and WiFi Coexistence – MetaGeek Support it doesn’t make a difference.
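For reference, the channel arithmetic behind this question can be sketched in a few lines of Python. The centre-frequency formulas are the standard ones for the 2.4 GHz band; the channel widths (22 MHz for Wi-Fi, 2 MHz for Zigbee) are nominal values assumed for this sketch, and the function names are made up. It only compares nominal frequency ranges and says nothing about real-world signal strength or neighbouring networks.

```python
ZIGBEE_CHANNELS = range(11, 27)  # IEEE 802.15.4 channels 11..26 in the 2.4 GHz band
ZIGBEE_WIDTH_MHZ = 2             # assumed nominal Zigbee channel width
WIFI_WIDTH_MHZ = 22              # assumed nominal Wi-Fi channel width


def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Zigbee channel (11..26)."""
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1..13)."""
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel):
    """True if the nominal frequency ranges of the two channels intersect."""
    z_lo = zigbee_center_mhz(zigbee_channel) - ZIGBEE_WIDTH_MHZ / 2
    z_hi = zigbee_center_mhz(zigbee_channel) + ZIGBEE_WIDTH_MHZ / 2
    w_lo = wifi_center_mhz(wifi_channel) - WIFI_WIDTH_MHZ / 2
    w_hi = wifi_center_mhz(wifi_channel) + WIFI_WIDTH_MHZ / 2
    return z_lo < w_hi and w_lo < z_hi


def free_zigbee_channels(wifi_channels):
    """Zigbee channels that overlap none of the given Wi-Fi channels."""
    return [
        z for z in ZIGBEE_CHANNELS
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


print(zigbee_center_mhz(25))          # → 2475
print(free_zigbee_channels([1, 11]))  # → [15, 16, 17, 18, 19, 20, 25, 26]
```

Under these assumptions, channel 25 sits above Wi-Fi channel 11, and so do several other Zigbee channels when only Wi-Fi 1 and 11 are in use nearby.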

Or do you see something in the logs / something else that I miss?

If it was another device, why didn’t I see anything in the logs, and why did it need a complete power-off (when a restart didn’t help)? It seems the ConBee was the freezing device.

And about being fast to do: it’s working again, so I cannot troubleshoot it until it happens again.

Hmm - anything I should try / look at the next time it happens (hopefully it won’t, but if ...)?

What about your neighbors’ hotspots?

Yes, I agree with you. I could be wrong, but I don’t see anything special in the log either,
except some errors on this device:

10:33:34:007 0x14B457FFFE827EB5 error APSDE-DATA.confirm: 0xA7 on task
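To judge whether such errors are occasional or frequent, they can be tallied per device. A minimal sketch in Python, assuming the error lines always look like the ones quoted in this thread; the pattern and the function name `error_counts` are made up for the example.

```python
import re
from collections import Counter

# Pattern modelled on the error lines quoted in this thread (assumed format).
ERROR_RE = re.compile(
    r"(0x[0-9A-Fa-f]{16}) error APSDE-DATA\.confirm: (0x[0-9A-Fa-f]{2})"
)


def error_counts(lines):
    """Count error confirms per (IEEE address, status code), lower-cased."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1).lower(), match.group(2).lower())] += 1
    return counts


log = [
    "10:32:35:783 0x14B457FFFE827EB5 error APSDE-DATA.confirm: 0xA7 on task",
    "10:32:35:784 APS-DATA.confirm id: 198, status: 0xA7 NO_ACK",
    "10:33:34:007 0x14B457FFFE827EB5 error APSDE-DATA.confirm: 0xA7 on task",
]
for (ieee, status), count in error_counts(log).most_common():
    print(ieee, status, count)
# → 0x14b457fffe827eb5 0xa7 2
```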

But you are right, connection issues generally don’t slow down the entire network.

reconnecting again and waiting a gooooood coffee sip long everything works again.

Does that mean some devices are not working at all after 1 min?

If that device is one of the few routers, it could.

Wow - really? Then I’m questioning zigbee mesh in general.
And should I not see something in the log about that?