LosantPingPong: End-to-end Losant Connectivity Test


#1

I have been battling the bad WiFi at my apartment ever since I received the BuilderKit. Due to packet loss, delay, and multiple layers of NAT on the WiFi, the ESP8266 cannot reliably detect when its TCP connection to the Losant broker is broken.

To solve this problem, I have developed the LosantPingPong class. It uses periodic end-to-end ping-pong interactions between the device and a workflow to detect whether the device is still connected to the Losant platform, and it reconnects WiFi and MQTT if the connection is determined to be down.
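
For readers who want the idea without opening the link, here is a rough sketch of the timing logic only. It is not the actual LosantPingPong code (that is Arduino C++ at the URL below); the class name, method names, and the 10 s / 30 s values are all made up for illustration, and the caller is assumed to send the ping and do the reconnect.

```python
class PingPongWatchdog:
    """Tracks end-to-end pings and decides when the link should be treated as down.

    Hypothetical helper: names and default timings are assumptions,
    not taken from the LosantPingPong library.
    """

    def __init__(self, ping_interval=10.0, pong_timeout=30.0):
        self.ping_interval = ping_interval  # seconds between pings (assumed)
        self.pong_timeout = pong_timeout    # silence tolerated before reconnecting (assumed)
        self.last_ping = None
        self.last_pong = None

    def start(self, now):
        """Call after every (re)connect; the connect time counts as a fresh pong."""
        self.last_ping = None
        self.last_pong = now

    def should_ping(self, now):
        """True when it is time to send the next ping to the workflow."""
        return self.last_ping is None or now - self.last_ping >= self.ping_interval

    def on_ping_sent(self, now):
        self.last_ping = now

    def on_pong(self, now):
        """Call when the workflow's pong reply arrives on the device."""
        self.last_pong = now

    def is_down(self, now):
        """True when no pong has arrived for longer than pong_timeout."""
        return now - self.last_pong > self.pong_timeout
```

In a device loop you would call should_ping() and is_down() on every pass, and reconnect WiFi and MQTT (then call start() again) whenever is_down() returns True.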

https://yoursunny.com/t/2016/LosantPingPong/


#2

The LosantPingPong class is an interesting approach, but I’m not sure if it’s needed. The MQTT protocol has an underlying ping packet, which the Losant SDK implements as a heartbeat. By default, the SDK sends a ping packet every 15 seconds and disconnects with a -4 (timeout) if no message is received. This disconnect is picked up by the higher level code, which then attempts to reconnect.
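
To make the heartbeat behaviour concrete, here is a simplified sketch of a keep-alive check of this kind. It is an illustration only, not the SDK's source: the class and method names are hypothetical, the 15 seconds and the -4 code are the values quoted above, and the "connected" value of 0 is an assumption.

```python
KEEPALIVE_S = 15      # ping interval quoted in the post
STATE_CONNECTED = 0   # illustrative value (assumption)
STATE_TIMEOUT = -4    # timeout code quoted in the post


class KeepAlive:
    """Simplified MQTT-style keep-alive tracker (hypothetical names)."""

    def __init__(self, now):
        self.last_in = now            # time of last packet received
        self.last_out = now           # time of last packet sent
        self.ping_outstanding = False
        self.state = STATE_CONNECTED

    def on_packet_received(self, now):
        # Any inbound packet (including a ping response) proves the link is alive.
        self.last_in = now
        self.ping_outstanding = False

    def on_packet_sent(self, now):
        self.last_out = now

    def poll(self, now, send_ping):
        """Call regularly; send_ping is a callback that transmits a ping packet."""
        if self.state != STATE_CONNECTED:
            return self.state
        idle = (now - self.last_in >= KEEPALIVE_S
                or now - self.last_out >= KEEPALIVE_S)
        if idle:
            if self.ping_outstanding:
                # The previous ping was never answered: give up on the connection.
                self.state = STATE_TIMEOUT
            else:
                send_ping()
                self.ping_outstanding = True
                self.last_in = now
                self.last_out = now
        return self.state
```

With these numbers, a dead link is noticed roughly one to two keep-alive periods after the last packet, which is when the higher level code sees the -4 and starts reconnecting.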

What you said about never coming back online without a hard reset was an issue with how the kit’s higher level code (non-SDK code) was handling a reconnect - a fix has since been pushed to the repo. It was possible for WiFi to be re-established while the board still could not connect to Losant. The connection attempt timed out at a lower level, which left the higher level code looping forever trying to connect. The fix brings a timeout into the higher level code as well, so the board resets and attempts to reconnect again.
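
The shape of that fix is roughly the following. This is a sketch of the idea rather than the code in the repo: connect_or_reset, try_connect, reset_board, and the 30 second default are all hypothetical names and values.

```python
def connect_or_reset(try_connect, reset_board, now, timeout_s=30.0):
    """Retry try_connect() until it succeeds or timeout_s elapses.

    try_connect: callable returning True once connected (hypothetical).
    reset_board: callable that restarts the device (hypothetical).
    now:         callable returning the current time in seconds.
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if try_connect():
            return True
    # Still not connected after the overall deadline: reset instead of
    # looping forever, so the next boot starts from a clean state.
    reset_board()
    return False
```

The point is that the retry loop has its own deadline, independent of whatever timeout the lower level connection code applies.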


#3

Very clever idea, Yoursunny.

Thanks for the update on the fix, Brandon. FWIW, I’ve experienced almost no disconnects this week (the last was 5 days ago).

That being said, if anyone would like to be informed when the board has been disconnected for longer than a configurable amount of time (power/internet outage, etc), my “heartbeat” solution for the workflow may work well for you:

The basic logic is that the Heartbeat value gets reset every time the workflow receives a message from the board, or when the board connects. This should happen at least every 15 seconds if all is well. Every minute, a separate process decrements the Heartbeat value by 1. If the value reaches 0, it sends notifications to phone and email and then turns off further notifications until the unit comes back to life.

In my case, I set the heartbeat value to 10, so the device is allowed to be offline for up to 10 minutes before a notification is sent.
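
The workflow itself is built from Losant nodes, but the same countdown logic can be sketched in plain code. The class and method names below are hypothetical, notify stands in for the phone and email notifications, and the order of "decrement, then check for 0" is my assumption about how the timer branch behaves.

```python
HEARTBEAT_START = 10  # minutes of silence tolerated, as in the post


class HeartbeatMonitor:
    """Countdown-style uptime monitor (illustrative names, not Losant APIs)."""

    def __init__(self, notify, start=HEARTBEAT_START):
        self.notify = notify      # callback standing in for phone/email alerts
        self.start = start
        self.value = start
        self.notified = False

    def on_device_activity(self):
        """Device message or connect event: reset the counter, re-arm alerts."""
        self.value = self.start
        self.notified = False

    def on_minute_tick(self):
        """Runs once per minute: count down and alert once when reaching 0."""
        if self.value > 0:
            self.value -= 1
        if self.value == 0 and not self.notified:
            self.notify()
            self.notified = True
```

As long as the board reports in at least once per minute the counter never gets near 0; after 10 silent minutes exactly one alert goes out, and the next message from the board re-arms it.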


Monitor Uptime of a Device
#4

Honestly, I didn’t know about the MQTT ping packet when I designed this.
There’s a lot to learn.

The device always works when I’m playing with it with the serial monitor open.
When it gets “deployed”, i.e. connected to a phone charger, it sometimes stops working, and I can’t tell what’s wrong because there’s no serial monitor.
But I’ll disable this module and see whether MQTT ping alone can detect the network problem and reconnect.