We have an edge device that is now appearing offline (it's only online because I forced the status online for testing). We're not sure why it went offline, but all attempts to reach out to it through Losant using edge workflows don't get responses. We also know that the device is on and connected to the internet; we're able to remote into it and check things out.
If you can SSH into the device, the first thing I would check is the Docker container's logs to see if there are any hints there.
You can also check your device's connection logs on its detail page, making sure to go back far enough in time to see what the reason was for the last device disconnection.
And can you elaborate on "all attempts to reach out to it through Losant using edge workflows don't get responses"? Are you, for example, pushing a Virtual Button on a workflow that is deployed to the device and not getting a Debug Node attached to it to fire?
I would also force the connection status back to "Disconnected"; we strongly recommend against forcing connection status for any device that actually connects to our MQTT broker, as that connection status is used in several behavioral decision paths in our codebase. The connection status should only be forced for "virtual" devices (ones that don't connect to the broker and get their data from external sources) as a means of identifying inactivity.
Here is the disconnection log for the device. I believe it's gone offline a couple of times like this in the past, but as you can see, on the 5th it was only for a couple of seconds. This most recent time lasted a couple of days before I pushed it back online (which I just turned back off, as you asked).
By "all attempts have failed," I mean that I was trying to run an MQTT response to get SQL and I couldn't get a response back. This may have been a bad thought process, but since the device seemed to be online when we remoted into it, I figured forcing the online status to check MQTT might at least let us communicate with the device. This didn't work, but it makes me curious whether it's Losant or the container that might be causing the issue.
At the moment I don't have access to the Docker logs, but when I can look into it I'll post that as well.
Here is the disconnect from the 9th, the most recent one that still hasn't come back online. It's just "Attempting reconnect" going on for the next 5 days.
Here is another disconnect from the 5th. This one was only down for a couple of seconds. This type of disconnect happens pretty frequently for us, but it's never an issue as it's only down for seconds at a time.
Not sure if this is helpful at all, since the disconnect on the 9th has no information with it.
@Dylan_Schuster
If you are able to SSH into the device, I would recommend restarting the Docker container (the old turn-it-off-and-turn-it-back-on trick). I would also, if possible, change the logging level to verbose to see if we can get any more info than what is displayed here. Normally we would expect an error message after the reconnection attempt telling us why it failed.
Something else you can test is whether the container itself has network access …
docker exec CONTAINER_NAME ping google.com
If that fails, then the issue is not with the Losant agent but with your network / hardware setup.
We could try restarting the Docker container; we're just concerned that this is some underlying issue. If we had multiple devices out, it wouldn't be a good solution to have to remote in and reboot them every once in a while.
Here are screenshots from running ping
Surprisingly, ping is not available in the full GEA image, but it is in the Alpine image. You learn something new every day.
Instead, try …
docker exec losant-edge-agent curl https://google.com
OK, so you can reach the internet. Next thing I would try is that again, but hitting our broker URL to ensure that the DNS in the container is resolving correctly:
docker exec losant-edge-agent curl https://broker.losant.com
A successful response will say "Not Found".
If that fails, I would also try reaching broker.losant.com from the host machine (not from inside the Docker container).
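The three checks above (container to internet, container to broker, host to broker) could be run in one pass with a small sketch like this. The helper names `run_check` and `connectivity_report` are made up for illustration, the container name `losant-edge-agent` is taken from the commands above, and the 10-second timeout is an arbitrary choice.

```shell
# Container name as used in the commands above; override if yours differs.
CONTAINER_NAME="${CONTAINER_NAME:-losant-edge-agent}"

# Hypothetical helper: run one command quietly and report OK or FAIL
# based only on its exit status.
run_check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
  fi
}

# curl exits 0 for any HTTP response it receives (including the broker's
# "Not Found" page), so OK here means "DNS resolved and the server
# answered", not "the MQTT connection works".
connectivity_report() {
  run_check "container -> internet" docker exec "$CONTAINER_NAME" curl -sS --max-time 10 https://google.com
  run_check "container -> broker"   docker exec "$CONTAINER_NAME" curl -sS --max-time 10 https://broker.losant.com
  run_check "host -> broker"        curl -sS --max-time 10 https://broker.losant.com
}
```

Calling `connectivity_report` prints one line per check. If only the "container -> broker" line fails, DNS or routing inside the container is the likely suspect; if "host -> broker" fails too, the problem is more likely in the network or hardware setup than in the agent.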
In that case, we'll need you to change the logging level to "verbose", try all this again, and see what the container's logs say.
To change to verbose, it says that we'd have to restart the container to make it pick up the new config file. Would that make us lose the logs about why it's having trouble connecting? I guess it would restart, and if the error persists, it wouldn't be able to reconnect again and we could look at that error.
Do we have logs about why it's having trouble connecting? I thought all we had was "Attempting to reconnect …" over and over. If you have more info than that already, please let us know.
That said, what I'm seeing online is that the old logs will still be present if you just restart the container, as opposed to deleting it and spinning up a new container. But to be safe you could write the output to a file before going through with it if there is anything useful in there.
Once we actually get back online, you can deploy a workflow with an Agent Config: Set Node in it to change the log level back without having to restart the container.
Alright, cool. And no, we don't have any more information; I was just worried that restarting it would make us lose the original cause of the Aug 9th disconnect. We'll get on getting the verbose version of the log.
Thanks for all the tips so far Dylan,
So what I'm seeing is that we turn on verbose and it applies from "that point forward", as in no old logs will be verbose. Keep in mind that's restarting Ubuntu (and the container within).
Of course this may still be helpful after a bit of additional monitoring (new logs)… just noting here.
For future reference - I don't think it will help in this case since the container won't connect to the broker - you can view container logs in the Losant UI if you're using GEA v1.44.0 or later. For those, we maintain a buffer of the most recent messages at each log level - so you could configure the agent's default logging to be "info" but still have access to at least the most recent "verbose" logs on demand.