Device showing offline

Tanner_Williams · August 12, 2024, 1:51pm

We have an edge device that is now appearing offline (it's only showing online because I forced the status online for testing). We're not sure why it went offline, but all attempts to reach it through Losant using edge workflows get no response. We also know that the device is on and connected to the internet; we're able to remote into it and check things out.

Dylan_Schuster · August 12, 2024, 2:50pm

If you can SSH into the device, the first thing I would check is the Docker container’s logs to see if there are any hints there.
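
As a rough sketch of that first check, the helper below pulls recent container logs and keeps only lines that look connection-related. The container name `losant-edge-agent`, the 24-hour window, and the grep keywords are assumptions for illustration; adjust them to your setup.

```shell
# Show recent agent log lines that mention connection problems.
# Usage: agent_log_hints [container-name] [since]
agent_log_hints() {
  container="${1:-losant-edge-agent}"   # assumed container name
  since="${2:-24h}"                     # assumed look-back window
  # docker logs writes to both stdout and stderr, so merge them before filtering.
  docker logs --since "$since" --timestamps "$container" 2>&1 \
    | grep -iE 'error|disconnect|reconnect|timeout' || true
}
```

Run it on the host as `agent_log_hints` or, for a longer window, `agent_log_hints losant-edge-agent 120h`.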

You can also check your device’s connection logs on its detail page, making sure to go back far enough in time to see what the reason was for the last device disconnection.

And can you elaborate on what you mean by your attempts to reach the device through edge workflows not getting responses? Are you, for example, pressing a Virtual Button on a workflow that is deployed to the device and not seeing the Debug Node attached to it fire?

I would also force the connection status back to “Disconnected”; we strongly recommend against forcing connection status for any device that actually connects to our MQTT broker as that connection status is used in several behavioral decision paths in our codebase. The connection status should only be forced for “virtual” devices (ones that don’t connect to the broker and get their data from external sources) as a means of identifying inactivity.

Tanner_Williams · August 12, 2024, 3:46pm

Here is the disconnection log for the device. I believe it has gone offline a couple of times like this in the past, but as you can see, on the 5th it was only for a couple of seconds. This most recent outage lasted a couple of days before I forced it back online (which I just turned back off, as you asked).

By "all attempts have failed" I mean that I was trying to run an MQTT response to get SQL and I couldn’t get a response back. This may have been a bad thought process, but since the device seemed to be online when we remoted into it, I figured forcing the online status might at least let us communicate with the device over MQTT. This didn’t work, but it makes me curious whether it's Losant or the container that is causing the issue.

At the moment I don't have access to the Docker logs, but when I can look into it I'll post those as well.

Tanner_Williams · August 14, 2024, 2:17pm

Here is the disconnect from the 9th, the most recent one, which still hasn’t come back online. It's just “Attempting reconnect” for the next 5 days.

Here is another disconnect, from the 5th. This one was only down for a couple of seconds. This type of disconnect happens pretty frequently for us, but it's never an issue since it only lasts seconds at a time.

Not sure if this is helpful at all, since the disconnect on the 9th has no information with it.
@Dylan_Schuster

Dylan_Schuster · August 14, 2024, 3:07pm

If you are able to SSH into the device, I would recommend restarting the Docker container (the old turn-it-off-and-turn-it-back-on trick). I would also, if possible, change the logging level to verbose to see if we can get any more info than what is displayed here. Normally we would expect an error message after the reconnection attempt telling us why it failed.

Dylan_Schuster · August 14, 2024, 3:22pm

Something else you can test is whether the container itself has network access …

docker exec CONTAINER_NAME ping google.com

If that fails, then the issue is not with the Losant agent but with your network / hardware setup.

Tanner_Williams · August 14, 2024, 3:39pm

We could try restarting the Docker container; we're just concerned that this is some underlying issue. If we had multiple devices out in the field, it wouldn’t be a good solution to have to remote in and reboot them every once in a while.

Here are screenshots from running ping:

Dylan_Schuster · August 14, 2024, 4:53pm

Surprisingly, ping is not available in the full GEA image but it is in the Alpine image. You learn something new every day.

Instead, try …

docker exec losant-edge-agent curl https://google.com

Tanner_Williams · August 14, 2024, 6:09pm

Looks successful to me

Dylan_Schuster · August 14, 2024, 6:25pm

OK, so you can reach the internet. Next thing I would try is that again, but hitting our broker URL to ensure that the DNS in the container is resolving correctly:

docker exec losant-edge-agent curl https://broker.losant.com

A successful response will say “Not Found”.

–

If that fails, I would also try reaching broker.losant.com from the host machine (not from inside the Docker container).
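
A minimal sketch of that host-side check, run outside the container. The function name and the 10-second timeout are illustrative choices, not anything prescribed by Losant. A 404 status would correspond to the “Not Found” response described above.

```shell
# Check from the host whether the broker hostname resolves and answers HTTPS.
# Usage: check_broker_from_host [hostname]
check_broker_from_host() {
  host="${1:-broker.losant.com}"
  # -w '%{http_code}' prints only the HTTP status; the body is discarded.
  if code=$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "https://$host"); then
    echo "HTTPS reachable, status $code"
  else
    echo "HTTPS request to $host failed"
    return 1
  fi
}
```

If this succeeds on the host but the same request fails inside the container, that points toward the container's DNS or network configuration rather than the host's.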

Tanner_Williams · August 14, 2024, 7:07pm

Dylan_Schuster · August 14, 2024, 7:53pm

In that case, we’ll need you to change the logging level to “verbose”, try all this again, and see what the container’s logs say.

Tanner_Williams · August 14, 2024, 8:23pm

To change to verbose, it says that we'd have to restart the container so it picks up the new config file. Would that make us lose the logs about why it's having trouble connecting? I guess it would restart, and if the error persists it wouldn’t be able to reconnect, and we could look at that error then.

Dylan_Schuster · August 14, 2024, 8:38pm

Do we have logs about why it’s having trouble connecting? I thought all we had was “Attempting to reconnect …” over and over. If you have more info than that already, please let us know.

That said, what I’m seeing online is that the old logs will still be present if you just restart the container, as opposed to deleting it and spinning up a new container. But to be safe you could write the output to a file before going through with it if there is anything useful in there.
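
One way to do that, sketched under the assumption that the container is named `losant-edge-agent` (the output file name is also just an example):

```shell
# Save the container's current logs to a file, then restart the container.
# Usage: backup_logs_and_restart [container-name] [output-file]
backup_logs_and_restart() {
  container="${1:-losant-edge-agent}"                      # assumed container name
  out="${2:-agent-logs-$(date +%Y%m%d-%H%M%S).txt}"        # example file name
  # Capture stdout and stderr; stop here if the logs cannot be read.
  docker logs --timestamps "$container" > "$out" 2>&1 || return 1
  echo "Saved logs to $out"
  docker restart "$container"
}
```

Calling `backup_logs_and_restart` with no arguments writes a timestamped file in the current directory before restarting, so nothing is lost even if the old logs do not survive.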

Once the device is actually back online, you can deploy a workflow with an Agent Config: Set Node in it to change the log level back without having to restart the container.

Tanner_Williams · August 14, 2024, 8:42pm

Alright cool, and no, we don’t have any more information; I was just worried that restarting it would make us lose the original cause of the Aug 9th disconnect. We'll get on getting the verbose version of the log.

emcdee · August 15, 2024, 2:04pm

Thanks for all the tips so far, Dylan.
So what I’m seeing is that when we turn on verbose, it applies from that point forward, as in no old logs will be verbose. Keep in mind that this means restarting Ubuntu (and the container within it).
Of course this may still be helpful after a bit of additional monitoring (new logs)… just noting it here.

Dylan_Schuster · August 15, 2024, 2:44pm

For future reference - I don’t think it will help in this case since the container won’t connect to the broker - you can view container logs in the Losant UI if you’re using GEA v1.44.0 or later. For those, we maintain a buffer of the most recent messages at each log level - so you could configure the agent’s default logging to be “info” but still have access to at least the most recent “verbose” logs on demand.
