Hi Erin,
I have 2 separate workflows reading 2 families of devices serially.
If a MODBUS device is not connected to the network when the workflow starts, then we get a normal error object in the workflow.
If, however, the device is connected to the network and the workflow is successfully reading from it via MODBUS read, and it is then powered off or physically disconnected, this causes the workflow timeout errors. In addition, the device that is still connected is no longer read from.
We then start seeing the same timeout error on the other workflow even though its devices are connected.
I included the REDIS example, as it is also being affected by the timeout error. We use a device command to set a value in REDIS (storage GET/SET is not visible across workflows, hence the use of REDIS). What I am seeing once these timeout errors start occurring is that the device command is received (late), then we see a timeout error in that workflow and nothing is written to REDIS.
Hopefully this makes things a bit clearer.
Easiest way to replicate - poll two MODBUS devices every 5 seconds. Have the workflow running then physically disconnect one.
I am trying to work out how to restart the edge agent automatically in this scenario.
I think you should have a timeout parameter for MODBUS transactions. 30 seconds is too long. We are typically using TCP/Serial converters and their timeouts are set to between 1 and 3 seconds. If the device is not present, a 30-second timeout is extreme. In my situation we typically poll every 3-5 seconds, which means a massive backlog of timer-triggered runs will develop during the first timeout.
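To illustrate the kind of per-transaction timeout I mean, here is a minimal Python sketch using only the standard library. It is not how the edge agent works internally (I don't know that); the function names, the `timeout_s` parameter and the 2-second default are all my own assumptions. It opens a fresh TCP connection for each read, sends a standard "Read Holding Registers" (function 0x03) request, and returns an error object instead of blocking if the device does not answer in time.

```python
import socket
import struct

def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'Read Holding Registers' (0x03) request frame."""
    # MBAP header: transaction id, protocol id (0), remaining length (6), unit id.
    # PDU: function code, start address, number of registers.
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id, 0x03,
                       start_address, quantity)

def parse_read_response(frame):
    """Turn a response frame into register values or an error object."""
    if frame[7] & 0x80:  # high bit set on the function code means an exception
        return {"error": f"modbus exception code {frame[8]}"}
    byte_count = frame[8]
    values = struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count])
    return {"registers": list(values)}

def recv_exact(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes the connection."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed by peer")
        data += chunk
    return data

def read_holding_registers(host, port, unit_id, start_address, quantity,
                           timeout_s=2.0):
    """One read on a fresh connection; timeout_s is a hypothetical parameter.

    The timeout applies to the connect and to each socket operation after it.
    """
    request = build_read_request(1, unit_id, start_address, quantity)
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(request)
            head = recv_exact(sock, 9)  # MBAP (7) + function code + one byte
            if head[7] & 0x80:
                return parse_read_response(head)
            return parse_read_response(head + recv_exact(sock, head[8]))
    except OSError as exc:  # covers timeouts, refused and reset connections
        return {"error": f"{type(exc).__name__}: {exc}"}
```

With something like this, a call such as `read_holding_registers("192.0.2.10", 502, 1, 0, 2, timeout_s=2.0)` (the address is a placeholder) would come back with an error object after a couple of seconds rather than holding up the run, and because the socket is closed after each read, other masters could still reach devices that only allow a single connection.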
Also, it’s odd that the normal expected error occurs if the device is down before the workflow runs, but the timeout error occurs only if the device goes offline once it has been running.
Are you holding the TCP connection open between runs? (As an aside, if you are, this can be an issue for some devices when concurrent access from multiple masters is required. A few only allow a single socket connection at a time.)
Once we go live I was going to have at least 4 devices being polled in a loop, so if 2 go offline suddenly I will exceed the 60 sec workflow timeout, which is not good.
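The arithmetic behind this can be sketched as follows. It is a rough estimate, assuming the reads in the loop happen one after another, each offline device costs one full transaction timeout, and timer triggers keep firing while a run is blocked. The function names are just for illustration.

```python
import math

def worst_case_blocked_seconds(offline_devices, transaction_timeout_s):
    """Time a serial polling loop spends waiting on devices that never answer."""
    return offline_devices * transaction_timeout_s

def timer_runs_queued(blocked_seconds, poll_interval_s):
    """Timer triggers that fire while a single run is still blocked."""
    return math.floor(blocked_seconds / poll_interval_s)

# Two offline devices with the current 30 s timeout use up the whole
# 60 s workflow timeout before any healthy device is read.
print(worst_case_blocked_seconds(2, 30))  # 60

# The same two devices with a 3 s timeout cost only 6 s.
print(worst_case_blocked_seconds(2, 3))   # 6

# One 30 s timeout at a 5 s or 3 s polling interval queues this many runs.
print(timer_runs_queued(30, 5))           # 6
print(timer_runs_queued(30, 3))           # 10
```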