Workflow Performance Considerations

I have some high-level questions regarding workflow design.
I am hoping to provide context through the simple example of updating device data for multiple devices (many hundreds) from an incoming GCP Pub-Sub message.
This example greatly simplifies our ingestion… in reality, there is quite a bit of pre-processing of the incoming payloads before moving the data into the device.
In our case, the device name (not ID) is identified on the payload.

You could, for example:

  • Have one workflow per device and conditionally populate a given device by matching the device name to a static string in each workflow.
  • Have a single workflow that switches which device update happens based on the name, with each device update node configured for a specific device.
  • Have a single workflow that does a lookup of name to device ID and then populates the device using a templated device ID from that lookup (see the sketch after this list).
    We could, potentially, maintain a device ID register in GCP to avoid the lookup, but that requires synchronisation each time there is a change in Losant.
    As device numbers increase, there will come a need to split this into multiple workflows, each handling a number of devices, with all but one falling through to “Default” for a given message.
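
For clarity, here is a rough sketch of the Option 3 logic in plain Python (the payload shape, device names, and IDs are made-up placeholders, not our real data; in the actual workflow this would be a name-to-ID lookup feeding a templated device ID into the state update):

```python
# Rough sketch of Option 3 only: the payload shape, names, and IDs below are
# made-up placeholders, not our real data. In the workflow itself this would
# be a name-to-ID lookup feeding a templated device ID into the state update.

def write_device_state(device_id: str, state: dict) -> None:
    """Stand-in for the device state update step."""
    print(f"update {device_id}: {state}")

# Hypothetical incoming Pub-Sub payload: the device is identified by name, not ID.
incoming_payload = {
    "deviceName": "pump-station-17",
    "data": {"flowRate": 42.1, "pressure": 3.2},
}

# Options 1 and 2 effectively hard-code this mapping into per-device workflows
# or per-device switch cases; Option 3 resolves it at runtime instead.
name_to_device_id = {
    "pump-station-17": "losant-device-id-0017",
    "pump-station-18": "losant-device-id-0018",
}

device_id = name_to_device_id[incoming_payload["deviceName"]]
write_device_state(device_id, incoming_payload["data"])
```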

I’m sure there are other approaches also and I am open to suggestion.
In our case, currently, there are only a small number of devices. Depending on when along the prototyping path the workflows were developed, either Option 1 or Option 2 has been taken.
I am thinking we should swing over to Option 3… it appeals mainly because one workflow means a central and singular place for changes… without the need for propagation to many instances.
A question has been fairly raised internally regarding the performance difference between Option 1 and Option 3… in a nutshell…

In regards to performance, are there any comparative advantages/disadvantages between Option 1 and Option 3?

What are the upper limits for invocation rates for a workflow?

Note: I gathered a lot from the YouTube video “Building Performant Workflows That Scale With Your IoT Solution Deeper Dive Webinar”… I strongly recommend this to others figuring their way through workflow optimisation.

As well as the specific questions above, any replies with workflow optimisation tips or pointers to further related and helpful resources are appreciated.

Option 3 - looking up a single device by name using the Device: Get Node - would be just the slightest bit less performant than using a Conditional Node (one workflow per device) or a Switch Node (one workflow with several cases). We’re talking maybe 20 milliseconds, especially if you are not returning any composite state values when using the node.

I strongly believe Option 3 is the best option for you because of this; the minuscule performance gain you’d get does not outweigh the headache of maintaining one workflow per device, or one workflow with an inordinate number of Switch Node statements.

You did mention that there is quite a bit of pre-processing to do before writing to device state, and I’m not seeing that depicted in your screenshots. I can’t imagine that that would take more than 60 seconds (the workflow timeout length) per device.

There is one thing I am not considering here, and that is that you said you are “updating device data for multiple devices (many hundreds) from an incoming GCP Pub-Sub message.” Does this mean that a single message from GCP PubSub contains state updates for several devices? If so, you’ll need to bring a Loop Node into this, at least for each individual state write. I would try to construct the workflow in such a way that the data preprocessing (and possibly also the device lookups) happens outside of the loop; you’d get a huge performance boost by keeping that out of the loop.
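
As a rough illustration of that structure (the message fields and helper names here are hypothetical, and in practice these would be workflow nodes around a Loop Node rather than code):

```python
# Hypothetical sketch only: field and function names are made up. In the
# workflow this corresponds to doing the shared preprocessing before the
# Loop Node, so only the cheap per-device state write repeats inside it.

def preprocess(raw_message: dict) -> list:
    """Runs once per message: shared parsing, validation, conversions, etc."""
    return [
        {"name": entry["deviceName"], "state": entry["data"]}
        for entry in raw_message.get("readings", [])
    ]

def write_state(device_id: str, state: dict) -> None:
    """Stand-in for the per-device state write inside the loop."""
    print(device_id, state)

def handle_message(raw_message: dict, name_to_id: dict) -> None:
    entries = preprocess(raw_message)   # outside the loop: runs once
    for entry in entries:               # inside the loop: one write per device
        write_state(name_to_id[entry["name"]], entry["state"])
```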

Depending on the number of state reports included in a single GCP message, you may have to re-think the whole architecture, though, or risk running into workflow timeouts even with the best performance practices in place. In such a case, I’d recommend writing the messages to a Data Table or to an Application File and kicking off a Resource Job to iterate over each message and write the state, one device per workflow run.

Hi Dylan,

Thanks for your reply. I appreciate the time you put into your response.

Some clarifications:

  • The screenshots were mocked up to target the question without distraction. They are not the actual workflows.
  • The workflows don’t run anywhere near 60 s… less than 3 s.
  • Each pass through the workflow ingests data for a single device (no need for a loop).
  • All of the devices share the same Pub-Sub Topic, however.
  • They are distinguishable by a JSON path to the device name (not the Losant Device ID, to be clear).

Although it’s not directly related to this question (see clarifications), the concept of writing whole messages to a table and calling a resource job to process it is an interesting idea I’ll keep in mind.
Thanks for that insight.

My personal sense was, as you suggested, that the maintainability of Option 3 would outweigh some degree of processing hit.
As a person who will be called on for maintenance, Option 3 is my clear winner.

The key question (in comparing Option 1 vs Option 3) that I am still a bit unclear on is…
Is there any significant difference in having, over a typical update period (nominally, 5 mins):

  • One workflow called hundreds of times, vs
  • Hundreds of workflows each called once?
    i.e. is there any load-distribution effect in Option 1?

Taking the example a bit further…
Say there were a thousand devices sharing the same workflow and the average run time was 6 seconds…
Over a 5-minute period, on average, there would be a bit over 3 calls per second and 20 concurrent instances at any given time (at various extents of completion).
Does this loading seem ok to you?
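
For reference, my back-of-envelope working for those figures (assumed numbers, not measurements):

```python
# Back-of-envelope check of the figures above (assumed numbers, not measurements).
devices = 1000        # devices each reporting once per update period
period_s = 5 * 60     # nominal update period of 5 minutes
avg_run_s = 6         # assumed average workflow run time in seconds

calls_per_second = devices / period_s            # ~3.3 invocations per second
concurrent_runs = calls_per_second * avg_run_s   # ~20 runs in flight at once

print(f"{calls_per_second:.1f} calls/s, ~{concurrent_runs:.0f} concurrent runs")
```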

In the worst case they all land at the same second.
Do you expect that the workflow management would handle that case, scheduling the instance executions without faltering?

Do you have some upper limit numbers where you might expect this to break down?
What would that look like?

I guess, in reality, there would be ways to break the workflow down into smaller-load workflows by including conditions like region/area on a device tag property as issues presented themselves.
But I am interested in where the processing load boundaries might be.

Best,
Tim.

From the perspective of Losant’s back-end architecture, one workflow firing 1,000 times per second really isn’t any different from 1,000 workflows firing once per second. We don’t track / throttle concurrency at the workflow level; it’s done at the application level (partitioned by application / experience flow class), and it’s a pretty high limit that our users rarely hit.

Now, we do throttle the debug message output, and one workflow firing at an extremely high rate would run into that limit - but that limit is strictly the rate at which we will send debug message output down to the browser. All flow executions are still taking place.

The only other consideration I can think of when it comes to this question is workflow storage, as that is flow-specific (Flow A cannot read or write to the storage of Flow B, and each individual flow has a limit on the size of its storage values). So if you are utilizing workflow storage and you need those values across different triggers / processes, that would be a reason to combine some flows.

Hi Dylan,

Thank you.
This response definitely helps us resolve our way forward.

Appreciate also the insights regarding debug message throttling (which I have experienced) and workflow storage limits (which I have not… that is a fair whack of data!).

Best,
Tim.