Querying 2 tables at once

Hi experts,

I’m hopeful this is an easy one:

Is it possible to query two tables in one go in the workflow editor?

For example, I want to check whether a deviceId exists in one table but not the other. If it’s not in the second table, I will add it.

I can obviously do this with two separate queries, but I’m trying to optimize so I’m not doing twice the work.

Thanks,
Nigel

@Nigel_Reed we don’t have a way to run a query against two data tables at once. That said, if you are using the Table: Get Rows Node, the operation is extremely quick.

See the attached application bundle, which you can import as a new application or into an existing application. It serves as a proof of concept for what you are trying to do. It includes:

  • Two data tables, each with a “deviceId” column. Each table has two rows: one that exists in both tables and one that is unique to that table.
  • One workflow that queries both tables and branches into multiple cases depending on whether the provided device ID exists in one, both, or neither (the branching logic is sketched below).
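
If it helps to see that branching in one place, below is a minimal TypeScript sketch of the one/both/neither decision. The row shape and function names are illustrative only; in the bundle itself this is done with workflow nodes, not code.

```typescript
// Illustrative row shape; in the bundle these are data table rows
// with a "deviceId" column.
interface TableRow {
  deviceId: string;
}

type Membership = "both" | "firstOnly" | "secondOnly" | "neither";

// Classify a device ID by which table(s) contain it. `rowsA` and
// `rowsB` stand in for the results of the two Table: Get Rows queries.
function classify(deviceId: string, rowsA: TableRow[], rowsB: TableRow[]): Membership {
  const inA = rowsA.some((row) => row.deviceId === deviceId);
  const inB = rowsB.some((row) => row.deviceId === deviceId);
  if (inA && inB) return "both";
  if (inA) return "firstOnly";
  if (inB) return "secondOnly";
  return "neither";
}
```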

Yes, it takes two separate nodes to query the two tables, but if you check the “Timing” tab in the debug messages generated after pressing the respective Virtual Buttons, you will see the operation is extremely quick.

export-61c37cef08bbae1c3fb94d3e-1640201963808-queryTwoTables.zip (3.6 KB)

Thank you for that. I can’t load this into the production system, so I’ll have to see if I can get my hands on staging. I have about 10,000 devices, and while what I’m doing works, it does time out. Basically, I query the first table and get all of its rows, then enter a loop to check the second table, and if the row doesn’t exist there, I add it, matching on assetId alone. I could probably pass less data between the layers, which should save some milliseconds; I’m only using deviceId and assetId.

OK, so you’re not trying to do this one row at a time on demand; you’re trying to run more of a batch job where you loop over a large number of table rows at one time and do the comparison for each.
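
For reference, that serial routine looks roughly like the sketch below. The helper functions and table names are hypothetical stand-ins for the Table: Get Rows and Table: Insert Row nodes, not a real API.

```typescript
interface DeviceRow {
  deviceId: string;
  assetId: string;
}

// Hypothetical stand-ins for the workflow's table nodes.
declare function getAllRows(table: string): Promise<DeviceRow[]>;
declare function findByAssetId(table: string, assetId: string): Promise<DeviceRow | undefined>;
declare function insertRow(table: string, row: DeviceRow): Promise<void>;

async function syncSerially(): Promise<void> {
  const rows = await getAllRows("complianceReport");
  for (const row of rows) {
    // One lookup per row, awaited one at a time: with ~10,000 devices,
    // this serial chain is what pushes the run past the timeout.
    const match = await findByAssetId("patchedDevices", row.assetId);
    if (!match) {
      await insertRow("patchedDevices", row);
    }
  }
}
```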

A couple of things you can do to speed this up …

  • Make sure your Loop Node is running in “parallel” mode vs. “serial”, as this will run multiple executions at one time.
  • In your query inside the loop, you can set the “Limit” to 1 to return only one row (as opposed to all rows matching the query) since all you care about is the existence of the row and not its contents.
  • Consider breaking this up across multiple workflow runs by scheduling only, say, 1,000 rows per run and handling the next 1,000 rows in a separate run (all three tips are sketched after this list). Read more about that here.
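
Here is the same sketch with all three adjustments applied. The helpers and table names are again hypothetical; the point is just the shape of the optimized run.

```typescript
interface DeviceRow {
  deviceId: string;
  assetId: string;
}

// Hypothetical stand-ins for the workflow's table nodes.
declare function getRows(
  table: string,
  opts: { query?: { assetId: string }; limit: number; offset: number }
): Promise<DeviceRow[]>;
declare function insertRow(table: string, row: DeviceRow): Promise<void>;

// Each workflow run handles one batch of 1,000; the next batch is a
// separate run, so no single run exceeds the timeout.
async function syncBatch(offset: number, batchSize = 1000): Promise<void> {
  const batch = await getRows("complianceReport", { limit: batchSize, offset });

  // "Parallel" mode: all existence checks for the batch run at once
  // instead of one at a time.
  await Promise.all(
    batch.map(async (row) => {
      const matches = await getRows("patchedDevices", {
        query: { assetId: row.assetId },
        limit: 1, // Limit 1: we only care that a match exists, not its contents
        offset: 0,
      });
      if (matches.length === 0) {
        await insertRow("patchedDevices", row);
      }
    })
  );
}
```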

This also raises the question of what exactly the use case is here. If you can share some details on what these two tables represent and why you need to sync them, we may be able to suggest an alternative approach.

If you simply must do this kind of advanced comparison between two large datasets, then Losant’s data tables may not be the best path forward. You could instead engage a third-party service to store and query this data. We have a how-to guide on our blog that may prove useful here.

Thank you for that additional information. I forgot I was going to try parallel rather than serial processing.

We have 10,000+ devices that require updates from time to time. Most of them take the update without any problems; however, we run into trouble when devices are offline, under high CPU load, or when we cannot talk to the agent on the device for some reason.

I have a compliance report that I run on all the devices, storing the results: either OK or a list of missing patches. Occasionally devices will be swapped out but keep the same name, so they’ll have a different assetId, and I will need to run a new compliance report, which I’m doing via a workflow.

This separate table is basically a current list of patched devices.

Sometimes I’ll manually update the report table to change a device from Missing to OK, in which case the tables may get out of sync. This routine ensures they’re kept up to date. It sounds a little convoluted, but it made sense in my head when I came up with it.

It sounds like you’re pretty deep into your current implementation at this point, but have you considered using device tags to store this information on each device object itself?

For example, you could have the process be something like the following …

  1. When your device boots up, or reconnects to the broker, have it publish its current firmware version, along with its device ID, on a custom MQTT topic (see the sketch after this list).
  2. In an application workflow, use an MQTT Trigger configured to fire on messages published on that topic.
  3. That same workflow then uses a Device: Update Node to store the device’s firmware version on a device tag.
  4. Now that the data is on the device object, if you want to generate a list of devices that are behind, you can use an advanced device query (or just your application’s main “Devices” list) to view all devices matching (or not matching) a certain version as stored in a tag.
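
As a rough sketch of step 1, here is what the device-side publish could look like using the `mqtt` npm package. The topic name, message shape, and environment variable names are assumptions for illustration; the connection details follow Losant’s standard MQTT credentials (device ID as client ID, access key and secret as username and password).

```typescript
import mqtt from "mqtt";

// Assumed environment variables holding the device's Losant credentials.
const deviceId = process.env.LOSANT_DEVICE_ID ?? "";

const client = mqtt.connect("mqtts://broker.losant.com", {
  clientId: deviceId,
  username: process.env.LOSANT_ACCESS_KEY,
  password: process.env.LOSANT_ACCESS_SECRET,
});

client.on("connect", () => {
  // Example custom topic; the application workflow's MQTT Trigger
  // would be configured to fire on messages published here.
  // "1.4.2" is a placeholder for whatever version the device is running.
  client.publish(
    "firmware/version",
    JSON.stringify({ deviceId, version: "1.4.2" })
  );
});
```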

This is remarkably similar to how Losant’s Gateway Edge Agent reports the GEA version running on an Edge Compute Device. Whenever the agent boots up, it publishes a “hello” message to the cloud broker stating the version it’s running. That value is then stored on the device object, not as a tag but as a first-class property.

Trying to get a new tag added would be like trying to move a mountain, unfortunately, so this isn’t really an option open to me. That’s why I went with the separate table, since I can create those without having to jump through hoops. I appreciate the suggestion, though.

Once I switched to the parallel option, the process was able to complete, and my tables are all synced now.

Thanks,
Nigel