[Solved] HTML parser node


#1

At Hack Arizona I built a project with IBM Watson Node-RED that displayed local bus schedules on the LCD kit. I used Watson because IBM was a sponsor of the hackathon, so I could apply for their prize.
From a functionality point of view, IBM Watson is very similar to Losant Workflows, but it’s more complicated and harder to use; their IoT platform is somewhat disjoint from the workflows, and it lacks the realtime MQTT debugging that is available on the Losant “application” page.

I found one feature that Node-RED has but Losant Workflows doesn’t: an html node, which parses a string as an HTML DOM and applies a CSS3 selector to extract one or more elements as strings or JSON objects.
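
To illustrate the kind of extraction I mean, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not how Node-RED implements its node; the `SimpleSelector` class, the `select_text` helper, and the sample bus-schedule markup are all made up for this example, and it only understands a `tag.class` style selector rather than full CSS3.

```python
from html.parser import HTMLParser


class SimpleSelector(HTMLParser):
    """Collect the text of every element matching a tag plus optional class.

    Hypothetical helper for illustration; real CSS3 selectors are far richer.
    """

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []     # text of each matched element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested elements of the same tag so the match
            # closes at the correct end tag.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.css_class is None or self.css_class in classes):
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def select_text(html, tag, css_class=None):
    """Return the text content of each element matching tag (and class)."""
    parser = SimpleSelector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up markup standing in for a bus schedule page.
page = """
<ul>
  <li class="departure">Route 4 <b>12:05</b></li>
  <li class="notice">Holiday schedule</li>
  <li class="departure late">Route 9 <b>12:20</b></li>
</ul>
"""
departures = select_text(page, "li", "departure")
print(departures)  # ['Route 4 12:05', 'Route 9 12:20']
```

The point is that the selection is driven by the document structure (tag and class), so it keeps working when whitespace or other attributes change.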

This node would be useful for scraping information from a webpage that isn’t otherwise available as a JSON/XML feed. During the development of the LCD calories tracker, I attempted to import jQuery into the workflow, but it didn’t work because the workflow environment doesn’t have a DOM, and importing a DOM implementation brings in a whole lot more dependencies. I ended up using regular expressions and string processing, but that doesn’t scale to increasingly complex webpages.
I hope Losant Workflows can add an HTML parser node to simplify such applications.
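
As a rough illustration of why the regular expression approach gets fragile, here is a small Python sketch. The markup and the `kcal` class name are invented for the example and are not taken from the actual page I scraped.

```python
import re

# A pattern written against the exact markup seen during development.
pattern = re.compile(r'<span class="kcal">(\d+)</span>')

original = '<span class="kcal">250</span>'
# The same element after the site adds one unrelated attribute.
changed = '<span id="c1" class="kcal">250</span>'

matches_original = pattern.findall(original)
matches_changed = pattern.findall(changed)

print(matches_original)  # ['250']
print(matches_changed)   # []
```

A selector such as `span.kcal` would match both versions, which is why a parser node seems like the more maintainable option.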


#2

This is a cool idea - I created a ticket for the feature request. Just for our reference, the source for Node-RED’s HTML node is here:


#3

We added this node in our release about 2 weeks ago (https://www.losant.com/blog/platform-update-20170228), but forgot to reply to this thread. You can check out the documentation for it here.


#4

Does this actually work for importing jQuery? It looks like it just finds text inside of an HTML element…


#5

No, this doesn’t import jQuery, and it does not execute any JavaScript on the webpage. It only allows you to extract information from the DOM already present in the HTML structure.
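
To make the distinction concrete, here is a minimal sketch in Python's standard `html.parser`, not the Losant node itself. The `TextById` class and the sample page are hypothetical. A static parser sees the markup as delivered, so text that a script would write into the page later is never there to extract.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the element with a given id.

    Hypothetical helper; it does not handle nested elements of the same tag.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.open_tag = None   # tag name of the matched element while inside it
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None and dict(attrs).get("id") == self.element_id:
            self.open_tag = tag

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.text += data


# In a browser, the script would replace "loading" with "On time".
# A parser never runs the script, so only the static text is visible.
page = (
    '<span id="status">loading</span>'
    '<script>document.getElementById("status").textContent = "On time";</script>'
)

parser = TextById("status")
parser.feed(page)
parser.close()
result = parser.text
print(result)  # loading
```

So if a page fills in its data with JavaScript after loading, a parser like this will only return the placeholder content.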