[Solved] Function: extract data from HTML response

My test device responds with a very basic HTML page, which ends up in the data object as:

"time":Tue Aug 29 2017 11:33:07 GMT+1000
"data":{} 4 keys
"body":"<!DOCTYPE html><html><head><link rel='icon' type='image/png' href='http://nodemcu.com/favicon.png' /></head><body><h1>Hello!</h1><p>Since the start of the server 877 connections were made</p><p>257128 seconds elapsed</p>DHT Temperature:17.500;Humidity:47.300\r\n</body></html>"
"headers":{} 2 keys

I am thinking it should be possible to run a Function on this to pull out the numeric temperature and humidity values with a JavaScript regular expression. The problem is that I cannot work out how to access the body string above in the Function block:

var r=/Temperature/i
payload.test=r.test(payload.data)

generates an error “Cannot convert object to primitive value”

payload.test=r.test(payload.data.body) does not throw an error but does not work (i.e. the regex test fails), so I assume it is not referencing the body text and is testing against an empty string instead.

thx
J

Looking at the payload you pasted, it looks like the field you want to search is located at payload.body. Instead of a regular expression, here’s a function I put together that uses some String functions:

var body = payload.body;
var headerKey = "Temperature:";
var headerIdx = body.indexOf(headerKey) + headerKey.length; // 236
var temperature = body.substring(headerIdx, body.indexOf(";", headerIdx)); // '17.500'
payload.data.temperature = parseFloat(temperature);

This function first finds the index of “Temperature:”. It adds the length of that string to move the index from the beginning of the search term to its end. It then finds the index of the first semicolon after that position. The characters between those two indices are the value you want.
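The same idea can be wrapped in a small helper so that it also covers humidity, which is followed by a line break rather than a semicolon. This is only a sketch: extractNumberAfter is a hypothetical name, and it relies on parseFloat reading the leading number and ignoring the rest of the string.

```javascript
// Hypothetical helper: returns the number that follows `key` in `text`,
// or null when the key is missing or is not followed by a number.
function extractNumberAfter(text, key) {
  var start = text.indexOf(key);
  if (start === -1) {
    return null; // key not present in the text
  }
  start += key.length; // move from the start of the key to just past it
  // parseFloat stops at the first character that cannot be part of a number
  var value = parseFloat(text.substring(start));
  return isNaN(value) ? null : value;
}

// Sample taken from the end of the body string pasted above
var sample = "DHT Temperature:17.500;Humidity:47.300\r\n</body></html>";
var temperature = extractNumberAfter(sample, "Temperature:"); // 17.5
var humidity = extractNumberAfter(sample, "Humidity:"); // 47.3
```

In a Function node you would pass in the body string from the payload instead of the sample and assign the results to fields under payload.data.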

If you’re able to change the HTML format at all, wrapping the values you need in elements would allow you to use our HTML / XML Parser Node to easily extract values. For example, if your HTML was this:

<!DOCTYPE html>
<html>
  <head>
    <link rel='icon' type='image/png' href='http://nodemcu.com/favicon.png' />
  </head>
  <body>
    <h1>Hello!</h1>
    <p>Since the start of the server 877 connections were made</p>
    <p>257128 seconds elapsed</p>
    DHT Temperature:<span id="temp">17.500</span>;Humidity:<span id="humidity">47.300</span>
  </body>
</html>

Notice the span tags around the relevant values with the IDs “temp” and “humidity”. The selector in the HTML Parser Node would then just be “#temp”, which will return just that value.
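If you would rather stay in a Function node, the span-wrapped format is also easier to read with plain JavaScript. The sketch below is not how the Parser Node works internally; readSpan is a hypothetical helper, and it assumes the span has exactly the form id="..." shown above, with an ID that contains no regex special characters.

```javascript
// Hypothetical helper: reads the numeric text inside <span id="...">...</span>.
// Returns null when no span with that ID is found.
function readSpan(html, id) {
  var re = new RegExp('<span id="' + id + '">([^<]*)</span>');
  var m = re.exec(html);
  return m ? parseFloat(m[1]) : null;
}

var html = 'DHT Temperature:<span id="temp">17.500</span>;Humidity:<span id="humidity">47.300</span>';
var temp = readSpan(html, "temp"); // 17.5
var hum = readSpan(html, "humidity"); // 47.3
```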

Brandon, thanks! I now have it working. I agree it makes sense to fix up the device HTML to emit span tags so the HTML parser can be used. The final fix is:

1/ The device is polled and returns HTML, which is placed in payload.data. The HTML contains a body tag, and the data values are plain text inside it; none of them are wrapped in tags.

2/ Use a Function block to extract the data. The code below works:

var body = payload.data.body;
// Regular expression with two capture groups: the numbers right after "Temperature:" and "Humidity:".
// The decimal point is escaped so it only matches a literal "."; any number of digits is accepted.
var re = /ature:(-?\d+(?:\.\d+)?).+dity:(\d+(?:\.\d+)?)/i;

var m = re.exec(body); // m is null if the pattern is not found
if (m) {
  payload.data.Temperature = parseFloat(m[1]); // m[0] is the entire match, m[1] is the first group
  payload.data.Humidity = parseFloat(m[2]);
}

3/ Use a Device State block to capture Temperature = {{data.Temperature}} and Humidity = {{data.Humidity}}

4/ Create a new Dashboard and set up a time series plot.

The Losant walkthrough guide is very helpful here.
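For anyone who wants to try the extraction in step 2 outside Losant first, here is a rough standalone version that can be run in Node. parseDht is a hypothetical name, the sample string is a shortened copy of the body from the first post, and the pattern assumes the device always prints Temperature and Humidity in that order separated by a semicolon.

```javascript
// Hypothetical helper mirroring step 2. Returns null when the pattern is missing,
// so the caller can skip the Device State update instead of throwing.
function parseDht(body) {
  var re = /Temperature:(-?\d+(?:\.\d+)?);Humidity:(\d+(?:\.\d+)?)/i;
  var m = re.exec(body);
  if (!m) {
    return null;
  }
  return { Temperature: parseFloat(m[1]), Humidity: parseFloat(m[2]) };
}

var body = "<p>257128 seconds elapsed</p>DHT Temperature:17.500;Humidity:47.300\r\n</body></html>";
var reading = parseDht(body); // { Temperature: 17.5, Humidity: 47.3 }
var missing = parseDht("<html><body>no sensor data</body></html>"); // null
```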


Looks really good! Glad it’s up and running!