HTTP retriever - regex matching multiple values
-
Hi
I am attempting to use the HTTP retriever to 'scrape' a HTML page for a number of values/datapoints. The page contains a table that looks like this, and my objective is to capture all of the numerical values as separate datapoints:
<table> <td>Gewicht Volk 1</td> <td><b><p class="right">44.6</p></b></td> <td>kg</td> <td>Zu/Abnahme</td> <td><b><p class="right">44.6</p></b></td> <td>kg</td></tr> <tr> <td>Gewicht Volk 2</td> <td><b><p class="right">29.4</p></b></td> <td>kg</td> <td>ab 00.00 Uhr</td> <td><b><p class="right">29.4</p></b></td> <td>kg</td></tr> <tr> <td>Luftdruck</td> <td><b><p class="right">1015</p></b></td> <td>mbar</td> </tr> <td>Temperatur Drucksensor</td> <td><b><p class="right">1.7</p></b></td> <td>°C</td> <tr> <td>Temperatur</td> <td><b><p class="right">6.2</p></b></td> <td>°C</td> <td>Tagesmin.</td> <td><b><p class="right">0.0</p></b></td> <td>°C</td> <td>, Tagesmax.</td> <td><b><p class="right">0.0</p></b></td> <td>°C</td> </tr> <td>Brutraumtemperatur</td> <td><b><p class="right">0.0</p></b></td> <td>°C</td> </tr> <td>Regensensor</td> <td><b><p class="right">17</p></b></td> <td>mm</td> <td>Tagesmenge</td> <td><b><p class="right">0.0</p></b></td> <td>mm</td> </tr> <td>Luftfeuchtigkeit</td> <td><b><p class="right">80.6</p></b></td> <td>%</td> <tr> <td>Akku</td> <td><b><p class="right">12.0</p></b></td> <td>V</td> </tr> <tr> <td>CSQ (Signalqualität Antenne)</td> <td><b><p class="right">-1</p></b></td> <td> </td> </tr>
If I use a regex along the lines of
(?<=<p class="right">)(.*)(?=<\/p)
I can match the first value (44.6), but I cannot get the second or nth value by incrementing the 'value capture group' setting in the data point properties. Adding a {n} index to the end of the regex to get the nth match doesn't seem to help either.
If I use a regex like
<td>Luftfeuchtigkeit<\/td> <td><b><p class="right">(.*?)<
(the forum has stripped the additional whitespace) or even
(?<=Luftdruck<\/td> <td><b><p class="right">)(.*)(?=<\/p>)
I don't get any matches at all.
Attempts to 'learn' regex have come up short, so I am limited to copying examples from others and messing about by trial and error. Any suggestions about how to accomplish this would be very much appreciated!
-
Hi Jeremy,
Lookahead and lookbehind can definitely get complex (at least you didn't ask a backreference question!). My tactic would be to count how many times "right" appears before the value we're interested in:
(?:.|\r|\n)*?(?:right"(?:.|\r|\n)*?){n}right">(\d+\.?\d*)(?:.|\r|\n)*
I'm using
(?:.|\r|\n)*
to match everything, and then we have the {n} parameter to determine how many right" occurrences to skip. The only capturing group should be the (n+1)th value, capturing at group index 1.
I didn't try this in Mango, but I did test out the regex a little.
-
Or, if you wanted to be explicit about the labels, fearing the order may change or some such:
Luftdruck(?:.|\r|\n)*?(\d+\.?\d*)
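The label-anchored version can be tried the same way. Again, this is only a sketch in Python rather than Mango itself, value_after is a made-up helper, and the sample spacing is assumed.

```python
import re

# A few rows from the question's table; spacing is assumed.
HTML = (
    '<tr> <td>Luftdruck</td> <td><b><p class="right">1015</p></b></td> <td>mbar</td> </tr>\n'
    '<td>Temperatur Drucksensor</td> <td><b><p class="right">1.7</p></b></td> <td>°C</td>\n'
    '<tr> <td>Temperatur</td> <td><b><p class="right">6.2</p></b></td> <td>°C</td>\n'
    '<td>Luftfeuchtigkeit</td> <td><b><p class="right">80.6</p></b></td> <td>%</td>\n'
)

def value_after(label, html):
    """Hypothetical helper: capture the first number that follows the label."""
    pattern = re.escape(label) + r'(?:.|\r|\n)*?(\d+\.?\d*)'
    m = re.search(pattern, html)
    return m.group(1) if m else None

print(value_after("Luftdruck", HTML))
print(value_after("Luftfeuchtigkeit", HTML))
# A bare "Temperatur" also matches "Temperatur Drucksensor", which comes first,
# so including the closing tag in the label picks out the intended row.
print(value_after("Temperatur", HTML))
print(value_after("Temperatur</td>", HTML))
```

One thing to watch for: the capture group has no place for a minus sign, so a negative reading such as the CSQ value would come back without it. Putting -? in front of \d+ inside the group looks like one way to allow for that.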
-
Brilliant, thanks Phil. Appreciate your explanation. That second regex works perfectly!