Hi
I am trying to use the HTTP retriever to 'scrape' an HTML page for a number of values (data points). The page contains a table like the one below, and my goal is to capture each of the numerical values as a separate data point:
```html
<table>
<td>Gewicht Volk 1</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td>
<td>Zu/Abnahme</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td></tr>
<tr>
<td>Gewicht Volk 2</td>
<td><b><p class="right">29.4</p></b></td>
<td>kg</td>
<td>ab 00.00 Uhr</td>
<td><b><p class="right">29.4</p></b></td>
<td>kg</td></tr>
<tr>
<td>Luftdruck</td>
<td><b><p class="right">1015</p></b></td>
<td>mbar</td>
</tr>
<td>Temperatur Drucksensor</td>
<td><b><p class="right">1.7</p></b></td>
<td>°C</td>
<tr>
<td>Temperatur</td>
<td><b><p class="right">6.2</p></b></td>
<td>°C</td>
<td>Tagesmin.</td>
<td><b><p class="right">0.0</p></b></td>
<td>°C</td>
<td>, Tagesmax.</td>
<td><b><p class="right">0.0</p></b></td>
<td>°C</td>
</tr>
<td>Brutraumtemperatur</td>
<td><b><p class="right">0.0</p></b></td>
<td>°C</td>
</tr>
<td>Regensensor</td>
<td><b><p class="right">17</p></b></td>
<td>mm</td>
<td>Tagesmenge</td>
<td><b><p class="right">0.0</p></b></td>
<td>mm</td>
</tr>
<td>Luftfeuchtigkeit</td>
<td><b><p class="right">80.6</p></b></td>
<td>%</td>
<tr>
<td>Akku</td>
<td><b><p class="right">12.0</p></b></td>
<td>V</td>
</tr>
<tr>
<td>CSQ (Signalqualität Antenne)</td>
<td><b><p class="right">-1</p></b></td>
<td> </td>
</tr>
```
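For reference, this is the list that "all of the numerical values" corresponds to when a pattern is applied globally to the page. It is sketched in Python, whose `re` is used here only to test patterns; the constructs involved (`[^<]*`, capture groups) behave the same way in Java's `java.util.regex`. `PAGE` is a trimmed copy of the table above, assuming the real source has each tag on its own line. A retriever data point presumably yields one value per pattern, so this shows the target list, not a drop-in setting.

```python
import re

# Trimmed copy of the table above (assumption: one tag per line in the
# real page source, as shown in the post).
PAGE = """<table>
<td>Gewicht Volk 1</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td>
<td>Zu/Abnahme</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td></tr>
<tr>
<td>Gewicht Volk 2</td>
<td><b><p class="right">29.4</p></b></td>
<td>kg</td>
<td>ab 00.00 Uhr</td>
<td><b><p class="right">29.4</p></b></td>
<td>kg</td></tr>
<tr>
<td>Luftdruck</td>
<td><b><p class="right">1015</p></b></td>
<td>mbar</td>
</tr>
"""

# [^<]* stops at the next tag, so a match never runs past its own </p>,
# even if several cells happen to sit on one line of the page source.
VALUE = re.compile(r'<p class="right">([^<]*)</p>')

# findall returns the captured group of every match, in document order.
values = VALUE.findall(PAGE)
print(values)  # ['44.6', '44.6', '29.4', '29.4', '1015']
```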
If I use a regex along the lines of `(?<=<p class="right">)(.*)(?=<\/p)`, I can match the first value (44.6), but I cannot get the second or nth value by incrementing the 'value capture group' setting in the data point properties. Appending a `{n}` index to the end of the regex to get the nth match doesn't seem to help either.
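A likely explanation: a capture group number selects a numbered group (1, 2, ...) *inside a single match*, not the nth match on the page, and the pattern above has only one group. Likewise, `{n}` repeats the preceding token within one match rather than picking the nth occurrence. A common workaround, sketched below under the assumption that the retriever runs one search and returns the chosen group, is to put the skipping into the pattern itself: `(?s)(?:.*?<p class="right">){n}([^<]*)` lazily passes n value cells and captures the value in the nth one, always as group 1. It is written in Python, whose syntax for `(?s)`, `.*?` and `{n}` matches Java's; the helpers `nth_value_pattern` and `nth_value` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical trimmed sample of the page (one tag per line assumed).
PAGE = """<td>Gewicht Volk 1</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td>
<td>Zu/Abnahme</td>
<td><b><p class="right">44.6</p></b></td>
<td>kg</td></tr>
<tr>
<td>Gewicht Volk 2</td>
<td><b><p class="right">29.4</p></b></td>
<td>kg</td>
"""


def nth_value_pattern(n: int) -> str:
    """Build a pattern whose group 1 is the value in the nth value cell.

    (?s) lets .*? run across newlines; the non-capturing group repeated
    {n} times lazily consumes the first n '<p class="right">' openings,
    so the capture that follows belongs to the nth cell.
    """
    return r'(?s)(?:.*?<p class="right">){%d}([^<]*)' % n


def nth_value(page: str, n: int) -> Optional[str]:
    match = re.search(nth_value_pattern(n), page)
    return match.group(1) if match else None


for n in (1, 2, 3):
    print(n, nth_value(PAGE, n))
# 1 44.6
# 2 44.6
# 3 29.4
```

For the third value (29.4) the retriever pattern would then be `(?s)(?:.*?<p class="right">){3}([^<]*)` with the capture group left at 1. Positional patterns like this break if the page gains or loses a row, so anchoring on the label text is usually more robust.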
If I use a regex like `<td>Luftfeuchtigkeit<\/td> <td><b><p class="right">(.*?)<` (the forum has stripped the additional whitespace) or even `(?<=Luftdruck<\/td> <td><b><p class="right">)(.*)(?=<\/p>)`, I don't get any matches at all.
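One probable cause: in the page source each `<td>` sits on its own line, so between `</td>` and the next `<td>` there is a newline (and possibly indentation), while these patterns contain a single literal space. Replacing that space with `\s*` (zero or more whitespace characters, newlines included) makes the pattern indifferent to the layout. The sketch below, in Python (the same syntax works in Java's `java.util.regex`), assumes that one-tag-per-line layout; `label_pattern` and `value_after_label` are hypothetical helpers. A capture group is used instead of a lookbehind because lookbehinds generally cannot contain a variable-length `\s*` (Python's `re` rejects it outright).

```python
import re
from typing import Optional

# Trimmed sample; assumes a newline between tags, as in the table above.
PAGE = """<td>Luftdruck</td>
<td><b><p class="right">1015</p></b></td>
<td>mbar</td>
</tr>
<td>Luftfeuchtigkeit</td>
<td><b><p class="right">80.6</p></b></td>
<td>%</td>
"""


def label_pattern(label: str) -> str:
    # \s* absorbs the newline/indentation between </td> and <td>;
    # re.escape protects labels containing characters such as ( ) or .
    return (r'<td>' + re.escape(label) + r'</td>\s*'
            r'<td><b><p class="right">([^<]*)</p>')


def value_after_label(page: str, label: str) -> Optional[str]:
    match = re.search(label_pattern(label), page)
    return match.group(1) if match else None


# A literal space where the page has a newline finds nothing:
literal = r'<td>Luftfeuchtigkeit</td> <td><b><p class="right">(.*?)<'
print(re.search(literal, PAGE))                     # None
print(value_after_label(PAGE, "Luftfeuchtigkeit"))  # 80.6
print(value_after_label(PAGE, "Luftdruck"))         # 1015
```

Since the first lookbehind pattern did return 44.6, the retriever evidently searches within the page rather than requiring the whole page to match, so a setting such as `<td>Luftfeuchtigkeit</td>\s*<td><b><p class="right">([^<]*)</p>` with capture group 1 should, under the layout assumption above, yield 80.6.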
Attempts to 'learn' regex have come up short, so I am limited to copying examples from others and messing about by trial and error. Any suggestions about how to accomplish this would be very much appreciated!