
    HTTP retriever - regex matching multiple values

    • jeremyh

      Hi

      I am attempting to use the HTTP retriever to 'scrape' a HTML page for a number of values/datapoints. The page contains a table that looks like this, and my objective is to capture all of the numerical values as separate datapoints:

      <table>
      
      	<td>Gewicht Volk 1</td>
              <td><b><p class="right">44.6</p></b></td>
              <td>kg</td>
              <td>Zu/Abnahme</td>          
                          <td><b><p class="right">44.6</p></b></td>
               <td>kg</td></tr>
      	<tr>
      		<td>Gewicht Volk 2</td>
      		<td><b><p class="right">29.4</p></b></td>
              <td>kg</td>
              <td>ab 00.00 Uhr</td> 
                          <td><b><p class="right">29.4</p></b></td>
              <td>kg</td></tr>
      	<tr>
      		<td>Luftdruck</td>
      		<td><b><p class="right">1015</p></b></td>
              <td>mbar</td>	
      	</tr>
              <td>Temperatur Drucksensor</td> 
              <td><b><p class="right">1.7</p></b></td>
              <td>°C</td>
      	<tr>
              <td>Temperatur</td>
              <td><b><p class="right">6.2</p></b></td>
              <td>°C</td>
              <td>Tagesmin.</td> 
              <td><b><p class="right">0.0</p></b></td>
              <td>°C</td>
              <td>, Tagesmax.</td> 
              <td><b><p class="right">0.0</p></b></td>
              <td>°C</td>
      </tr>
              <td>Brutraumtemperatur</td>
              <td><b><p class="right">0.0</p></b></td>
              <td>°C</td>
      </tr>
              <td>Regensensor</td>
              <td><b><p class="right">17</p></b></td>
              <td>mm</td>   
              <td>Tagesmenge</td> 
              <td><b><p class="right">0.0</p></b></td>
              <td>mm</td>
      </tr>
              <td>Luftfeuchtigkeit</td>
              <td><b><p class="right">80.6</p></b></td>
              <td>%</td>
      	<tr>
              <td>Akku</td>
              <td><b><p class="right">12.0</p></b></td>
              <td>V</td>
          </tr>
           <tr>
              <td>CSQ (Signalqualität Antenne)</td>
              <td><b><p class="right">-1</p></b></td>
              <td> </td>
      	 </tr>
      

      If I use a regex along the lines of (?<=<p class="right">)(.*)(?=<\/p) I can match the first value (44.6), but I cannot get the second value or nth value by incrementing the 'value capture group' value in the data point properties. Adding a {n} index to the end of the regex to get the nth match doesn't seem to help either.
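A likely reason incrementing the capture group does not help: a group index selects a parenthesised group inside one match, not the nth match of the whole pattern, and this pattern has only one group. A minimal sketch of the difference, using Python's `re` as a stand-in for whatever engine Mango uses and a shortened copy of the table as `html` (both are assumptions for illustration):

```python
import re

# Shortened, hypothetical copy of the page, for illustration only.
html = """<tr>
    <td>Gewicht Volk 2</td>
    <td><b><p class="right">29.4</p></b></td>
    <td>kg</td></tr>
<tr>
    <td>Luftdruck</td>
    <td><b><p class="right">1015</p></b></td>
    <td>mbar</td>
</tr>"""

pattern = re.compile(r'(?<=<p class="right">)(.*)(?=</p)')

# The pattern defines exactly one capturing group, so there is no group 2.
print(pattern.groups)

# search() stops at the first match; group(1) is that match's only group.
first = pattern.search(html)
print(first.group(1))

# findall() walks the page and returns group 1 of every match.
all_values = pattern.findall(html)
print(all_values)
```

Whether Mango can iterate over matches this way is not something this sketch establishes; it only shows why a higher group number returns nothing.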

      If I use a regex like <td>Luftfeuchtigkeit<\/td> <td><b><p class="right">(.*?)< (the forum has stripped the additional whitespaces) or even (?<=Luftdruck<\/td> <td><b><p class="right">)(.*)(?=<\/p>) I don't get any matches at all.
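These patterns probably fail because the page has a line break plus indentation between `</td>` and the next `<td>`, while the pattern expects exactly one space. A minimal sketch in Python (chosen only for illustration, as is the shortened `html` sample) comparing the literal space with `\s*`, which absorbs any run of whitespace:

```python
import re

# Shortened, hypothetical copy of one table row.
html = """<tr>
        <td>Luftdruck</td>
        <td><b><p class="right">1015</p></b></td>
        <td>mbar</td>
</tr>"""

# A single literal space cannot match the newline plus indentation.
strict = re.search(r'Luftdruck</td> <td><b><p class="right">(.*?)<', html)

# \s* matches any mix of spaces, tabs and line breaks between the tags.
relaxed = re.search(r'Luftdruck</td>\s*<td><b><p class="right">(.*?)<', html)

print(strict)            # no match object
print(relaxed.group(1))  # the captured value
```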

      Attempts to 'learn' regex have come up short, so I am limited to copying examples from others and messing about by trial and error. Any suggestions about how to accomplish this would be very much appreciated!

      • phildunlap

        Hi Jeremy,

        Lookahead and lookbehind can definitely get complex (at least you didn't ask a backreference question!). My tactic would be to count how many times "right" appears before what we're interested in,

        (?:.|\r|\n)*?(?:right"(?:.|\r|\n)*?){n}right">(\d+\.?\d*)(?:.|\r|\n)*

        I'm using (?:.|\r|\n)* to match any run of characters, line breaks included, and the {n} quantifier sets how many occurrences of right" to skip. The only capturing group then holds the (n+1)th value, so the capture group index is 1.

        I didn't try this in Mango, but I did test out the regex a little.
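For anyone who wants to try the pattern outside Mango first, here is a minimal sketch in Python. The helper name `nth_value` and the shortened `html` sample are hypothetical, and Python's `re` may differ in small ways from the engine Mango uses.

```python
import re

# Shortened, hypothetical copy of the page with three values.
html = """<tr>
    <td>Gewicht Volk 1</td>
    <td><b><p class="right">44.6</p></b></td>
    <td>kg</td></tr>
<tr>
    <td>Gewicht Volk 2</td>
    <td><b><p class="right">29.4</p></b></td>
    <td>kg</td></tr>
<tr>
    <td>Luftdruck</td>
    <td><b><p class="right">1015</p></b></td>
    <td>mbar</td>
</tr>"""

def nth_value(page, n):
    """Skip n occurrences of right" and capture the number after the next one."""
    pattern = (r'(?:.|\r|\n)*?(?:right"(?:.|\r|\n)*?){%d}'
               r'right">(\d+\.?\d*)(?:.|\r|\n)*') % n
    match = re.match(pattern, page)
    return match.group(1) if match else None

for n in range(3):
    print(n, nth_value(html, n))
```

With n = 0 nothing is skipped, so the first value comes back; n = 1 gives the second, and so on.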

        • phildunlap

          Or, if you wanted to be explicit about the labels, fearing the order may change or some such,

          Luftdruck(?:.|\r|\n)*?(\d+\.?\d*)
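A minimal Python sketch of the same label-anchored idea follows. The helper `value_after_label`, its `number` parameter, and the shortened `html` sample are all hypothetical. One thing to watch: `\d+\.?\d*` has no sign, so a negative reading such as the CSQ value would lose its minus; an optional `-?` in front is one possible adjustment.

```python
import re

# Shortened, hypothetical copy of two table rows.
html = """<tr>
    <td>Luftdruck</td>
    <td><b><p class="right">1015</p></b></td>
    <td>mbar</td>
</tr>
<tr>
    <td>CSQ (Signalqualität Antenne)</td>
    <td><b><p class="right">-1</p></b></td>
    <td> </td>
</tr>"""

def value_after_label(page, label, number=r'\d+\.?\d*'):
    """Return the first number that appears after the given label text."""
    pattern = re.escape(label) + r'(?:.|\r|\n)*?(' + number + ')'
    match = re.search(pattern, page)
    return match.group(1) if match else None

print(value_after_label(html, "Luftdruck"))
# Without a sign in the number pattern, the minus is dropped.
print(value_after_label(html, "CSQ"))
# Allowing an optional leading minus keeps it.
print(value_after_label(html, "CSQ", r'-?\d+\.?\d*'))
```

The label has to be unique up to the value: "Temperatur" on its own would also match inside "Temperatur Drucksensor", so the longer text is needed where labels share a prefix.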

          • jeremyh

            Brilliant, thanks Phil. Appreciate your explanation. That second regex works perfectly!
