Extracting multiple data points from an email
-
(using the latest Mango 3 installation on a Linux server)
Hello All,
I am hoping someone can help me with an email data extraction. I understand that the regex on the data points will look for a unique sequence and extract the data from that line between the delimiters, as shown in the help example, and it works great for an email with a single line / data point. I have a logger that sends one email a day, and the data is structured as shown here:
The data continues every 10 minutes, so one email contains 144 rows for the day, with 7 data points in each row.
Is there a way to extract all the data into Mango 3?
Cheers
Brian
-
Hi BG,
Are you trying to use the POP3 data source?
I would tend to use a Data File data source, probably. If the file was being procured over POP3, I would borrow code from the POP3 module to poll the message in a poll class (alas, in a private repository). Then I would use a CSV import class, and just split lines that start with

\d\d/\d\d/\d\d|\d\d:\d\d:\d\d

on the vertical bar and process them into input points. I can expand on this method if you'd like.

The quicker, dirtier way would involve a POP3 point that catches the whole message, set to DO NOT LOG if the whole email body is large (the reason being that the NoSQL database imposes a maximum string length somewhere in the 65000s, usually, depending on annotation length). That point would then provide a context update to a scripting data source, which would again look for lines beginning with the date pattern, split them on the vertical bar, and parse the result into points on the scripting data source. An easy regex to catch the whole email body might be

([^~]+)

since ~ doesn't seem to appear in the text, but \r and/or \n do. I can expand on this method if you like.
-
Hello Phil,
Thank you for your reply.
I was trying with the POP3 data source initially. With my level of programming experience, the "quicker, dirtier" way sounds better for me. We may get other loggers whose emails have a different data format, and if I use the same method for each I will be able to support them all more efficiently. Please expand a bit more on that method.
Cheers
Brian
-
Did you already get the data into an alphanumeric POP3 point? Providing a script would be easier if I had a sample email as text rather than an image.
-
Hello Phil,
I had created the POP3 data source but stopped when I was about to create the data point. I checked the help file for the data points and realised there would be a problem, so I didn't go any further. Here is a paste of the entire body of the email:
+--------------------------------------------------------------------------------------------------------------------------------+
|                                                             WS0001                                                             |
+----------+----------+------------------+--------------+-------------+------------+---------------+------------+----------------+
|   Date   |   Time   | <Wind_Direction> | <Wind_Speed> | <Barometer> |    <RH>    | <Temperature> | <Dewpoint> | <Sensor_Power> |
+----------+----------+------------------+--------------+-------------+------------+---------------+------------+----------------+
| 21/02/19 | 00:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 23:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 22:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 21:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 20:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 19:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 18:00:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 17:50:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 17:40:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 17:30:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 17:20:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
| 20/02/19 | 17:10:00 |           89.000 |        0.050 |     999.500 |     39.000 |        22.490 |      8.040 |         24.000 |
+----------+----------+------------------+--------------+-------------+------------+---------------+------------+----------------+
We discovered that the logger does not have to include all the -'s and +'s; they are only there to make the email look like a spreadsheet. We can set the email to not use what it calls "Framing".
Thank you for your help with this.
-
What issue did you foresee with the POP3 points? Multiple values per point per email?
Here's the script I came up with:

//JsonEmport.setImportDuringValidation(true);
var pointIndexes = this.pointIndexes;
var timezone = this.timezone;
if(typeof pointIndexes === 'undefined') {
    pointIndexes = this.pointIndexes = {
        windDirection: 2,
        windSpeed: 3,
        barometer: 4,
        rh: 5,
        temperature: 6,
        dewPoint: 7,
        sensorPower: 8
    }; //Create points on the scripting data source with those variable names
}

var lines = p.value.replace(/\r/g, "").split("\n");
for( var k = 0; k < lines.length; k+=1 ) {
    var line = lines[k];
    var dateMatch = /^.*?(\d\d)\/(\d\d)\/(\d\d)\s*\|\s*(\d\d):(\d\d):(\d\d).*/.exec(line);
    if(dateMatch !== null) {
        //Assume times are in server timezone; if UTC use the commented line, else apply a manual offset
        var date = new Date(parseInt("20" + dateMatch[3]), parseInt(dateMatch[2])-1, parseInt(dateMatch[1]),
                parseInt(dateMatch[4]), parseInt(dateMatch[5]), parseInt(dateMatch[6]));
//        var utcDate = new Date(Date.UTC(parseInt("20" + dateMatch[3]), parseInt(dateMatch[2])-1, parseInt(dateMatch[1]),
//                parseInt(dateMatch[4]), parseInt(dateMatch[5]), parseInt(dateMatch[6])));
        var lineData = line.split("|");
        for(var point in pointIndexes) {
            var nextValue = lineData[pointIndexes[point]];
            nextValue = nextValue.replace(/\s+/g, "");
            if(typeof this[point] === 'undefined') {
                //Create a data point using our base point
                var newDp = JSON.parse(JsonEmport.dataPointQuery("eq(xid,DP_EmailParser_BasePoint)")).dataPoints[0];
                newDp.name = point;
                newDp.pointLocator.varName = point;
                newDp.enabled = true;
                delete newDp.xid; //get a randomly generated xid
                JsonEmport.doImportGetStatus(JSON.stringify({"dataPoints":[newDp]}));
                RuntimeManager.sleep(1000);
            }
            this[point].set(parseFloat(nextValue), date.getTime());
        }
    }
}
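As a side note, the date handling in that script can be sanity-checked outside Mango with plain JavaScript, since it uses only standard Date and RegExp APIs and no Mango globals. The sample line below is one row from the email; everything else here is just for illustration, not part of the script itself.

```javascript
// Reproduce the timestamp logic from the script above, standalone.
// Capture groups: [1]=day, [2]=month, [3]=two-digit year, [4]=hours, [5]=minutes, [6]=seconds.
var sampleLine = "| 20/02/19 | 23:50:00 | 89.000 | 0.050 | 999.500 | 39.000 | 22.490 | 8.040 | 24.000 |";
var dateMatch = /^.*?(\d\d)\/(\d\d)\/(\d\d)\s*\|\s*(\d\d):(\d\d):(\d\d).*/.exec(sampleLine);

// Server-local interpretation, as the script assumes. Note the "20" prefix
// on the two-digit year and the zero-based month in the Date constructor.
var localDate = new Date(parseInt("20" + dateMatch[3], 10),
                         parseInt(dateMatch[2], 10) - 1,
                         parseInt(dateMatch[1], 10),
                         parseInt(dateMatch[4], 10),
                         parseInt(dateMatch[5], 10),
                         parseInt(dateMatch[6], 10));

// UTC interpretation, matching the commented-out alternative in the script:
var utcMillis = Date.UTC(parseInt("20" + dateMatch[3], 10),
                         parseInt(dateMatch[2], 10) - 1,
                         parseInt(dateMatch[1], 10),
                         parseInt(dateMatch[4], 10),
                         parseInt(dateMatch[5], 10),
                         parseInt(dateMatch[6], 10));
```

If the logger stamps its rows in UTC, utcMillis is what you would pass to set() instead of date.getTime().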
For a similar email you would only need to change the pointIndexes object to the names you want the points in those columns to be created with. I did it with the formatting still in the email (just add \s*
all around some regex!) Here's the JSON for the data source with the base point:{ "dataSources":[ { "xid":"DS_8ffb2983-dd50-42e1-9b43-706bc0a10eb2", "name":"Email Parser", "enabled":false, "type":"SCRIPTING", "alarmLevels":{ "SCRIPT_ERROR":"URGENT", "DATA_TYPE_ERROR":"URGENT", "POLL_ABORTED":"URGENT", "LOG_ERROR":"URGENT" }, "purgeType":"YEARS", "updateEvent":"CONTEXT_UPDATE", "context":[ { "varName":"p", "dataPointXid":"DP_f4a4b495-f7a8-4ddc-a619-5172835ebe30", "updateContext":true } ], "logLevel":"NONE", "cronPattern":"", "executionDelaySeconds":0, "historicalSetting":false, "logCount":5, "logSize":1.0, "script":"\/\/JsonEmport.setImportDuringValidation(true);\nvar pointIndexes = this.pointIndexes;\nvar timezone = this.timezone;\nif(typeof pointIndexes === 'undefined') {\n pointIndexes = this.pointIndexes = pointIndexes = {\n windDirection: 2,\n windSpeed: 3,\n barometer: 4,\n rh: 5,\n temperature: 6,\n dewPoint: 7,\n sensorPower: 8\n }; \/\/Create points on the scripting data source with those variable names\n}\n\nvar lines = p.value.replace(\/\\r\/g, \"\").split(\"\\n\");\nfor( var k = 0; k < lines.length; k+=1 ) {\n var line = lines[k];\n var dateMatch = \/^.*?(\\d\\d)\\\/(\\d\\d)\\\/(\\d\\d)\\s*\\|\\s*(\\d\\d):(\\d\\d):(\\d\\d).*\/.exec(line);\n if(dateMatch !== null) {\n \/\/Assume times are in server timezone, if UTC use next line, else manual offset\n var date = new Date(parseInt(\"20\" + dateMatch[3]), parseInt(dateMatch[2])-1, parseInt(dateMatch[1]),\n parseInt(dateMatch[4]), parseInt(dateMatch[5]), parseInt(dateMatch[6]));\n\/\/ var utcDate = new Date(Date.UTC(parseInt(\"20\" + dateMatch[3]), parseInt(dateMatch[2])-1, parseInt(dateMatch[1]),\n\/\/ parseInt(dateMatch[4]), parseInt(dateMatch[5]), parseInt(dateMatch[6])))\n var lineData = line.split(\"|\");\n for(var point in pointIndexes) {\n var nextValue = lineData[pointIndexes[point]];\n nextValue = nextValue.replace(\/\\s+\/g, \"\");\n if(typeof this[point] === 'undefined') {\n \/\/Create a data point 
using our base point,\n var newDp = JSON.parse(JsonEmport.dataPointQuery(\"eq(xid,DP_EmailParser_BasePoint)\")).dataPoints[0];\n newDp.name = point;\n newDp.pointLocator.varName = point;\n newDp.enabled = true;\n delete newDp.xid; \/\/get a randomly generated xid\n JsonEmport.doImportGetStatus(JSON.stringify({\"dataPoints\":[newDp]}));\n RuntimeManager.sleep(1000);\n }\n this[point].set(parseFloat(nextValue), date.getTime());\n }\n }\n}", "scriptPermissions":{ "customPermissions":"", "dataPointReadPermissions":"superadmin", "dataPointSetPermissions":"superadmin", "dataSourcePermissions":"superadmin" }, "editPermission":"", "purgeOverride":false, "purgePeriod":1 } ], "dataPoints":[ { "xid":"DP_EmailParser_BasePoint", "name":"BasePoint", "enabled":false, "loggingType":"ALL", "intervalLoggingPeriodType":"MINUTES", "intervalLoggingType":"AVERAGE", "purgeType":"YEARS", "pointLocator":{ "dataType":"NUMERIC", "contextUpdate":false, "settable":true, "varName":"basePoint" }, "eventDetectors":[ ], "plotType":"SPLINE", "rollup":"NONE", "unit":"", "simplifyType":"NONE", "chartColour":"", "chartRenderer":{ "type":"IMAGE", "timePeriodType":"DAYS", "numberOfPeriods":1 }, "dataSourceXid":"DS_8ffb2983-dd50-42e1-9b43-706bc0a10eb2", "defaultCacheSize":1, "deviceName":"Email Parser", "discardExtremeValues":false, "discardHighLimit":1.7976931348623157E308, "discardLowLimit":-1.7976931348623157E308, "intervalLoggingPeriod":1, "intervalLoggingSampleWindowSize":0, "overrideIntervalLoggingSamples":false, "preventSetExtremeValues":false, "purgeOverride":false, "purgePeriod":1, "readPermission":"", "setExtremeHighLimit":1.7976931348623157E308, "setExtremeLowLimit":-1.7976931348623157E308, "setPermission":"", "tags":{ }, "textRenderer":{ "type":"ANALOG", "useUnitAsSuffix":true, "unit":"", "renderedUnit":"", "format":"0.000" }, "tolerance":0.0 } ] }
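To round this off, here is a standalone sketch (not part of the Mango script or JSON above; the sample text is abbreviated from the email earlier in the thread) of the two parsing ideas discussed: ([^~]+) captures the whole body because a negated character class matches newlines where the dot does not, and data rows are recognised by the leading date/time pattern and split on the vertical bar.

```javascript
// Standalone sketch of the two parsing ideas discussed above.

// 1) ([^~]+) captures the entire email body, newlines included, because a
//    negated character class matches \r and \n where "." would not.
var body = "| WS0001 |\n| Date | Time |\n| 21/02/19 | 00:00:00 | 89.000 | 0.050 | 999.500 | 39.000 | 22.490 | 8.040 | 24.000 |\n";
var wholeBody = /([^~]+)/.exec(body)[1];

// 2) Data rows start with the date/time pattern; framing and header lines do not.
var rowPattern = /(\d\d)\/(\d\d)\/(\d\d)\s*\|\s*(\d\d):(\d\d):(\d\d)/;

function parseRows(text) {
    var rows = [];
    var lines = text.replace(/\r/g, "").split("\n");
    for (var k = 0; k < lines.length; k += 1) {
        if (!rowPattern.test(lines[k]))
            continue; // skip framing (+---+), headers and blank lines
        var cells = lines[k].split("|");
        // cells[0] is the empty string before the leading "|"; cells[1] is
        // the date, cells[2] the time, cells[3..9] the seven readings.
        var values = [];
        for (var i = 3; i <= 9; i += 1)
            values.push(parseFloat(cells[i].replace(/\s+/g, "")));
        rows.push({
            date: cells[1].replace(/\s+/g, ""),
            time: cells[2].replace(/\s+/g, ""),
            values: values
        });
    }
    return rows;
}
```

The ES5 style (var, plain for loops) matches what the Mango scripting engine accepts, so the same parseRows body could be dropped into a scripting data source with the set() calls added back in.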