Modbus TCP no recipient found for response

rob987

Hi,

I am using Mango 3.5.6 with Modbus TCP to poll a SolarEdge inverter, and I get bursts of "No recipient" errors. For example, on 26 March there were 240 errors in 9 bursts. The times seem to be random, with minutes or hours between errors.

When I first added the inverter I set the poll period to 1 second and the errors were in the thousands. I have reduced them to the hundreds by extending the poll period to 2 seconds and the timeout to 1500 ms, which seems a long time for a Modbus TCP request timeout.

2019/03/22-21:06:35,347 1553252795347 data source started
2019/03/22-21:06:35,389 O 000100000006010300530002
2019/03/22-21:06:36,390 O 000100000006010300530002
2019/03/22-21:06:36,473 I 00010000000701030400000000
2019/03/22-21:06:36,473 O 0002000000060103005d0003
2019/03/22-21:06:36,553 I 00010000000701030400000000
2019/03/22-21:06:37,342 I 00020000000901030600697e020000
2019/03/22-21:06:37,342 O 000300000006010300ce0001
2019/03/22-21:06:37,423 I 000300000005010302fd9b
2019/03/22-21:06:37,423 O 000400000006010300d20001
2019/03/22-21:06:37,685 I 0004000000050103020000
2019/03/22-21:06:37,685 O 000500000006010300e20002
2019/03/22-21:06:38,129 I 0005000000070103040040b993
2019/03/22-21:06:38,129 O 000600000006010300ea0002
2019/03/22-21:06:38,996 I 0006000000070103040043a19f
2019/03/22-21:06:38,997 O 000700000006010300f20001
2019/03/22-21:06:39,400 I 0007000000050103020000
2019/03/22-21:06:41,388 O 000800000006010300530002
2019/03/22-21:06:42,003 I 00080000000701030400000000
2019/03/22-21:06:42,004 O 0009000000060103005d0003
2019/03/22-21:06:42,084 I 00090000000901030600697e020000
2019/03/22-21:06:42,085 O 000a00000006010300ce0001
2019/03/22-21:06:42,629 I 000a00000005010302fd9a
2019/03/22-21:06:42,629 O 000b00000006010300d20001
2019/03/22-21:06:43,356 I 000b000000050103020000
2019/03/22-21:06:43,356 O 000c00000006010300e20002
2019/03/22-21:06:44,356 O 000c00000006010300e20002
2019/03/22-21:06:44,526 I 000c000000070103040040b993
2019/03/22-21:06:44,526 O 000d00000006010300ea0002
2019/03/22-21:06:44,607 I 000c000000070103040040b993
2019/03/22-21:06:44,709 I 000d000000070103040043a1a0
2019/03/22-21:06:44,709 O 000e00000006010300f20001
2019/03/22-21:06:44,810 I 000e000000050103020000
2019/03/22-21:06:45,387 O 000f00000006010300530002
2019/03/22-21:06:46,364 I 000f0000000701030400000000
2019/03/22-21:06:46,364 O 0010000000060103005d0003
2019/03/22-21:06:46,445 I 00100000000901030600697e020000
2019/03/22-21:06:46,445 O 001100000006010300ce0001
2019/03/22-21:06:47,312 I 001100000005010302fd97
2019/03/22-21:06:47,313 O 001200000006010300d20001
2019/03/22-21:06:47,413 I 0012000000050103020000
2019/03/22-21:06:47,414 O 001300000006010300e20002
2019/03/22-21:06:48,261 I 0013000000070103040040b993
2019/03/22-21:06:48,261 O 001400000006010300ea0002
2019/03/22-21:06:49,262 O 001400000006010300ea0002
2019/03/22-21:06:49,351 I 0014000000070103040043a1a1
2019/03/22-21:06:49,351 O 001500000006010300f20001
2019/03/22-21:06:50,352 O 001500000006010300f20001
2019/03/22-21:06:50,945 I 0014000000070103040043a1a1
2019/03/22-21:06:51,048 I 0015000000050103020000

I logged the I/O when I first added the inverter and noticed that there are sometimes two I lines or two O lines in a row.
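Each hex string in the log above appears to be a complete Modbus TCP frame: a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code and its data. A minimal Python sketch that decodes them, assuming function 0x03 (Read Holding Registers), the only function in this log. Decoding shows that a doubled O line repeats the same transaction ID (a retry), and a doubled I line is a second response carrying the same transaction ID.

```python
# Sketch: decode the hex Modbus TCP frames from Mango's I/O log.
# Assumes function 0x03 (Read Holding Registers), the only function in this log.


def decode_frame(hex_frame: str, is_request: bool) -> dict:
    """Split one frame into its MBAP header fields and its PDU fields."""
    data = bytes.fromhex(hex_frame)
    frame = {
        "transaction_id": int.from_bytes(data[0:2], "big"),
        "protocol_id": int.from_bytes(data[2:4], "big"),  # 0 for Modbus
        "length": int.from_bytes(data[4:6], "big"),  # bytes after this field
        "unit_id": data[6],
        "function": data[7],
    }
    if is_request:
        # Request PDU: starting register offset, then number of registers.
        frame["start_register"] = int.from_bytes(data[8:10], "big")
        frame["register_count"] = int.from_bytes(data[10:12], "big")
    else:
        # Response PDU: byte count, then that many bytes of 16-bit registers.
        byte_count = data[8]
        payload = data[9:9 + byte_count]
        frame["registers"] = [
            int.from_bytes(payload[i:i + 2], "big") for i in range(0, byte_count, 2)
        ]
    return frame


# The retried transaction 0x000c from the log: two O lines, then two I lines.
print(decode_frame("000c00000006010300e20002", is_request=True))
print(decode_frame("000c000000070103040040b993", is_request=False))
```

Both I lines for 0x000c carry identical registers, which fits a reply to the original send and a reply to the retry both arriving.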

The time between the errors appears to be random:

26/3/2019	 5:47:48 AM	'SE5K': com.serotonin.modbus4j.exception.ModbusTransportException: java.net.SocketTimeoutException: connect timed out 
26/3/2019	 9:39:24 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@93a5 
26/3/2019	 9:39:26 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@93a6 
26/3/2019	 10:09:57 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac78 
26/3/2019	 10:10:02 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac7d 
26/3/2019	 10:10:15 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac84 
26/3/2019	 10:10:21 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac8c 
26/3/2019	 10:10:28 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac92 
26/3/2019	 10:10:34 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@ac96 
26/3/2019	 10:10:53 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@aca1 
26/3/2019	 10:11:01 AM	'SE5K': Exception from modbus master: No recipient was found waiting for response for key com.serotonin.modbus4j.ip.xa.XaWaitingRoomKeyFactory$XaWaitingRoomKey@aca8

Mango is also polling an Automation Direct Click PLC via Modbus, and that has never reported any errors.
From the system status page:

SE5K previous sequential successful polls       1
SE5K last poll duration                         531
SE5K poll success percentage                    99.24194567277321
ClickPLC previous sequential successful polls   -1
ClickPLC last poll duration                     63
ClickPLC poll success percentage                100

I take it the poll duration is in milliseconds? Mango can ping the inverter in about 0.6 ms:

robert@mango:/opt/mango/logs$ ping se5k
PING SE5K.nixtec.net (10.11.21.253) 56(84) bytes of data.
64 bytes from SE5K.nixtec.net (10.11.21.253): icmp_seq=1 ttl=62 time=0.672 ms
64 bytes from SE5K.nixtec.net (10.11.21.253): icmp_seq=2 ttl=62 time=0.640 ms
64 bytes from SE5K.nixtec.net (10.11.21.253): icmp_seq=3 ttl=62 time=0.628 ms

Only 9 points are being polled on the inverter (summary from the device export: range, data type, slave ID, offset):

HOLDING_REGISTER	TWO_BYTE_INT_SIGNED	1	84
HOLDING_REGISTER	TWO_BYTE_INT_UNSIGNED	1	83
HOLDING_REGISTER	TWO_BYTE_INT_SIGNED	1	206
HOLDING_REGISTER	TWO_BYTE_INT_SIGNED	1	210
HOLDING_REGISTER	FOUR_BYTE_INT_UNSIGNED	1	226
HOLDING_REGISTER	FOUR_BYTE_INT_UNSIGNED	1	234
HOLDING_REGISTER	TWO_BYTE_INT_SIGNED	1	242
HOLDING_REGISTER	FOUR_BYTE_INT_UNSIGNED	1	93
HOLDING_REGISTER	TWO_BYTE_INT_UNSIGNED	1	95

So my question is: is the problem with Mango, or with the SolarEdge inverter?

Thanks

jvaughters

You will have to isolate the inverter from Mango to know for sure. The first thing I would do is leave the configuration as it is, run a constant ping script, and log the results. See whether the ping response times grow during the times Mango reports failures. If you get ping failures or long responses at the same times as Mango does, it is likely the inverter software lagging or network latency. A more drastic isolation step would be to run another Modbus client alongside the ping test and see whether it gets errors as well, again logging them. I've used the original Mango and ScadaBR, and am now testing the new Mango M2M2; over nearly 10 years, most of the time the issue has not been Mango but the device or the network.
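A minimal Python sketch of the constant ping logger suggested above, assuming a Linux host with iputils `ping` (where `-W` is the reply timeout in seconds) and the host name `se5k` from the ping output earlier in the thread. It prints timestamps in the same layout as Mango's I/O log so the two can be lined up. ICMP is used rather than a TCP probe of the Modbus port so the test does not open a second Modbus connection to the device. The 50 ms "slow" threshold is an arbitrary assumption.

```python
# Sketch: log one ping per interval with Mango-style timestamps.
# Assumes Linux iputils ping (-c count, -W reply timeout in seconds).
import re
import subprocess
import time
from datetime import datetime

RTT_PATTERN = re.compile(r"time=([\d.]+) ms")
SLOW_MS = 50.0  # arbitrary "slow reply" threshold; tune for your LAN


def parse_rtt_ms(ping_output: str):
    """Return the first round-trip time (ms) in ping's output, or None."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host: str, timeout_s: int = 1):
    """Send a single echo request; None means no reply within timeout_s."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        capture_output=True, text=True,
    )
    return parse_rtt_ms(result.stdout) if result.returncode == 0 else None


def log_pings(host: str, interval_s: float = 1.0):
    """Ping until interrupted, printing one timestamped line per probe."""
    while True:
        rtt = ping_once(host)
        # Same layout as the I/O log, e.g. 2019/03/22-21:06:35,347
        stamp = datetime.now().strftime("%Y/%m/%d-%H:%M:%S,%f")[:-3]
        if rtt is None:
            print(f"{stamp} TIMEOUT", flush=True)
        else:
            status = "SLOW" if rtt > SLOW_MS else "ok"
            print(f"{stamp} {status} {rtt:.3f} ms", flush=True)
        time.sleep(interval_s)
```

Running `log_pings("se5k")` with its output redirected to a file gives a record that can be compared against the times of Mango's "No recipient" events.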

phildunlap

Hi rob987,

Do you have "Contiguous batches only" checked? It appears to me that you probably do, and part of the problem may be that the requests for those points are segmented into several individual requests, as can be seen in the I/O sample you shared. That's why there is more than one request per second, even though the points are all holding registers and only somewhat separated. I suspect disabling that option, or increasing the max registers per request, could allow more efficient polling of the device.

You can see in the timing there that Mango is quite ready to send the next request, but it's waiting for the device to reply.

As for the "No recipient" message, it is caused by very close timing: the response has been received, but just after the timeout happened, so nothing is waiting for it anymore. That may also contribute to how random its time of occurrence seems.
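To make that timing concrete, here is a toy Python model of a master's "waiting room": each request registers its transaction ID and waits up to the timeout; if the timeout expires first, the entry is removed, so a reply that arrives afterwards has nobody to be delivered to. This illustrates the mechanism under that assumption only; it is not modbus4j's implementation, whose `XaWaitingRoomKey` may key on more than the transaction ID.

```python
# Toy model of a Modbus TCP master's "waiting room" (illustrative only,
# not modbus4j's XaWaitingRoom): requests wait by transaction ID, and a
# response arriving after its request timed out has no recipient.
import threading


class WaitingRoom:
    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = {}  # transaction ID -> [threading.Event, response]

    def enter(self, transaction_id):
        """Register a request before it is sent."""
        with self._lock:
            self._waiting[transaction_id] = [threading.Event(), None]

    def wait(self, transaction_id, timeout_s):
        """Return the response, or None if the timeout expired first.
        Either way the entry is removed afterwards."""
        with self._lock:
            slot = self._waiting[transaction_id]
        arrived = slot[0].wait(timeout_s)
        with self._lock:
            self._waiting.pop(transaction_id, None)
        return slot[1] if arrived else None

    def deliver(self, transaction_id, response):
        """Called by the socket reader for every response frame."""
        with self._lock:
            slot = self._waiting.get(transaction_id)
            if slot is None:
                raise LookupError(
                    f"No recipient was found waiting for response {transaction_id:#06x}"
                )
            slot[1] = response
            slot[0].set()
```

A device that answers just after `timeout_s` triggers the `LookupError` path, which is the analogue of the error in Mango's event log.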

jvaughters

Interesting view @phildunlap. One of these days I will have to Wireshark Mango's polls and play with those settings to see how Mango tries to optimize. But whether that is checked or not, if the inverter has a good network connection and its software is responsive, it should be able to respond quickly. It's very few points. I'm glad you pointed that out, though; it's something for me to check out one day and certainly worth a try in this case.

@rob987 - Is this a wired or wireless connection?

Also, I have found that small, resource-constrained embedded devices can get bogged down with tasks that slow their responses. Can you correlate any inverter activity with the failures?

phildunlap

> Interesting view @phildunlap. One of these days I will have to Wireshark Mango polls and play with those settings to see how Mango tries to optimize.

Behold! https://github.com/infiniteautomation/modbus4j/blob/master/src/com/serotonin/modbus4j/BatchRead.java#L180

> But even if that is checked or not, if the inverter has a good network connection and the software is responsive, it should be able to respond quickly.

We don't need to speculate. From the I/O log we can see it's taking anywhere from about 80 ms to almost a full second to respond to each request, and some requests time out and are retried.
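Those response times can be pulled out of the I/O log mechanically. This Python sketch pairs the first O line of each transaction ID with the first I line carrying the same ID, assuming the log layout shown above (timestamp, direction, hex frame) and skipping any other lines. A retried transaction's time therefore includes the timeout it waited through. Transaction IDs wrap at 0xFFFF, so it is only meant for short excerpts.

```python
# Sketch: per-transaction response times from Mango's Modbus I/O log.
# Assumes lines of the form "<yyyy/MM/dd-HH:mm:ss,SSS> <O|I> <hex frame>".
from datetime import datetime, timedelta


def response_times_ms(lines):
    """Map transaction ID -> ms from its first request (O) to its first reply (I).

    A retried transaction's time includes the timeout it waited through.
    Transaction IDs wrap at 0xFFFF, so use this on short excerpts only.
    """
    sent, times = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[1] not in ("O", "I"):
            continue  # e.g. "data source started"
        when = datetime.strptime(parts[0], "%Y/%m/%d-%H:%M:%S,%f")
        tid = int(parts[2][:4], 16)  # first two bytes of the MBAP header
        if parts[1] == "O":
            sent.setdefault(tid, when)
        elif tid in sent and tid not in times:
            times[tid] = (when - sent[tid]) // timedelta(milliseconds=1)
    return times
```

Applied to the whole excerpt in the first post, the transactions that were not retried come out between 80 ms and 977 ms, while the retried ones exceed one second.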

jvaughters

Awesome, thanks for the code reference. I only just realized that log shows the Modbus requests and responses. I was looking at it wondering what it was, then it hit me that it's Modbus req/rep. How do you get Mango to log that I/O? That could be useful.

jvaughters

So in this case, why did it send out the exact same request in less than a ms? Other times I see it within 1 ms. Why send the exact same request in such a short time?

2019/03/22-21:06:43,356 O 000c00000006010300e20002
2019/03/22-21:06:44,356 O 000c00000006010300e20002

phildunlap

Ah, you found the option before I could link and proclaim "Behold!" again. But...

Behold!

2019/03/22-21:06:43,356 O 000c00000006010300e20002

2019/03/22-21:06:44,356 O 000c00000006010300e20002

I see a one second gap. Might that be your timeout?

jvaughters

Nope, the behold was needed. I was just reading the log above: it is a 1 s gap, so that is just a retry, which makes sense. I missed that the seconds had incremented. So the question is still why it needs to retry. Something is holding it up: the network or the processor?

Thx for the beholds too `,~)

phildunlap

> Thx for the beholds too `,~)

Can't hold them in, glad they're appreciated!

I would wager it's the device's ability to respond. It's not impossible that it has been taught to briefly ignore connections from a particular host if it thinks it's being flooded, or even just to allow a maximum number of requests per interval from anyone. It could also be pausing for something like garbage collection, if it were a Java application. There's no way of knowing, really. It's probably not the network: doing Modbus over the internet is usually not ideal (there's no validation, though a VPN is fine for low security), so this is most likely very small packets over a LAN. Still, it's certainly possible that some wireless card is failing to receive or send something. That's not knowable offhand.

rob987

Hi,
Thanks for all the input. There were 247 errors yesterday (27/03/2019) on a cloudy, rainy day, so the inverter does not rest :-) and devote more time to comms.

The inverter is on a wired Ethernet connection, with both Mango and the inverter on the wired LAN.

I had "Contiguous batches only" ticked, as the inverter would not answer when I first added it. I have now unticked it and reduced the max read registers to 50. This is giving me 2 requests:

2019/03/28-08:33:28,568 O 00230000000601030053000d
2019/03/28-08:33:28,833 I 00230000001d01031a22e2ffff1384fffe2494ffff91d8fffedac6fffe006aeaf70000
2019/03/28-08:33:28,834 O 002400000006010300ce0025
2019/03/28-08:33:28,914 I 00240000004d01034a033b012300f7011f00000388012c011d01440000fe93ffb7ff73ff6a0000de17db9ee036de71fffe004162a600191f070015cec0001e39760044aaeb002c0006001e72c40005fcb80000
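With larger batches, each configured point has to be picked back out of the returned registers. A Python sketch of that step for the 13-register response above, which covers offsets 83 to 95 even though only some of them have points configured. The data-type names are the ones from the device export; the high-word-first order for the four-byte type is an assumption to check against the inverter documentation.

```python
# Sketch: pick configured points back out of one batched register read.
# Assumption: FOUR_BYTE types are high word first; check the device manual.


def registers_from_response(hex_frame: str) -> list:
    """Return the 16-bit registers carried by a function 0x03 response."""
    data = bytes.fromhex(hex_frame)
    byte_count = data[8]  # after the 7-byte MBAP header and the function code
    payload = data[9:9 + byte_count]
    return [int.from_bytes(payload[i:i + 2], "big") for i in range(0, byte_count, 2)]


def point_value(registers: list, batch_start: int, offset: int, data_type: str) -> int:
    """Decode the point at `offset` from a batch that started at `batch_start`."""
    i = offset - batch_start
    if data_type == "TWO_BYTE_INT_UNSIGNED":
        return registers[i]
    if data_type == "TWO_BYTE_INT_SIGNED":
        # Reinterpret the 16-bit value as two's complement.
        return registers[i] - 0x10000 if registers[i] & 0x8000 else registers[i]
    if data_type == "FOUR_BYTE_INT_UNSIGNED":
        return (registers[i] << 16) | registers[i + 1]
    raise ValueError(f"unsupported data type: {data_type}")


regs = registers_from_response(
    "00230000001d01031a22e2ffff1384fffe2494ffff91d8fffedac6fffe006aeaf70000"
)
# Only offsets 83, 84, 93 and 95 have points; 85 to 92 are read but ignored.
print(point_value(regs, 83, 83, "TWO_BYTE_INT_UNSIGNED"))
print(point_value(regs, 83, 84, "TWO_BYTE_INT_SIGNED"))
print(point_value(regs, 83, 93, "FOUR_BYTE_INT_UNSIGNED"))
print(point_value(regs, 83, 95, "TWO_BYTE_INT_UNSIGNED"))
```

For that response, offset 83 decodes to 8930, offset 84 to -1, offset 93 to 7006967 and offset 95 to 0.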

What is the ,nnn after the date and time in the above log (2019/03/28-08:33:28,568)? Is it the millisecond part of the timestamp at which Java sent or received the message?

Is there any way to associate a poll period with a point? The second, longer read is for kWh values, which could be read every 15 minutes. This might allow the instantaneous power values to be read more frequently.

I am using Mango to read the values and publish them to a Click PLC, as I cannot get the Click to talk to the inverter. I used Wireshark on the line, and I see the PLC send 3 SYN requests that the inverter does not reply to. I suspect it may be "too fast" for the inverter.

No errors for 30 minutes, so maybe solved, but not fixed!!!
Thanks

jvaughters

The nnn is ms
The only way I know how to have separate point scan times is to create multiple data sources with different scan times. I will let Mango staff have the last word on that. This could be a problem if the inverter does not allow more than one TCP/IP session.
That "Contiguous Batch Only" definitely changed the request to ask for larger blocks of data. This may be more friendly to the processor. Meaning less requests to handle.

Recommendations to try:

Try setting the Transport Type to TCP with keep-alive if it is not already. I think that should be the default. I have run into performance issues when it is not set: constantly setting up and tearing down the TCP session wastes time and load.
If the inverter allows more than one TCP/IP session, create a second data source that polls at the 15 min rate. If there is another way to do this, maybe the Mango folks will chime in. All settings should be the same for each data source except the points and the poll rate.
It's highly unusual, but it is possible that the PLC and the inverter have differing TCP stack software, which can keep devices from talking. I have seen this before in old embedded devices. If that is the case, you could break down Mango's TCP exchanges and compare them to the PLC's TCP packets. It's not likely, but possible. You could also compare every bit sent from Mango with every bit sent from the PLC to see how they differ. It can be grueling and may not be worth the time, but if you really want to know, this is an option.
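The keep-alive recommendation comes down to the master keeping one TCP connection open across polls instead of connecting for every request. A minimal Python sketch of that idea with plain sockets, assuming the standard Modbus TCP port 502 and the 1.5 s timeout mentioned earlier; it is not Mango's or modbus4j's transport code and has no retry or resynchronization logic. Because the inverter accepts only one Modbus connection at a time, it should not be run while Mango is connected.

```python
# Sketch: one TCP connection reused for every Modbus read, rather than a
# connect/teardown per request. Port 502 and the 1.5 s timeout are assumptions.
# No retry or resync handling; not Mango's or modbus4j's transport code.
import socket
import struct


def read_request(transaction_id: int, unit_id: int, start: int, count: int) -> bytes:
    """MBAP header (length 6) + Read Holding Registers (0x03) PDU."""
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id, 3, start, count)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by device")
        buf += chunk
    return buf


class PersistentMaster:
    def __init__(self, host: str, port: int = 502, timeout_s: float = 1.5):
        # Connect once; every later read reuses this socket.
        self.sock = socket.create_connection((host, port), timeout=timeout_s)
        self.next_id = 1

    def read_holding(self, unit_id: int, start: int, count: int) -> list:
        tid = self.next_id
        self.next_id = (self.next_id + 1) & 0xFFFF  # transaction IDs are 16-bit
        self.sock.sendall(read_request(tid, unit_id, start, count))
        resp_tid, _, length, _ = struct.unpack(">HHHB", recv_exact(self.sock, 7))
        body = recv_exact(self.sock, length - 1)  # function code + data
        if resp_tid != tid:
            raise IOError(f"expected transaction {tid}, got {resp_tid}")
        if body[0] & 0x80:
            raise IOError(f"Modbus exception code {body[1]}")
        byte_count = body[1]
        return list(struct.unpack(f">{byte_count // 2}H", body[2:2 + byte_count]))

    def close(self):
        self.sock.close()
```

As a check, `read_request(1, 1, 0x53, 2)` produces the same bytes as the first O line in the original log, `000100000006010300530002`.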

phildunlap

> The only way I know how to have separate point scan times is to create multiple data sources with different scan times. I will let Mango staff have the last word on that. This could be a problem if the inverter does not allow more than one TCP/IP session.

:-/ "let"

That's one way. The other way would be a script calling RuntimeManager.refreshDataPoint("DP_XID") (not applicable to that question, but there's also RuntimeManager.refreshDataSource("DS_XID")). One could also PUT to the REST endpoint /rest/v1/runtime-manager/force-refresh/{xid}, or pay someone to mash the refresh button on the data point details page.

You'd need a logging type other than Interval to see the difference in polling density in the long-term record.
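For the REST route, a Python sketch that builds the PUT request for the endpoint named above. The base URL, the example XID and the bearer-token header are assumptions about a particular installation; check which authentication your Mango version accepts. Sending it on a timer for the fast-changing power points, with the data source itself set to the slow 15-minute period, would approximate separate poll rates (as single-point requests).

```python
# Sketch: build a PUT to Mango's force-refresh endpoint for one data point.
# The base URL, XID and bearer-token header are assumptions about your setup.
import urllib.parse
import urllib.request


def force_refresh_request(base_url: str, xid: str, token=None) -> urllib.request.Request:
    path = "/rest/v1/runtime-manager/force-refresh/" + urllib.parse.quote(xid, safe="")
    request = urllib.request.Request(base_url.rstrip("/") + path, method="PUT")
    if token:
        request.add_header("Authorization", "Bearer " + token)
    return request
```

Sending it would then be `urllib.request.urlopen(force_refresh_request("http://localhost:8080", "DP_power", token))`, where `DP_power` is a hypothetical XID.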

> That "Contiguous Batch Only" definitely changed the request to ask for larger blocks of data. This may be more friendly to the processor. Meaning less requests to handle.

Not having that setting ticked generally leads to larger requests. While it is possible to get larger requests with that setting, it requires a contrived register mapping and max registers setting. "Contiguous batches only" means only registers for which there are data points configured will have their values requested. Non-contiguous means it'll also try to read the registers in between if it can do so without exceeding the max read registers or bits, and the function I linked you to is a greedy method for constructing the read groups.
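A simplified Python sketch of the greedy grouping described above, written for this thread rather than transcribed from BatchRead.java, which handles more cases. Points are walked in offset order and the current read is extended while it stays within the max registers; with contiguous batches only, any gap between configured registers also starts a new read. Fed the nine points from the device export, it reproduces both request layouts seen in the logs.

```python
# Sketch of greedy read grouping (simplified; not modbus4j's BatchRead code).


def batch_reads(points, max_registers, contiguous_only):
    """points: list of (offset, register_count). Returns [(start, count), ...]."""
    groups = []  # (first_register, last_register) of each planned read
    for offset, size in sorted(points):
        end = offset + size - 1
        if groups:
            start, last = groups[-1]
            gap = offset > last + 1
            fits = end - start + 1 <= max_registers
            if fits and not (contiguous_only and gap):
                groups[-1] = (start, max(last, end))  # extend the current read
                continue
        groups.append((offset, end))  # start a new read
    return [(start, last - start + 1) for start, last in groups]


# The nine points from the device export: (offset, registers used).
SE5K_POINTS = [(84, 1), (83, 1), (206, 1), (210, 1), (226, 2),
               (234, 2), (242, 1), (93, 2), (95, 1)]
print(batch_reads(SE5K_POINTS, 50, contiguous_only=True))
print(batch_reads(SE5K_POINTS, 50, contiguous_only=False))
```

With `contiguous_only=True` it yields the seven small reads from the first log (offsets 83, 93, 206, 210, 226, 234 and 242); with it off and a maximum of 50 registers, it yields the two reads of 13 and 37 registers from the later log.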

jvaughters

@phildunlap

Correct. Based on the Modbus logging (I dearly love that I/O feature), my assumption was that he unchecked "Contiguous batches only" and it resulted in two requests of 13 and 37 registers, as opposed to requests of only 1 to 3 registers each. I'm totally lost on the alternative point polling periods you proposed. And that is OK, because I do not need that feature in Mango and will be sure to contact you if I ever do.

From a general SCADA perspective, multiple polling rates for different points are handled differently across systems. Some take over completely and optimize as they see fit, and some do that VERY well. Others, and this is my favorite, let you create scan groups where you specify the points in each group and the scan rate for that group. The Modbus master can then use the same TCP/IP session to scan the groups as specified. Not sure if Mango would ever consider the scan group idea, but I would certainly like that feature.
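The scan-group idea can be sketched in a few lines of Python: each group has its own period and list of reads, and on each tick the master issues only the reads of the groups that are due, all over the same connection. The group names and the split (the 13-register read every 2 s, the 37-register kWh read every 15 minutes) are hypothetical, based on rob987's description of the two reads; this is not an existing Mango feature.

```python
# Sketch: scan groups with independent periods served by one master.
# Group names and periods are hypothetical examples.
from dataclasses import dataclass


@dataclass
class ScanGroup:
    name: str
    period_s: int        # how often this group should be read
    reads: list          # (start_register, count) requests for the group
    next_due_s: int = 0  # next time (in seconds) the group is due


def due_reads(groups, now_s):
    """Return the reads to issue at time now_s and reschedule groups that ran.

    Rescheduling from now_s means a late tick pushes the group back
    rather than catching up.
    """
    batch = []
    for group in groups:
        if now_s >= group.next_due_s:
            batch.extend(group.reads)
            group.next_due_s = now_s + group.period_s
    return batch


groups = [ScanGroup("power", 2, [(83, 13)]), ScanGroup("energy", 900, [(206, 37)])]
```

Calling `due_reads(groups, t)` on every tick returns both reads at t = 0, only the power read at t = 2, and both again at t = 900.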

jvaughters

@phildunlap

Ah! I do get what you are saying about scripting the update. You could create a timer and use it to run a script that does the refresh. So your data source could be set to the longest refresh period you want, and you could create a script to refresh at shorter intervals. That actually is a nice workaround, and oddly it mimics the scan group feature I discussed. It does lead to a question:

Will the scripted refresh be a single point only, or could you pass it a group of points?
Would that bypass the Modbus attempt to optimize requests?

phildunlap

> Will the scripted refresh be a single point only, or could you pass it a group of points?
> Would that bypass the Modbus attempt to optimize requests?

It would be single requests for points, good observation! One could construct BatchRead objects manually if there were a need to do so, like this person is doing at https://forum.infiniteautomation.com/topic/2574/how-to-cast-batchresults-getvalue-n-property-to-int-in-modbus-tcp-ip (though I don't think they're using Mango). Alternatively, splitting the points across different data sources would let you refresh a whole data source to poll those points together in the same request. Separating them into different data sources and just using the polling rate would probably be the best easy tactic, but if an event should trigger a temporarily faster polling time, I could certainly see scripting that, like we've discussed here or as in this thread where the data source's poll period is adjusted in an event handler: https://forum.infiniteautomation.com/topic/3288/alter-data-source-update-time-via-event-handler

jvaughters

However, the refresh workaround may be the only way to do it for a device that only allows a single TCP/IP session, so that is a good technique to remember and keep in my toolbox. Plus, although single-point Modbus requests are not the most efficient, in reality the load increase is very small in most cases. I like it, thx.

rob987

Hi,
Just reporting that there have been no errors in the last 24 hours.
I have "TCP with keep-alive" set; comms were very unreliable with a new connection for each request.
Now that the forest of errors has disappeared, I notice I get a "Connect timeout" twice a day, about an hour after sunset and an hour before sunrise. The inverter seems to reboot twice a day, which I suspect is a quick fix for a leaky system.

The inverter only accepts one modbus connection at a time, so a second device is not an option.

The instantaneous power value swings very quickly as clouds move across the sky, and the fastest update time I can get lets the PLC adjust the load on 2 hot water system elements via SSRs. The kWh value is only really used to display a "daily usage" of 15 - 30 kWh, so it could be read every 15 - 30 minutes.

I have a support request open with SolarEdge and have sent them the Wireshark logs of a working modpoll connection from a PC and of a PLC connection showing only 3 SYN requests. I will send them my findings from this, and I will ask for a "stress test" on the next inverter I purchase.

Thanks for all your help

phildunlap

Hi rob987,

> Thanks for all your help

Thank you for the very thorough first post!

> The kWh value is only really used to display "daily usage" of 15 - 30 kWh. It could be read every 15 - 30 minutes.

Another possibility I forgot to mention would be to use a scripting data source to enable, wait, and then disable the points that don't need the faster poll rate, and then speed up the data source's normal polling.