eWon Flexy 205 all tag values 0 on reboot or seemingly randomly

howitzer · February 1, 2023, 1:07pm

I’ve been experiencing an issue for some time now that I kept hoping would resolve itself, or would turn out to be something I was doing. But now it is just plain annoying and I have not found a satisfactory explanation yet. I did a cursory search of the forum and found a couple of threads that seem to describe the same issue, but nothing that gives a good handle on how to fix it permanently.

I am using the eWon Flexy 205 to read tags from two separate OPC-UA Servers, and from a third PLC by ISO-TCP. I have some alarming on these tags so it can alert me if there is a problem. The issue is that if the Flexy reboots, all values are set to 0 and the status is set to Unknown (the broken heart icon). The only reliable fix I have found is to edit each tag and click Update. That takes a lot of time, since I have 20-30 tags in total. I have tried disabling the OPC-UA topics, waiting, and re-enabling them, but that does not reliably work. And it does not work at all for the ISO-TCP tags that I am reading, as I am not using a Topic for those.

I found the thread that gives a BASIC script to check the status tag and trigger a Re-init of the OPC-UA client, but I also seem to have issues where BASIC scripts stop running for whatever reason and trigger a restart, which sets all the tags back to 0, even MEM tags. And then not even alarming will work, so how would a BASIC script work when all values show Unknown health (broken heart)? It is getting really annoying to have to check this many times a day just in case. I have disabled the BASIC scripting and deleted any extra value tags I don’t need, etc.

I have seen a fair number of opcuaiosrv-OPCUA service error (BadTypeMismatch) errors in the Event Logs, which I don’t understand, since the tags are good on the servers.

I have attached a screenshot of the Values page after a reboot. As I found today, it also happens without a reboot: the last reboot was 3 days ago and the values were working then, but when I checked today all values were 0 and unhealthy (broken heart icon). It is really frustrating not to know what is causing this. I have attached a backup with support files as a tarball. Can you offer any advice?
MOVED TO STAFF NOTE (453 KB)

kyle_HMS · February 6, 2023, 5:28pm

Hi @howitzer,

This is definitely not “normal” and we will have to investigate what is causing the problem. If the Flexy reboots, it should automatically reconnect to the IO Servers.

Looking at the logs it appears that you are using a Fallback WAN configuration. The primary WAN is Ethernet and the Fallback is cellular. Can you confirm this?

There are also logs showing an IP conflict happening. Can you explain the WAN setup? For example, what is the WAN subnet that you are connecting to? What is the gateway and netmask (not the setting in the Flexy, but the actual network)?

howitzer · February 7, 2023, 3:52pm

Kyle-

That is what I was expecting, that after some time to get the ewon box back up, it would reach out again. The server is still configured to have the tags available, etc.

Yes, I am using Ethernet as the primary WAN (Spectrum cable modem), and an AT&T 4G LTE cellular fallback.

I am not sure of the reason for the IP conflict, other than that we were trying to troubleshoot our network. Our Access Points only allow one DHCP Server, and we have an internal network for the automation system which is 191.168.72.0/21. I know it’s not standard, but it’s a result of how we outgrew our existing network and we didn’t want to change a ton of IP addresses. So this is what we ended up keeping. Anyway, our APs have two SSIDs, each on its own VLAN. One VLAN is for the internal automation network, and the other is for Internet access. Because they have to share a DHCP Server (one is allowed per AP), the IPs it hands out have to work on both sides. The VLAN keeps the segregation.
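As a side check on the addressing described above, here is a minimal Python sketch (not part of the original post) that takes the subnet exactly as written and reports its range and whether it sits in private address space. The function name `summarize` is illustrative only. A /21 spans 2048 addresses, and if the first octet really is 191 rather than 192, the range lies outside the RFC 1918 private blocks, which may be worth double-checking when chasing routing or conflict problems.

```python
import ipaddress

# Subnet copied verbatim from the post; whether 191 is a typo for 192 is unknown.
automation_net = ipaddress.ip_network("191.168.72.0/21")


def summarize(net):
    """Return basic facts about an IPv4 network as a dict."""
    return {
        "first": str(net.network_address),
        "last": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "addresses": net.num_addresses,
        "is_private": net.is_private,
    }


for key, value in summarize(automation_net).items():
    print(f"{key}: {value}")

# For comparison, the same prefix length inside the usual private block.
print(summarize(ipaddress.ip_network("192.168.72.0/21"))["is_private"])
```

The same `summarize` call can be pointed at the WAN-side subnet handed out by the firewall to confirm that the two ranges do not overlap (`net.overlaps(other)` in the `ipaddress` module does this).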

Anyway, we were trying to put our iPad on the internet SSID and use eCatcher to get into the LAN (demonstrating eWon and eCatcher to new employees). That is when I started running into IP conflicts, which makes total sense. So we can’t do that here. If I had a separate DHCP Server for each SSID, it wouldn’t be an issue, but these are industrial Wi-Fi APs that apparently don’t allow for this. We have stopped doing this now, but all the devices still get an IP in this range, and the VLANs still keep the segregation.

The eWon WAN has an IP outside this subnet, which it gets from a Firewalla Gold firewall box, which itself connects to the Spectrum cable modem. We set up a 3rd VLAN on our network to carry this over the 1G fiber line to the managed switch and then to the eWon WAN port.

I also did some reconfiguration under IO Servers-> Advanced Parameters-> Default TCP RX/TX Timeout: I changed it to 10000 ms and left Disable Tags in Error unchecked. This has helped with some of the errors I was getting. Just now, the eWon rebooted itself for an unknown reason, and once again the tag values are all unknown, even after several minutes. Per the forum’s suggestion, I just did an Init on the two IO Servers I am using, and am waiting for the tag values to resume good status. The Real Time logs show the OPCUA disconnect, then connecting, then connected. But the tag values are still not OK. Even the S7300&400 IO server, which uses ISO-on-TCP, is not updating.

I do have a new backup I can send if needed?

07/02/2023 09:25:22 | -20205 | muting (pattern of 2 events) | ftps
07/02/2023 09:23:58 | -20205 | muting (pattern of 3 events) | opcuacom
07/02/2023 09:23:58 | -20205 | pattern of 1 event muted 4 times | config_writer
07/02/2023 09:23:52 | -20205 | muting (pattern of 1 event) | opcuaiosrv
07/02/2023 09:23:52 | -38320 | opcuaiosrv-Waiting session disconnection | opcuaiosrv
07/02/2023 09:23:52 | -38320 | opcuaiosrv-Waiting session disconnection | opcuaiosrv
07/02/2023 09:18:32 | -38320 | opcuaiosrv-Waiting session disconnection | opcuaiosrv
07/02/2023 09:18:31 | -38308 | opcuaiosrv-OPCUA service error (BadTimeout) | opcuacom
07/02/2023 09:18:30 | -31130 | wanmgt-WAN connection request failed: DHCP | wanmgt
07/02/2023 09:18:09 | -22602 | System Booting, FWR: 14.6s0 (14.6), SN: 1804-0031-24 [EF0000] | elog
06/02/2023 04:42:52 | -31166 | wanmgt-WAN Fallback – Maximum duration reached using fallback interface | wanmgt
06/02/2023 03:42:51 | -31167 | wanmgt-WAN Fallback – Switching to fallback interface | wanmgt
06/02/2023 03:42:51 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((03/03) – tcp://device.api.talk2m.com:443) | wanmgt
06/02/2023 03:41:51 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((02/03) – tcp://device.api.talk2m.com:443) | wanmgt
06/02/2023 03:40:51 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((01/03) – tcp://device.api.talk2m.com:443) | wanmgt
06/02/2023 03:25:31 | -31166 | wanmgt-WAN Fallback – Maximum duration reached using fallback interface | wanmgt
06/02/2023 02:25:30 | -31167 | wanmgt-WAN Fallback – Switching to fallback interface | wanmgt
06/02/2023 02:25:30 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((03/03) – tcp://device.api.talk2m.com:443) | wanmgt
06/02/2023 02:24:30 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((02/03) – tcp://device.api.talk2m.com:443) | wanmgt
06/02/2023 02:23:29 | -31163 | wanmgt-WAN Fallback – Could not access test servers using primary connection ((01/03) – tcp://device.api.talk2m.com:443)
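
One way to read the wanmgt entries is as fallback windows: each "Switching to fallback interface" event paired with the next "Maximum duration reached" event. The following Python sketch (an illustration, not something taken from the eWon tooling) does that pairing for the four relevant entries transcribed by hand from the log above. It assumes the timestamps are day/month/year, which matches the posting date, and the names `EVENTS` and `fallback_windows` are hypothetical.

```python
from datetime import datetime

# (timestamp, message) pairs transcribed from the wanmgt log entries, oldest first.
EVENTS = [
    ("06/02/2023 02:25:30", "Switching to fallback interface"),
    ("06/02/2023 03:25:31", "Maximum duration reached using fallback interface"),
    ("06/02/2023 03:42:51", "Switching to fallback interface"),
    ("06/02/2023 04:42:52", "Maximum duration reached using fallback interface"),
]


def parse(ts):
    """Parse a log timestamp, assuming day/month/year ordering."""
    return datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")


def fallback_windows(events):
    """Pair each switch-to-fallback event with the next max-duration event."""
    windows = []
    start = None
    for ts, msg in events:
        if "Switching to fallback" in msg:
            start = parse(ts)
        elif "Maximum duration reached" in msg and start is not None:
            windows.append((start, parse(ts)))
            start = None
    return windows


for begin, end in fallback_windows(EVENTS):
    print(f"{begin} -> {end}: on fallback for {end - begin}")
```

Laying the windows out this way makes it easier to see how long the unit stayed on cellular each time and how soon after returning to Ethernet the primary connection test failed again.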

kyle_HMS · February 7, 2023, 6:20pm

Yes, please send the new backup. I think there is definitely a routing issue going on. From the logs you just sent we can see it switching between the primary and fallback internet connections and having problems accessing the internet.

I also noticed that there is code in the cyclic section of the BASIC IDE. It appears to be commented out, but even so, I would either remove it or move it to the init section.

It may help to take some packet captures from the Ewon. Are you in the U.S.?

howitzer · February 7, 2023, 8:36pm

Network architecture:
please keep private, do not post to public forum.

howitzer · February 7, 2023, 8:36pm

BASIC script here.

howitzer · February 7, 2023, 8:36pm

I have attached the latest backup here. I do have code in the BASIC script, with components in both the Cyclic and Init sections. It was the only way I was able to get it to behave the way I wanted, which was to save one tag to another every 5 seconds. So the timer is in the Init section, and the function it calls is in the Cyclic section. Yes, I am in the USA.

MOVED TO STAFF NOTE (457 KB)

kyle_HMS · February 7, 2023, 9:50pm

Thank you for the info.

First, you should really move that code out of the cyclic section. In a recent firmware update, we have removed the cyclic section because it was causing so many problems. It can lead to unpredictable behavior in the Ewon. You should be able to use ONTIMER instead. There is a reboot at 09:18:10 today for not enough memory, which could have been caused by the code in the cyclic section.

I think until the code is fixed, it doesn’t make a lot of sense to speculate about the issue because it could be the sole reason for the problems. I recommend getting that fixed and then try testing again. What problem are you running into when you put the code in the Init section?

howitzer · February 8, 2023, 6:31pm

I had been having an issue with the timer not running, i.e. the tags updated constantly. But now I just tried it all in the Init section and it worked.

Anyway, I removed all BASIC code for now (not just commented out, actually deleted). I did a few reboots, and the Tag Values table still remains in the Unknown state, even after I do several Init requests on the two IO Servers I am using. This BASIC code just moves one tag to another. It is for a heartbeat I was using so I could tell if the eWon was down, so it is not critical.
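
For context on what such a heartbeat buys you, here is a minimal Python sketch of the receiving side: whatever system reads the heartbeat tag treats the eWon as down once the value stops changing for too long. This is an illustration only, not the BASIC script from this thread. The class name `HeartbeatMonitor` and the 15-second threshold (three missed 5-second updates) are assumptions.

```python
import time


class HeartbeatMonitor:
    """Flags a device as down when its heartbeat value stops changing."""

    def __init__(self, max_silence_s=15.0, clock=time.monotonic):
        self.max_silence_s = max_silence_s  # assumed: three missed 5 s updates
        self._clock = clock
        self._last_value = None
        self._last_change = None

    def update(self, value):
        """Record the latest heartbeat value read from the device."""
        now = self._clock()
        if self._last_change is None or value != self._last_value:
            self._last_value = value
            self._last_change = now

    def is_alive(self):
        """True if the heartbeat changed within the allowed silence window."""
        if self._last_change is None:
            return False
        return self._clock() - self._last_change <= self.max_silence_s
```

A poller would call `update()` with each value it reads and raise an alert when `is_alive()` turns false. Note that a tag stuck at 0 after a reboot looks the same as a dead device to this check, which is the desired behaviour here.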

At this point, what about doing a factory reset on my box? Reset it all to defaults and reconfigure everything again. Could that help? I have attached a new backup here now.

Also, I see entries in the logs where it can’t contact a talk2m server, or where NTP sync fails. My firewall is not blocking anything on this subnet. Indeed, it is set to allow all traffic for now.
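
One way to cross-check this from a PC on the same WAN-side subnet is to attempt the same kind of TCP connection the log mentions (tcp://device.api.talk2m.com:443). The Python sketch below is only a rough stand-in for the Flexy's own connectivity test, whose exact behaviour is not documented in this thread. The helper names `split_tcp_url` and `can_connect` are made up for the example.

```python
import socket
from urllib.parse import urlsplit


def split_tcp_url(url):
    """Split a 'tcp://host:port' string, as it appears in the log, into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {url!r}")
    return parts.hostname, parts.port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection opens within the timeout, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refusals, unreachable networks and timeouts.
        return False


# Example call (needs network access):
# can_connect(*split_tcp_url("tcp://device.api.talk2m.com:443"))
```

If this succeeds from the PC while the Flexy still logs failures at the same time, that points at something specific to the Flexy's path (its IP, gateway, or DNS settings) and not at the firewall rules.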

MOVED TO STAFF NOTE (454.5 KB)

kyle_HMS · February 8, 2023, 6:35pm

Hi @howitzer,

I think a factory reset is a great idea. Do you know how to do it? (Basically just holding the reset button while the unit powers up for about 40 seconds until the USR LED flashes then turns solid red)

Were you aware that there is Java code running on the device? This certainly could be causing issues, depending on what is in it, and if you aren’t using it, it should be removed. A factory reset will take care of it.

howitzer · February 8, 2023, 9:05pm

I had tried a Java MQTT solution once, but kept having issues. So deleted the code. Or so I thought!

Anyway, I just tried unsuccessfully for the last hour to perform a level 2 reset, and it just will not comply. I am following the directions. I never get the USR LED to blink red on the auto test step; it blinks green. The pattern that I seem to see is 1 or 2 times 1-second ON, then it returns to forever 200 ms ON and then off. I tried to count and lost count up near 100.

In fact, when I do the Reset button press, after 10 seconds the USR LED goes steady red. I never see it blink, as it should for a level 1 reset. If I keep pressing, it eventually goes out, and nothing is ever done.

No matter what I do, the device just won’t reset. The IP stays the same, tag values still there, etc. I even tried eBuddy doing a Recovery to the current firmware 14.6s0, and I get the same results. No reset will take place. I am out of ideas and frustrated.

Second Level Reset (Factory Reset)
This second level reset formats all non volatile memories and erases the Flexy 205 back to its factory defaults. This operation consists in 3 steps:
• Erasement of all non volatile memories, including all COM parameters and IP addresses
• Full hardware auto test with result shown by the USR LED
• Return to factory configuration (default one)
How to generate a second level reset:

1. Power the unit OFF and ON again
2. Immediately press and maintain the reset button. The LED labeled BI1 turns ON
3. Wait approximately 35 seconds until the USR LED remains red steady
4. When this state is reached, release the button. The LED labeled BI1 turns OFF
5. It takes no longer than 5 seconds to complete.
6. Check if the auto test is successful, the USR LED blinks red following a pattern of 200 ms ON and 1.5 sec OFF. The Flexy 205 does not restart in normal mode by itself and remains in this diagnose mode.
7. Power the Flexy 205 OFF and ON again to reboot the unit in normal mode. As described before, the Flexy 205 returns to its default COM parameters and factory IP addresses (default LAN IP: 10.0.0.53) after this second level reset is performed.

If a different pattern than the successful auto test one is displayed then this pattern reflects an issue.
The pattern starts with 200 ms ON (beginning of the pattern) followed by OFF and a certain number of times 1 sec ON which allows to identify the nature of the detected problem. Please write down the observed pattern and contact the local distributor referring to the pattern error.
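
Since counting blinks by eye is error-prone, the pattern rule quoted above can be written down as a small Python sketch: each cycle starts with a short (about 200 ms) pulse, and the number of long (about 1 s) pulses that follow identifies the problem. This is only a reading aid under stated assumptions. The tolerance value and function names are made up, the input is a list of ON durations you would have to time yourself (for example from a phone video), and LED colour is not modelled.

```python
SHORT_MS = 200      # marker pulse that starts each pattern cycle
LONG_MS = 1000      # pulses whose count identifies the detected problem
TOLERANCE_MS = 150  # assumed slack for hand-timed measurements


def _is_short(ms):
    return abs(ms - SHORT_MS) <= TOLERANCE_MS


def _is_long(ms):
    return abs(ms - LONG_MS) <= TOLERANCE_MS


def long_pulses_per_cycle(on_durations_ms):
    """Count the long pulses that follow each short marker pulse."""
    cycles = []
    for ms in on_durations_ms:
        if _is_short(ms):
            cycles.append(0)   # a short pulse opens a new cycle
        elif _is_long(ms) and cycles:
            cycles[-1] += 1    # a long pulse belongs to the current cycle
        # Anything else is ignored as a timing error.
    return cycles


def describe(on_durations_ms):
    """Summarize an observed sequence of ON durations."""
    cycles = long_pulses_per_cycle(on_durations_ms)
    if not cycles:
        return "no pattern start seen"
    if all(count == 0 for count in cycles):
        return "auto test OK"
    return f"error pattern: {max(cycles)} long pulse(s)"
```

For instance, `describe([200, 1000, 1000, 200, 200])` reports two long pulses, which is the number to write down for the distributor.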

kyle_HMS · February 9, 2023, 2:31pm

That’s really strange. While the Flexy is running, is it acknowledging the RESET button press (the BI1 button should illuminate green any time you press it)?

What I do is hold the button down before connecting the power, then keep it held down while the Ewon boots. The USR LED might come on orange for a bit then turn off, you can ignore this, then after about 30 - 40 seconds it should start blinking red for about 5 seconds, then turn solid red. This is when you can release the reset button. I will usually wait a few more seconds, then power cycle the Ewon to complete the reset process.

howitzer · February 9, 2023, 5:19pm

Yes, it acknowledges the RESET button press by illuminating the BI1 LED. I followed the suggestion to hold down the reset button before turning on the power, and that seemed to do the trick. I was finally able to initiate the Level 2 reset, and I just now finished re-entering my necessary settings through the web interface. I’ve had some reboots already, which the log attributes to the main process Watchdog. I’m now getting DNS errors which I never had before. My DNS works, but I only have 1 DNS server, not 2, on this subnet.

I have attached new backup I just took.
MOVED TO STAFF NOTE (61.5 KB)

howitzer · February 9, 2023, 5:19pm

As for the initial issue of Tag values not resuming connecting after a reboot, I just did a reboot and after a few seconds from boot-up the tag values resume their connection. So that appears to be resolved now.

Regarding the DNS, this must be on the WAN side, as I have no DNS on my LAN side since it’s not intended to be on the internet. I don’t know if my firewall subnet will let me set up a second DNS, but the second entry in the eWon configuration is set to 0.0.0.0, which I understand should disable this second DNS. The first DNS is my gateway IP; it was set automatically by DHCP.

kyle_HMS · February 9, 2023, 5:25pm

Hi @howitzer,

Please try using a public DNS for backup, like the Google one, 8.8.8.8. Unless your IT department locks down access to a specific DNS server, this should work. Otherwise, you may have to ask them what to use.

howitzer · February 9, 2023, 6:50pm

All right, I entered the 8.8.8.8 as the secondary DNS server. I will monitor the eWon overnight now and see what is in the logs. Thanks!

kyle_HMS · February 9, 2023, 10:11pm

Sounds good! Let me know what you find.

howitzer · February 10, 2023, 6:04pm

Nothing in the log overnight. But then today, as I was adding a STATUS tag for one of the OPCUA topics to the Tag Values table, the eWon rebooted on me, and the log states that a Watchdog (main process) caused the reboot. Immediately after the reboot, it had trouble accessing the DNS server a few times, but that seems to have resolved itself. I will let it run over the weekend now and see if I have any more issues. The tag values did automatically resume reading after boot-up, so that is a good sign. I will report back on Monday.

kyle_HMS · February 10, 2023, 6:10pm

It does sound like it’s operating properly now and reconnecting like it should after a disconnection. I wouldn’t worry too much about the single reboot as it occurred during configuration, but if it keeps happening, please send another backup with support files included. I’ll wait to hear back from you next week. Have a good weekend!

howitzer · February 15, 2023, 3:49pm

Hi Kyle

Still having opcuaiosrv errors. Backup attached. Here is a selection of the errors in the log. All OPCUA servers are on the same subnet. Each PLC has a Wi-Fi adapter that connects to the main AP. The Wi-Fi adapters have static IPs, so it can’t be a DHCP lease problem.
MOVED TO STAFF NOTE (86.5 KB)

opcuaiosrv-Connect fail (BadTimeout)
opcuaiosrv-Connect fail (OpenSecureChannel)
opcuaiosrv-Waiting session disconnection
opcuaiosrv-Connect fail (BadCommunicationError)
opcuaiosrv-Read failed