This is a new bug/behaviour found since I first raised a related issue in the following thread:
[url]https://www.ghielectronics.com/community/forum/topic?id=22770&page=1[/url]
The device is running the 2016 R1 release SDK and the latest firmware.
We have a number of G400-based systems deployed in the field. We record the number of uncontrolled system restarts occurring on these devices, and it is significant. This is a concern because each device uses an SD card, and corruption of the file system on that card requires a field trip and a tedious device removal/replacement process. We have included design elements, such as the use of super caps, to avoid an uncontrolled closure of the SD file system.
After some testing, the issue appears to be related to Ethernet and/or Sockets. I suspect Sockets, both because of the previous issue described in the thread above and because the devices work in an environment where network connectivity at the device side, or along the path to the server, is variable. The issue can be replicated using the test code from the thread linked above (or re-posted below) under the following conditions:
- The server must be running and have a working network connection
- The Ethernet cable on the G400-device side must be removed during operation. The issue manifests readily: pull the network cable out of the device after it has connected to the server, then plug it back in after about 30 seconds.
With this setup I have validated that the test code always causes the device to crash/reset when it attempts to open a new Socket connection to the reachable server. If the server is physically not connected to the network, or the server application is not running, the device does not crash. So it seems that if the Ethernet is disconnected while a Socket connection is open, something goes wrong in the Socket/network stack when the next Socket is connected.
I have validated that the code (SocketEx) still works in the following scenarios, where the Ethernet connection on the device side remains permanently connected and:
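For anyone trying to reproduce this without our server application, here is a minimal sketch of a stand-in server in Python. It rests on an assumption I have not verified: that any plain TCP listener which accepts the connection and waits for the device to close it is enough to trigger the behaviour. The names `serve` and `start_server` are illustrative only and are not part of the test code.

```python
import socket
import threading


def serve(listener, stop_event, accepted):
    """Accept TCP connections one at a time until stop_event is set."""
    listener.settimeout(0.2)  # wake up regularly to check stop_event
    while not stop_event.is_set():
        try:
            conn, peer = listener.accept()
        except socket.timeout:
            continue
        accepted.append(peer)
        print("Accepted connection from %s:%d" % peer)
        # Wait for the client to close (recv returns b"") or go silent,
        # e.g. because its cable was pulled mid-connection.
        conn.settimeout(5.0)
        try:
            while conn.recv(1024):
                pass
        except OSError:
            pass  # timeout or reset: treat the client as gone
        finally:
            conn.close()


def start_server(host="0.0.0.0", port=0):
    """Start the listener in a background thread; port 0 picks a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    stop_event = threading.Event()
    accepted = []
    thread = threading.Thread(
        target=serve, args=(listener, stop_event, accepted), daemon=True
    )
    thread.start()
    return listener, stop_event, accepted, thread
```

To use it, call `start_server(port=...)` on the machine the device connects to (192.168.0.46 in the trace below) with whatever port the test code targets, and keep the process alive while pulling and replacing the device's cable.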
- Server not connected to network
- Server connected to network but service not running
- Server intermittently connected to network and service running
I have also validated that the following scenarios work when the Ethernet connection on the device side is disconnected periodically:
- Server not connected to network
- Server connected to network but service not running
The following is the MFDeploy debug trace:
Connecting to G400_Nexus…Connected
Starting comms…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
---------------------------->Network cable plugged in
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
---------------------------->Network cable disconnected
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (13) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketEx::ConnectThread [IP: 0016] ####
#### SocketException ErrorCode = 10050
SocketEx: Socket connection attempt complete
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (13) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### Testing.SocketEx::ConnectThread [IP: 0077] ####
#### SocketException ErrorCode = 10050
Failed to connect to [192.168.0.46]!
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
Network not available…
---------------------------->Network cable reconnected
Connecting to [192.168.0.46]…
SocketEx: Attempting to asynchronously open a socket connection…
SocketEx: Socket connection attempt complete
Connected to [192.168.0.46]!
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Closing connection to [192.168.0.46]
SocketEx: The socket is not scheduled to close and is not busy connecting so calling base socket close.
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
----------------------------> Device hangs and then reboots
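In case it helps anyone compare their own traces against the one above, here is a small Python sketch that counts the lwIP assertions and SocketException error codes in a saved MFDeploy trace. The function name `summarize_trace` is my own, and the patterns are based only on the line formats visible in this trace. Error code 10050 is the Winsock-style "network is down" code (WSAENETDOWN).

```python
import re
from collections import Counter

# Matches e.g.: LWIP Assertion "..." failed at line 650 in ...api_msg.c
# Accepts both straight and typographic quotes around the message.
ASSERT_RE = re.compile(
    r'LWIP Assertion [\u201c"](?P<msg>.+?)[\u201d"] failed at line (?P<line>\d+)'
)
# Matches e.g.: #### SocketException ErrorCode = 10050
ERROR_RE = re.compile(r"SocketException ErrorCode = (?P<code>\d+)")

# Only the code seen in the trace above is listed here.
KNOWN_CODES = {10050: "WSAENETDOWN (network is down)"}


def summarize_trace(text):
    """Return (assertions, error_codes) as Counters for a trace string."""
    assertions = Counter()
    error_codes = Counter()
    for line in text.splitlines():
        match = ASSERT_RE.search(line)
        if match:
            key = (int(match.group("line")), match.group("msg"))
            assertions[key] += 1
        match = ERROR_RE.search(line)
        if match:
            error_codes[int(match.group("code"))] += 1
    return assertions, error_codes


def print_summary(text):
    """Print one line per distinct assertion and error code."""
    assertions, error_codes = summarize_trace(text)
    for (line_no, msg), count in sorted(assertions.items()):
        print("api_msg.c line %d x%d: %s" % (line_no, count, msg))
    for code, count in sorted(error_codes.items()):
        name = KNOWN_CODES.get(code, "unknown")
        print("ErrorCode %d x%d: %s" % (code, count, name))
```

Calling `print_summary(open("trace.txt", encoding="utf-8").read())` on a saved copy of the trace (the file name is a placeholder) lists each distinct assertion with its line number and how often it fired.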