G400D - BuiltInEthernet & Sockets causing hang/watchdog reset

Good day all,

I am using a G400D in a custom board with the built in Ethernet, using the 2016 prerelease SDK.

I have picked up an network issue that is causing the entire G400D module to hang & reset under certain conditions. Basically, if one opens/closes a socket connection to an available end-point, there is no problem. However, if the device tries to repeatedly open a connection to an end-point that is not available, and then suddenly is made available, the board will freeze upon connection and then presumably get reset by the watchdog timer (as I understand it, the watchdog is handled internally by the GHI libraries if not explicitly enabled).

Thus far I can duplicate the problem on both our custom board and a G400HDR using the J11D Ethernet connector thus ruling out hardware/power issues as per [url]https://www.ghielectronics.com/community/forum/topic?id=16834&page=1[/url].

The problem does not occur if I pull the Ethernet cable out of the JD11 on the G400 side as I get a SocketException immediately on connect with ErrorCode=10065 as one would expect. I have isolated the issue to the scenario where the server endpoint is not available for whatever reason which causes the Threaded connect in the SocketX class to abort the socket.Connect. I then call the socket.Close call after this timeout to clean up the socket. However, this Close indirectly causes the G400 to hang when the end-point becomes available again.

If I omit the socket.Close after a timeout-failed connect, everything will work fine for quite a while but as expected I eventually get a SocketException (10055=no buffer space) on Connect and then a SocketException (no error code) in the next SocketX constructor.base call because I have run out of sockets.

Can anyone duplicate this issue or am I doing something dumb?

using System;
using System.Threading;
using System.Net.Sockets;
using System.Net;

using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;

using GHI.Networking;

namespace Testing
{
    public class TestsSimple
    { 
        public static void Main()
        {
            bool useDHCP = true;

            ///Initialise Ethernet interface///////////////////////////////////////////////////////////////////////////

            bool initialised = false;

            //the following is a hack/workaround for the Builtin Ethernet to work in Debug mode
            //as per https://www.ghielectronics.com/community/forum/topic?id=22487

            long constructorTime = 0;
            int retryCount = 0;

            GHI.Networking.EthernetBuiltIn ethernet = null;

            while (constructorTime == 0 && retryCount < 5)
            {
                DateTime now = DateTime.Now;

                ethernet = new GHI.Networking.EthernetBuiltIn();

                constructorTime = ((DateTime.Now - now).Ticks / TimeSpan.TicksPerMillisecond);

                //Debug.Print("Ethernet constructor time (ms) = " + constructorTime);

                if (constructorTime == 0)
                {
                    ethernet.Dispose();
                    retryCount++;
                }
            }

            if (retryCount < 5)
            {
                initialised = true;
            }
            else
            {
                Debug.Print("ERROR: Working Ethernet interface object could not be created!");
            }

            Debug.Assert(initialised, "Failed to initialise Ethernet");

            //open ethernet 
            ethernet.Open();

            if (useDHCP)
            {
                ethernet.EnableDhcp();
                ethernet.EnableDynamicDns();

                while (ethernet.IPAddress == "0.0.0.0")
                {
                    Thread.Sleep(500);
                }
            }
            else
            {
                ethernet.EnableStaticIP("192.168.0.161", "255.255.255.0", "192.168.0.1");
                ethernet.EnableStaticDns(new string[] { "192.168.0.200" });
            }
        
            ///Open/close socket to remote end-point./////////////////////////////////////////////////////////////////////////////////////
            ///If the remote end point is not available this will continue to loop. If the loop is allowed to loop at least twice with the
            ///end point not available and then is made available the connection apparently succeeds but then the G400 hangs/resets

            IPEndPoint endpoint = new IPEndPoint(new IPAddress(new byte[] { 192, 168, 0, 46 }), 21000);

            while (true)
            {
                SocketX socket = new SocketX(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

                Debug.Print("Opening connection to [" + endpoint.Address.ToString() + "]");

                if (socket.Connect(endpoint, 3000))
                {
                    Debug.Print("Now connected to [" + endpoint.Address.ToString() + "]");

                    Thread.Sleep(3000);

                    Debug.Print("Closing connection to [" + endpoint.Address.ToString() + "]");

                    socket.Close();
                }
                else
                {
                    Debug.Print("Failed to connect to [" + endpoint.Address.ToString() + "]");

                    //if this close is commented out, the problem "disappears" until we run out of sockets!
                    socket.Close();
                    Thread.Sleep(1000);
                }               
            }
        }
    }

    /// <summary>
    /// Provides an extended wrapper of the .Net Micro Socket implementation
    /// </summary>
    public class SocketX : Socket
    {
        private bool _connectionSuccessful = false;
        private Thread _connectThread;
        private EndPoint _endPoint;
        private int _lastSocketErrorCode;

        public int LastSocketErrorCode
        {
            get { return _lastSocketErrorCode; }
        }

        public SocketX(AddressFamily addressFamily, SocketType socketType, ProtocolType protocolType)
            : base(addressFamily, socketType, protocolType)
        {
            SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.Linger, -2);
            SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
            //SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.KeepAlive, true);
        }

        protected override void Dispose(bool disposing)
        {
            if ((_connectThread != null) && (_connectThread.IsAlive))
            {
                _connectThread.Abort();
                _connectThread = null;
            }

            base.Dispose(disposing);
        }

        /// <summary>
        /// Connects to the specified end point using the specified timeout.
        /// </summary>
        /// <param name="endPoint">The end point.</param>
        /// <param name="timeoutMilliseconds">The timeout milliseconds.</param>
        /// <returns></returns>
        public bool Connect(EndPoint endPoint, int timeoutMilliseconds)
        {
            _endPoint = endPoint;

            _connectThread = null;
            _connectThread = new Thread(ConnectThread);
            _connectThread.Start();

            DateTime startTime = DateTime.Now;
            TimeSpan timeout = new TimeSpan(0, 0, 0, 0, timeoutMilliseconds);

            while ((DateTime.Now - startTime < timeout) && (!_connectionSuccessful))
            {
                Thread.Sleep(100);
            }

            if (_connectThread.IsAlive)
            {
                _connectThread.Abort();
                _connectThread = null;
            }

            return _connectionSuccessful;
        }

        private void ConnectThread()
        {
            try
            {
                _connectionSuccessful = false;

                Connect(_endPoint);

                //the following is a fix for GHI SDK 2016 R1 PR1
                //where the connection succeeds when the remote endpoint is not actually available
                //and the contained remote end point data is garbage.
                if (RemoteEndPoint.ToString() == _endPoint.ToString())
                {   
                    _connectionSuccessful = true;
                }
            }
            catch (SocketException f)
            {
                _lastSocketErrorCode = f.ErrorCode;               
            }
        }
    }
}

Almost forgot - the following might be useful - the following is what I hacked together for testing to act as the server side (Windows app):

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Net;
using System.Net.Sockets;

namespace ServerTest
{
    static class Testing
    {
        private static Socket _serverSocket;
        private static ConcurrentQueue<ClientSocket> _clientSockets = new ConcurrentQueue<ClientSocket>();
        private static Thread _managementThread;
        //private const bool DEBUG = false;
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        static void Main(string[] args)
        {
            _managementThread = new Thread(SocketManagement);
            _managementThread.Start();

            _serverSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            _serverSocket.ReceiveTimeout = 1000;
            _serverSocket.SendTimeout = 1000;
            _serverSocket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
            _serverSocket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.DontLinger, true);

            _serverSocket.Bind(new IPEndPoint(new IPAddress(new byte[] { 192, 168, 0, 46 }), 21000));
            _serverSocket.Listen(5);

            _serverSocket.BeginAccept(new AsyncCallback(OnAccept), null);

            Thread.Sleep(-1);
        }

        private static void OnAccept(IAsyncResult ar)
        {
            try
            {
                Socket socket = _serverSocket.EndAccept(ar);
                

                ClientSocket clientSocket = new ClientSocket(socket);
                
                _clientSockets.Enqueue(clientSocket);               

                //Start listening for more clients
                _serverSocket.BeginAccept(new AsyncCallback(OnAccept), null);
            }
            catch
            {
            }
        }

        private static void SocketManagement()
        {
            ClientSocket clientSocket;

            while (true)
            {
                if (_clientSockets.TryPeek(out clientSocket))
                {
                    if (DateTime.Now - clientSocket.ConnectTime > TimeSpan.FromSeconds(60))
                    {
                        if (_clientSockets.TryDequeue(out clientSocket))
                        {
                            //clientSocket.Socket.Disconnect(false);
                            clientSocket.Socket.Close();
                            clientSocket.Socket.Dispose();
                        }
                    }
                }

                Thread.Sleep(500);
            }
        }
    }

    internal class ClientSocket
    {
        public Socket Socket { get; set; }
        public DateTime ConnectTime {get; set;}
        
        public ClientSocket(Socket clientSocket)
        {
            Socket = clientSocket;
            ConnectTime = DateTime.Now;
        }
    }
}

I just tried the same test with the ENC28 on the G400HDR and the same hang/reset occurs so it looks like this is not specific to the built in Ethernet interface.

Out of interest, I connected to the running device using MFDeploy to view the debug output and disconnected the cable on the server side and reconnected the cable again. It looks like the low level network stack is not having a good time:

Connecting to G400_Nexus…Connected
Closing connection to [192.168.0.46]
g connection to [192.168.0.46]
Opening connection to [192.168.0.46]
Now connected to [192.168.0.46]
Closing connection to [192.168.0.46]
Opening connection to [192.168.0.46]
Now connected to [192.168.0.46]
Closing connection to [192.168.0.46]
Opening connection to [192.168.0.46]
Now connected to [192.168.0.46]
Closing connection to [192.168.0.46]
Opening connection to [192.168.0.46]
Now connected to [192.168.0.46]
Closing connection to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (224) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (225) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (226) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (227) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (228) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (229) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (230) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (231) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::connect [IP: 0000] ####
#### System.Net.Sockets.Socket::Connect [IP: 001d] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
#### SocketException ErrorCode = 10056
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (232) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::connect [IP: 0000] ####
#### System.Net.Sockets.Socket::Connect [IP: 001d] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
#### SocketException ErrorCode = 10056
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (235) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (236) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (238) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (240) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (241) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
Failed to connect to [192.168.0.46]
Opening connection to [192.168.0.46]
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (243) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::connect [IP: 0000] ####
#### System.Net.Sockets.Socket::Connect [IP: 001d] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
#### SocketException ErrorCode = 10056
User Cancel…
Connecting to G400_Nexus…Failed to connect to [192.168.0Connected
.46]

Exception System.Threading.ThreadAbortException - 0x00000000 (263)

LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\ #### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (264) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.46]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (265) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### IDX.Nexus.Communications.Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.46]
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “unhandled level” failed at line 2452 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lOpening connection to [192.168.0.46]

Device hangs/reboots here! <<<<<

I also tried downgrading to SDK 2015 R1 with the same result. I have had no luck finding mention of similar issues out there - can it really be nobody else has encountered this rather serious bug?

@ AlexTZA -

Hi, we tried exactly your code at post #1, just modify to our IP address. We can not reproduce the issue.
We also tried to print a string after calling Close() and you can see that it is still running after the cable was disconnected few times.

Is the bug always happened or is it happened randomly?

Also, some information you got at post #2, they can be ignored.

Hi Dat,

Thanks for the feedback, but I am a bit stumped that you were not able to reproduce the error. I’m testing the code on a G400HDR, so no custom hardware and the code sample in this case is shared. Using either SDK 2015 R1 or 2016 PR1 has same result using the test code I provided, as well as on the built-in or ENC28 interfaces. What else could be different on your end - the network cable/switch arrangement? My G400 connects into a switch - I’m not connecting it directly to my server otherwise disconnecting the cable would not cause the error. I’m running the code in Debug mode.

Just to be clear, pulling the network cable out on the G400 side does NOT cause the error - it only occurs when the cable on the server side is removed or the server itself is stopped/paused. Typically, I disconnect the server, let the G400 try connect for a few minutes then plug the cable back in on the server side and this usually causes the device to hang and then reboot right after printing “Opening connection to…” so the issue is occurring in the Connect command. On the odd occasion the error does not occur the first time I do the “reconnect” but then I go through the above disconnect/connect cycle a few times and this usually results in the crash.

After some more testing I have noticed a subtle but important detail.

The problem does not occur when the server application is paused/stopped as I had previously stated - my bad.

I asked a colleague to try and replicate the problem using the code and his was just working also. However, what I noticed when connecting MFDeploy to the “working” device is that in his case the Connect was not being aborted in the Thread as no ThreadAbortException and LWIP errors were being reporting via debug. The difference was he was stopping the server application, not disconnecting the cable on the server side. So it appears when the server IP is available but there is no socket listening at the target server port, socket.Connect returns and the connection fails immediately. If the network cable/wireless is disabled instead, the SocketExceptions occur on the .NET MF side and only then when the network is reconnected on the server side does the .NET MF side crash. So the issue is definitely related to aborting the Thread the socket is connecting in and then Closing the socket in the Connect state.

I also found this thread [url]https://www.ghielectronics.com/community/forum/topic?id=12496&page=1[/url] which appears to show remarkably similar behaviour on EMX/G120 devices 2 years ago. The thread which although marked as Solved, does not actually offer a to the crashing part of the problem. Unfortunately, a watchdog reset for my application when the socket connect hangs is not a viable solution.

Below is a trace of the MFDeploy output including the timing of the different actions.

<< Device starts and tries to connect to server which has network connected but no socket listener is running. Notice no ThreadAbortExceptions >>>>
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]

<<<< Server app started on server endpoint >>>>>

Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]
Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
Closing connection to [192.168.0.13]

<<<< Network cable disconnected on server side >>>>>

Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (79) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (80) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (81) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (82) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (83) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (84) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (85) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (86) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
Failed to connect to [192.168.0.13]
Opening connection to [192.168.0.13]
#### Exception System.Threading.ThreadAbortException - 0x00000000 (89) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Connect [IP: 0029] ####
#### Testing.SocketX::ConnectThread [IP: 000f] ####
Failed to connect to [192.168.0.13]

<<<< Network cable reconnected on server side >>>>>>

Opening connection to [192.168.0.13]
Now connected to [192.168.0.13]
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “conn->state == NETCONN_CONNECT” failed at line 967 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “(conn->current_msg != NULL) || conn->in_non_blocking_connect” failed at line 968 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
Closing connection to [192.168.0.13]
LWIP Assertion “PCB must be deallocated outside this function” failed at line 650 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
LWIP Assertion “recvmbox must be deallocated before calling this function” failed at line 651 in D:\Repos\NETMF_Firmware\DeviceCode\pal\lwip\lwip\src\api\api_msg.c
RAMC CRC:bffcda31!=8d97213b
RAMC 00:eaffff44
RAMC 04:e59ff010
RAMC 08:baadf00d
RAMC 0c:e59ff00c
RAMC 10:e59ff00c
RAMC 14:baadf00d
RAMC 18:e59ff008
RAMC 1c:20

<<<< device hangs/reboots >>>>

To add to the confusion, the above trace was done with 2015 R1. With 2016 R1 prerelease the only difference is a SocketException (10065) is thrown in Connect rather than just returning when the server is connected to the network, but not running.

So after a few experiments it seems there is a workaround as long as you play nice with the Sockets and don’t ever Close one while a Connect has been called.

An important surprise finding for me is that the standard Socket.Connect now times-out after 45 seconds or so if the remote server network connection is not connected, whereas in versions prior to .NET MF 4.3 this was not the case, hence the need for a threaded socket connect.

In light of this, I have modified my code to take this new Connect behaviour into account. As I have a requirement to be able to stop and start the comms subsystem at any time without having to wait for a Socket to decide to timeout, it was necessary to implement some lazy Socket close logic. In this case, if a socket is closed in code which is actually still trying to Connect, the close is delayed until the Connect thread is naturally wrapping up. This seems to keep the Socket gremlins happy as far as I can tell. For the greater good of .NET MF-kind, below is the updated SocketX class that contains the principal methodology from my test project.


/// <summary>
/// Provides an extended wrapper of the .Net Micro Socket implementation 
/// </summary>
public class SocketX : Socket
{
	public delegate void ConnectComplete(SocketX sender, ConnectionEventArgs args);
	public event ConnectComplete EndConnect = delegate { };
	
	private bool _connected = false;
	private bool _isClosed = false;
	private object _closeLock = new object();
	private Thread _connectThread;
	private IPEndPoint _endPoint
        private AutoResetEvent _connectionComplete;
	
	public IPEndPoint EndPoint
	{
	    get { return _endPoint; }
	}
	
	public bool Connected
	{
	    get { return _connected; }
	}
	
	public bool ConnectionPending
	{
	    get { return (!Connected && _connectThread != null && _connectThread.IsAlive); }
	}
	
	public SocketX(AddressFamily addressFamily, SocketType socketType, ProtocolType protocolType)
	    : base(addressFamily, socketType, protocolType)
	{
	    //SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.Linger, -2);
	    SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.DontLinger, true);
	    SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
	    //SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.KeepAlive, true);    
            
           _connectionComplete = new AutoResetEvent(false);         
	}
	
	public new void Close()
	{
	    _connected = false;

            //release any pending blocking connect operations
           _connectionComplete.Set();

	    lock (_closeLock)
	    {
	        if (!_isClosed && (_connectThread == null || !_connectThread.IsAlive))
	        {
	            //if the socket is not already closed and the connect thread was never started or the connect thread has completed, do a Close
	            base.Close();
	        }
	
	        _isClosed = true;
	    }
	}
	
        /// <summary>
        /// Starts an asynchronous connect operation. EndConnect is raised on completion
        /// </summary>
        /// <param name="endPoint">The end point.</param>
        /// <returns></returns>
	public bool BeginConnect(IPEndPoint endPoint)
	{
	    if (!ConnectionPending)
	    {
	        //if there is no connection already pending, start the connection thread
	        _endPoint = endPoint;
	
	        _connectThread = null;
	        _connectThread = new Thread(ConnectThread);
	        _connectThread.Start();
	
	        return true;
	    }
	    else
	    {
	        return false;
	    }
	}
	
	/// <summary>
        /// Does a blocking connect that supports other threads calling Close during the Connect. This call will timeout:
        /// - Almost immediately if the remote IP is available but the endpoint server socket is not listening
        /// - In approx 45 seconds if the remote IP is not available
        /// </summary>
        /// <param name="endPoint">The end point.</param>
        /// <returns></returns>
        public bool Connect(IPEndPoint endPoint)
        {
            if (!ConnectionPending)
            {
                BeginConnect(endPoint);

                _connectionComplete.WaitOne();
            }

            return _connected;
        } 
	
	private void ConnectThread()
	{
	    int errorCode = 0;
	
	    try
	    {
	        _connected = false;
	
	        //in .NETMF 4.3:
	        //- the connect will fail immediately if the remote IP is available but the endpoint server is not running
	        //- the connect will timeout after 45 sec or so if the endpoint IP is not available, and will succeed if the endpoint is suddenly made available                
	        base.Connect(_endPoint);
	
	        //the following is a check-fix for GHI SDK 2016 R1 PR1
	        //where the connection succeeds when the remote endpoint is not actually available
	        //and the contained remote end point data is garbage.
	        if (RemoteEndPoint.ToString() == _endPoint.ToString())
	        {
	            _connected = true;
	        }
	    }
	    catch (SocketException f)
	    {
	        //usually occurs if the local network cable is disconnected or the underlying ethernet connection is not initialised
	        errorCode = f.ErrorCode;
	    }
	
	    //call aynch end event
	    EndConnect(this, new ConnectionEventArgs(_connected, errorCode));
	
	    //do a lazy close if necessary as this thread may still be running after a Close was called on this object
	    lock (_closeLock)
	    {
	        //was Close previously called on this socket?
	        if (_isClosed)
	        {
	            //the socket was closed but the thread was still active when that happened so call Close now to clean this socket up properly
	            base.Close();
	        }
	    }
            
           _connectionComplete.Set();
	}
}
	
	
public class ConnectionEventArgs : EventArgs
{
	/// <summary>
	/// Gets the volume information.
	/// </summary>
	/// <value>
	/// The volume information.
	/// </value>
	/// 
	public bool Connected { get; internal set; }
	public int SocketErrorCode { get; internal set; }
	
	/// <summary>
	/// Initializes a new instance of the <see cref="VolumeEventArgs"/> class.
	/// </summary>
	/// <param name="volumeInfo">The volume information.</param>
	public ConnectionEventArgs(bool connected, int socketErrorCode)
	    : base()
	{
	    Connected = connected;
	    SocketErrorCode = socketErrorCode;
	}
}

@ AlexTZA -
We found the issue in native lwip

If your solution is working well then just use it at this time.

Thanks Dat,

Glad to see it wasn’t just me losing my mind :slight_smile: