G120 and Sockets, reliability problem - Need help!

We have a new product based on the G120 using the CobraII schematics.
I have an ENC28J60 module.
My current problem is this: Both in server and client mode, sockets are not reliable.
It works for a while, and eventually the module does not respond at all.

In client mode, it eventually cannot connect anymore (I have a thread that aborts the client thread so that it tries again after a timeout).

In server mode, the server eventually just blocks at socket.Accept() and does not accept new connections. (Also, the browser does not receive the complete response; it is cut off.)

I have made a Gadgeteer project using the FezCobraII board and an ENC28J60 module.
Everything works … for a while. By “a while” I mean anywhere from minutes to hours, and the larger the packets the G120 sends, the higher the probability that it fails (messages of over 4 KB).

Here is my current test code. I have made a minimalist example that can easily reproduce the problem.

If you want to make it hang pretty fast, put 100 lines or more in the response built in the ServerThread() method (the loop there currently adds 75).

Now, in a browser, type the device IP address and hit Ctrl-F5 every second.
Eventually, it stops responding for me.
If you want to break it pretty fast, hold down Ctrl-F5 to spam the G120. That causes exceptions (understandably), but eventually nothing works anymore.
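
For anyone who would rather not sit on Ctrl-F5, the refresh loop could be scripted from a PC. The following is only a sketch in desktop C# (it is not NETMF code and not part of the original test). The class name, the placeholder address 192.168.1.50, the 1000 iterations and the 5-second timeouts are all assumptions chosen for illustration.

```csharp
using System;
using System.Net.Sockets;
using System.Text;
using System.Threading;

// Hypothetical desktop-side stress client; run it on a PC, not on the G120.
class StressClient
{
    static void Main(string[] args)
    {
        // Assumption: the device address is passed on the command line; the default is a placeholder.
        string host = args.Length > 0 ? args[0] : "192.168.1.50";
        byte[] request = Encoding.ASCII.GetBytes("GET / HTTP/1.0\r\n\r\n");
        byte[] buffer = new byte[8192];
        int ok = 0, failed = 0;

        for (int i = 1; i <= 1000; i++)
        {
            try
            {
                using (TcpClient client = new TcpClient())
                {
                    client.SendTimeout = 5000;
                    client.ReceiveTimeout = 5000;
                    client.Connect(host, 80);

                    NetworkStream stream = client.GetStream();
                    stream.Write(request, 0, request.Length);

                    // Read until the device closes the connection.
                    int total = 0, n;
                    while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
                        total += n;

                    ok++;
                    Console.WriteLine("#" + i + ": received " + total + " bytes");
                }
            }
            catch (Exception ex)
            {
                failed++;
                Console.WriteLine("#" + i + ": FAILED (" + ex.Message + ")");
            }
            Thread.Sleep(1000); // roughly one refresh per second
        }
        Console.WriteLine("ok=" + ok + " failed=" + failed);
    }
}
```

Logging the byte count per request also makes a truncated response visible, since every complete reply should have the same length.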

A) Can anyone reproduce this problem, to confirm it is not just our board?
B) Any workaround so this becomes reliable, not just a few connections, but with thousands of connections over several days?


using System;
using System.Collections;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Presentation;
using Microsoft.SPOT.Presentation.Controls;
using Microsoft.SPOT.Presentation.Media;
using Microsoft.SPOT.Touch;

using Gadgeteer.Networking;
using GT = Gadgeteer;
using GTM = Gadgeteer.Modules;
using GHINET = GHI.Premium.Net;
using Microsoft.SPOT.Hardware;
using GHI.Premium.Hardware;
using GHI.Premium.Net;
using System.Net;
using System.Net.Sockets;
using System.Text;
using Gadgeteer.Modules.GHIElectronics;

namespace TestNetwork {
    public partial class Program {
        private Thread mainLoopThread;
        private Thread serverThread;
        private EthernetENC28J60 NetInterface;

        // This method is run when the mainboard is powered up or reset.
        void ProgramStarted() {
            /*******************************************************************************************
            Modules added in the Program.gadgeteer designer view are used by typing 
            their name followed by a period, e.g.  button.  or  camera.
            
            Many modules generate useful events. Type +=<tab><tab> to add a handler to an event, e.g.:
                button.ButtonPressed +=<tab><tab>
            
            If you want to do something periodically, use a GT.Timer and handle its Tick event, e.g.:
                GT.Timer timer = new GT.Timer(1000); // every second (1000ms)
                timer.Tick +=<tab><tab>
                timer.Start();
            *******************************************************************************************/


            // Use Debug.Print to show messages in Visual Studio's "Output" window during debugging.
            Debug.Print("Program Started");

            InitNetwork();

            mainLoopThread = new Thread(MainLoop);
            mainLoopThread.Priority = ThreadPriority.Normal;
            mainLoopThread.Start();
        }


        private void InitNetwork() {

            try {
                //ethernet_ENC28 = new GHI.Premium.Net.EthernetENC28J60(SPI.SPI_module.SPI2, GHI.Hardware.G120.Pin.P1_17, GHI.Hardware.G120.Pin.P2_21, GHI.Hardware.G120.Pin.P1_14, 4000);
                NetInterface = ethernet_ENC28.Interface;

                NetInterface.CableConnectivityChanged += new GHI.Premium.Net.EthernetENC28J60.CableConnectivityChangedEventHandler(Interface_CableConnectivityChanged);
                NetInterface.NetworkAddressChanged += new GHI.Premium.Net.NetworkInterfaceExtension.NetworkAddressChangedEventHandler(Interface_NetworkAddressChanged);

                if (!NetInterface.IsOpen)
                    NetInterface.Open();

                GHI.Premium.Net.NetworkInterfaceExtension.AssignNetworkingStackTo(NetInterface);

                byte[] macAddress = new byte[] {0x00, 0x1c, 0x14, 0xf9, 0xda, 0xe9};
                for (int i=0;i<6;i++) {
                    if ( NetInterface.NetworkInterface.PhysicalAddress[i] != macAddress[i] ) {
                        NetInterface.NetworkInterface.PhysicalAddress = macAddress;  
                        Debug.Print("Updating MAC Address, please reboot for changes to take effect.");
                        break; // one mismatching byte is enough; no need to set the address again
                    }
                }

                if (!NetInterface.NetworkInterface.IsDhcpEnabled)
                    NetInterface.NetworkInterface.EnableDhcp();
                NetInterface.NetworkInterface.RenewDhcpLease();

            } catch (Exception ex) {
                Debug.Print("Error while initializing network" + ex.Message);
            }
        }


        void Interface_NetworkAddressChanged(object sender, EventArgs e) {
            Debug.Print("AddressChanged:" + NetInterface.NetworkInterface.IPAddress);

            if (NetInterface.NetworkInterface.IPAddress != "0.0.0.0") {
                StartServer();
            }
        }

        void Interface_CableConnectivityChanged(object sender, EthernetENC28J60.CableConnectivityEventArgs e) {
            Debug.Print("NetworkConnectChanged:" + e.IsConnected);

        }


        void StartServer() {

            serverThread = new Thread(ServerThread);
            serverThread.Priority = ThreadPriority.Normal;
            serverThread.Start();

        }

        byte[] request = new byte[65536];

        void ServerThread() {
            
            // Bind the listening socket to the port
            IPAddress hostIP = IPAddress.Parse(NetInterface.NetworkInterface.IPAddress);
            IPEndPoint ep = new IPEndPoint(hostIP, 80);
            Socket listenSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            listenSocket.Bind(ep);

            // Start listening
            listenSocket.Listen(1);

            string msgStr = "Hello, browser! I think the time is " + DateTime.Now.ToString();
            string responseStr = "";
            for (int i = 0; i < 75; i++) {
                responseStr = responseStr + "\r" + msgStr;
            }


            byte[] messageBytes = Encoding.UTF8.GetBytes(responseStr);
            

            // Main thread loop
            while (true) {
                Socket socket = null;
                try {
                    Debug.Print("listening...");
                    socket = listenSocket.Accept();
                    Debug.Print("Accepted a connection from " + socket.RemoteEndPoint.ToString());
                } catch (Exception e) {
                    Debug.Print("Exception @ listenSocket.Accept():" + e.Message);
                    continue;
                }
                
                if (!ReadAll(socket)) {
                    continue;
                }

                try {
                    socket.Send(messageBytes);

                } catch (Exception e) {
                    Debug.Print("Exception @ socket.Send():" + e.Message);
                    if (ReadAll(socket)) {
                    }
                    CloseSocket(socket);
                    continue;
                }
                CloseSocket(socket);
                socket = null;
            }



        }

        bool ReadAll(Socket socket) {
            try {
                while (socket.Poll(15000, SelectMode.SelectRead)) {
                    try {
                        int numBytes = socket.Receive(request);
                        if (numBytes == 0) {
                            break;
                        }
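                        // Decode only the bytes actually received; char[].ToString() would just print the type name.
                        Debug.Print("Receiving bytes(" + numBytes + "):" + new string(Encoding.UTF8.GetChars(request, 0, numBytes)));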
                    } catch (Exception e) {
                        Debug.Print("Exception @ newSock.Receive():" + e.Message);
                        CloseSocket(socket);
                        return false;
                    }
                }
            } catch (Exception e) {
                Debug.Print("Exception @ newSock.Poll():" + e.Message);
                CloseSocket(socket);
                return false;
            }
            return true;
        }

        void CloseSocket(Socket socket) {

            try {
                socket.Close();
            } catch (Exception e) {
                Debug.Print("Exception @ newSock.Close():" + e.Message);
            }
        }

        void MainLoop() {
            Thread.Sleep(-1);

        }
    }
}


Another thing: as a temporary fix, I tried a different approach.
If after 30 seconds there’s no connection to the server, I stop the thread that accepts connections, then close the ENC28J60 interface (there’s a weird exception there; I don’t think that path has really been tested).
After that, I reset the network module by writing to its pin directly with an OutputPort, and then I reinitialize the ENC28J60 module.
Finally, I restart the socket server thread.
It worked for hours: whenever there had been no connection for X seconds, the workaround above ran and connections were possible again, until the next time it happened.
But this morning, all 3 of the boards I tested could no longer be debugged and the web server did not respond.
I had to reflash them before I could deploy the code again.
Pretty weird stuff.
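
The timing part of the 30-second check described above could be sketched as follows. This is a minimal illustration in plain C#, not the code that was actually run: the class name AcceptWatchdog and its members are hypothetical, and the recovery steps themselves (closing the interface, toggling the pin, reinitializing) are deliberately left out.

```csharp
using System;

// Hypothetical helper: remembers when the server last accepted a connection.
class AcceptWatchdog
{
    private readonly TimeSpan limit;
    private DateTime lastAccept;

    public AcceptWatchdog(TimeSpan limit, DateTime now)
    {
        this.limit = limit;
        this.lastAccept = now;
    }

    // Call after every successful Accept().
    public void Touch(DateTime now)
    {
        lastAccept = now;
    }

    // True when no connection has been accepted for longer than the limit.
    public bool IsStalled(DateTime now)
    {
        return now - lastAccept > limit;
    }

    static void Main()
    {
        DateTime start = new DateTime(2013, 1, 1, 12, 0, 0);
        AcceptWatchdog watchdog = new AcceptWatchdog(new TimeSpan(0, 0, 30), start);

        Console.WriteLine(watchdog.IsStalled(start.AddSeconds(10))); // within the limit
        Console.WriteLine(watchdog.IsStalled(start.AddSeconds(45))); // past the limit

        watchdog.Touch(start.AddSeconds(45));                        // a connection arrived
        Console.WriteLine(watchdog.IsStalled(start.AddSeconds(60))); // within the limit again
    }
}
```

A supervising thread would call IsStalled periodically and run the recovery sequence when it returns true. Passing the current time in explicitly keeps the logic easy to check off the device.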

We will look into this.

So this may be similar to what I have been seeing.

I ended up writing some code to force a reboot if the socket hangs (sending to cosm/xively and another wamp server). I have not had it go a full day without hanging. And about once every day or so the firmware is corrupted and I have to reflash it.

I am new to software dev so that may play a role in it.

I did not use my network reset procedure in the code posted.
And sure, I can make the other sockets global, although I think (in the code posted, at least) that it does not change anything.
Anyway, the same problem occurred: I hit refresh about 30 times, and it stopped working.

using System;
using System.Collections;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Presentation;
using Microsoft.SPOT.Presentation.Controls;
using Microsoft.SPOT.Presentation.Media;
using Microsoft.SPOT.Touch;

using Gadgeteer.Networking;
using GT = Gadgeteer;
using GTM = Gadgeteer.Modules;
using GHINET = GHI.Premium.Net;
using Microsoft.SPOT.Hardware;
using GHI.Premium.Hardware;
using GHI.Premium.Net;
using System.Net;
using System.Net.Sockets;
using System.Text;
using Gadgeteer.Modules.GHIElectronics;

namespace TestNetwork {
    public partial class Program {
        private Thread mainLoopThread;
        private Thread serverThread;
        private EthernetENC28J60 NetInterface;
        private Socket listenSocket;
        private Socket serverSocket = null;

        // This method is run when the mainboard is powered up or reset.
        void ProgramStarted() {
            /*******************************************************************************************
            Modules added in the Program.gadgeteer designer view are used by typing 
            their name followed by a period, e.g.  button.  or  camera.
            
            Many modules generate useful events. Type +=<tab><tab> to add a handler to an event, e.g.:
                button.ButtonPressed +=<tab><tab>
            
            If you want to do something periodically, use a GT.Timer and handle its Tick event, e.g.:
                GT.Timer timer = new GT.Timer(1000); // every second (1000ms)
                timer.Tick +=<tab><tab>
                timer.Start();
            *******************************************************************************************/


            // Use Debug.Print to show messages in Visual Studio's "Output" window during debugging.
            Debug.Print("Program Started");

            InitNetwork();

            mainLoopThread = new Thread(MainLoop);
            mainLoopThread.Priority = ThreadPriority.Normal;
            mainLoopThread.Start();
        }


        private void InitNetwork() {

            try {
                //ethernet_ENC28 = new GHI.Premium.Net.EthernetENC28J60(SPI.SPI_module.SPI2, GHI.Hardware.G120.Pin.P1_17, GHI.Hardware.G120.Pin.P2_21, GHI.Hardware.G120.Pin.P1_14, 4000);
                NetInterface = ethernet_ENC28.Interface;

                NetInterface.CableConnectivityChanged += new GHI.Premium.Net.EthernetENC28J60.CableConnectivityChangedEventHandler(Interface_CableConnectivityChanged);
                NetInterface.NetworkAddressChanged += new GHI.Premium.Net.NetworkInterfaceExtension.NetworkAddressChangedEventHandler(Interface_NetworkAddressChanged);

                if (!NetInterface.IsOpen)
                    NetInterface.Open();

                GHI.Premium.Net.NetworkInterfaceExtension.AssignNetworkingStackTo(NetInterface);

                byte[] macAddress = new byte[] {0x00, 0x1c, 0x14, 0xf9, 0xda, 0xe9};
                for (int i=0;i<6;i++) {
                    if ( NetInterface.NetworkInterface.PhysicalAddress[i] != macAddress[i] ) {
                        NetInterface.NetworkInterface.PhysicalAddress = macAddress;  
                        Debug.Print("Updating MAC Address, please reboot for changes to take effect.");
                        break; // one mismatching byte is enough; no need to set the address again
                    }
                }

                if (!NetInterface.NetworkInterface.IsDhcpEnabled)
                    NetInterface.NetworkInterface.EnableDhcp();
                NetInterface.NetworkInterface.RenewDhcpLease();

            } catch (Exception ex) {
                Debug.Print("Error while initializing network" + ex.Message);
            }
        }


        void Interface_NetworkAddressChanged(object sender, EventArgs e) {
            Debug.Print("AddressChanged:" + NetInterface.NetworkInterface.IPAddress);

            if (NetInterface.NetworkInterface.IPAddress != "0.0.0.0") {
                StartServer();
            }
        }

        void Interface_CableConnectivityChanged(object sender, EthernetENC28J60.CableConnectivityEventArgs e) {
            Debug.Print("NetworkConnectChanged:" + e.IsConnected);

        }


        void StartServer() {

            serverThread = new Thread(ServerThread);
            serverThread.Priority = ThreadPriority.Normal;
            serverThread.Start();

        }

        byte[] request = new byte[65536];

        void ServerThread() {
            
            // Bind the listening socket to the port
            IPAddress hostIP = IPAddress.Parse(NetInterface.NetworkInterface.IPAddress);
            IPEndPoint ep = new IPEndPoint(hostIP, 80);
            listenSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            listenSocket.Bind(ep);

            // Start listening
            listenSocket.Listen(1);

            string msgStr = "Hello, browser! I think the time is " + DateTime.Now.ToString();
            string responseStr = "";
            for (int i = 0; i < 75; i++) {
                responseStr = responseStr + "\r" + msgStr;
            }


            byte[] messageBytes = Encoding.UTF8.GetBytes(responseStr);
            

            // Main thread loop
            while (true) {
                try {
                    Debug.Print("listening...");
                    serverSocket = listenSocket.Accept();
                    Debug.Print("Accepted a connection from " + serverSocket.RemoteEndPoint.ToString());
                } catch (Exception e) {
                    Debug.Print("Exception @ listenSocket.Accept():" + e.Message);
                    continue;
                }
                
                if (!ReadAll()) {
                    continue;
                }

                try {
                    serverSocket.Send(messageBytes);

                } catch (Exception e) {
                    Debug.Print("Exception @ serverSocket.Send():" + e.Message);
                    if (ReadAll()) {
                    }
                    CloseSocket();
                    continue;
                }
                CloseSocket();
                serverSocket = null;
            }



        }

        bool ReadAll() {
            try {
                while (serverSocket.Poll(15000, SelectMode.SelectRead)) {
                    try {
                        int numBytes = serverSocket.Receive(request);
                        if (numBytes == 0) {
                            break;
                        }
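                        // Decode only the bytes actually received; char[].ToString() would just print the type name.
                        Debug.Print("Receiving bytes(" + numBytes + "):" + new string(Encoding.UTF8.GetChars(request, 0, numBytes)));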
                    } catch (Exception e) {
                        Debug.Print("Exception @ newSock.Receive():" + e.Message);
                        CloseSocket();
                        return false;
                    }
                }
            } catch (Exception e) {
                Debug.Print("Exception @ newSock.Poll():" + e.Message);
                CloseSocket();
                return false;
            }
            return true;
        }

        void CloseSocket() {
            try {
                serverSocket.Close();
            } catch (Exception e) {
                Debug.Print("Exception @ newSock.Close():" + e.Message);
            }
        }

        void MainLoop() {
            Thread.Sleep(-1);

        }
    }
}

The ReadAll() method is just to read everything the client sends, even if I don’t do anything with it in the test.
It appeared to hang less often with it (maybe the buffer gets full if I never read?)

More testing…
I’m doing more tests with the WiFi module (WiFiRS9110).
So far, it never hangs, even when I spam the server.
This is just after a few minutes of testing … I will post again if it stays reliable!

Just to report: WiFi works perfectly so far.
Even if I unplug the antenna so the link goes down, then plug it back in so the link comes up, it rejoins the network and socket.Accept() never hangs.
Since yesterday, it has been exchanging packets with a socket server every 15 seconds.

The other possibility is that our custom hardware is faulty for the ENC28J60.
I will test my code with a G120HDR and ENC28J60 gadgeteer module.

OK, today I’ve tested a G120HDR with an ENC28J60 Gadgeteer module: same problem!
So it is not our hardware. I’m pretty sure there’s a bug in the library somewhere: something like a buffer that does not get cleared, a multithreading problem, etc.

Also, the WiFi version worked 100% all weekend!
There were transactions every 15 seconds, and it never lost the connection all weekend (around 60 hours in total).

The LAN version, on 3 different PCBs, lost the network within an hour every time.

Thanks for asking to clarify, I really want to nail this bug or find a workaround !

But yes, take the code I have posted earlier, but instead use a Redpine wifi module (like the WiFi RS21 Module).
The socket.Accept() will never hang if there is a wifi link up and a client tries to connect.
It can block there if the link is down or if there are no clients trying to connect. But if the link is up, and a client tries to connect, it will work.

With the LAN version (ENC28J60), by contrast, socket.Accept() eventually blocks (after minutes or hours) and never receives another connection.
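
One way to keep the server thread from sitting inside Accept() forever is to poll the listening socket first and only call Accept() when a connection is pending. The sketch below is desktop C# run against the loopback interface. The helper name TryAccept and the timeout values are hypothetical, and whether Poll behaves the same way on the G120 once the ENC28J60 stalls is an untested assumption. It would not fix the underlying problem; it only lets the thread notice that nothing is arriving and hand control to a recovery path.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class PollingAccept
{
    // Waits up to timeoutMs for a pending connection; returns null if none arrived.
    public static Socket TryAccept(Socket listener, int timeoutMs)
    {
        // Poll takes microseconds. On a listening socket, SelectRead is true
        // when a connection is waiting to be accepted.
        if (!listener.Poll(timeoutMs * 1000, SelectMode.SelectRead))
            return null;
        return listener.Accept();
    }

    static void Main()
    {
        Socket listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: let the OS pick a free port
        listener.Listen(1);

        // No client yet: TryAccept gives up after about 200 ms instead of blocking.
        Socket none = TryAccept(listener, 200);
        Console.WriteLine(none == null ? "no client pending" : "unexpected client");

        // Connect a local client; the pending connection can then be accepted.
        Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndPoint);
        Socket accepted = TryAccept(listener, 2000);
        Console.WriteLine(accepted != null ? "accepted a connection" : "no client pending");

        if (accepted != null) accepted.Close();
        client.Close();
        listener.Close();
    }
}
```

In the test server, the loop would call TryAccept with a timeout of a few seconds and count consecutive null results to decide when to reset the interface.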

Did you guys try the sample code I posted? I think it is simple enough, and it reproduces the bug pretty clearly…
Just hit refresh 50 times or so, and no new connections are possible until you reboot or redeploy the code.

Try the same with the redpine wifi module, and it seems bulletproof.

Also is it normal that you can’t define a new mac address on the ENC28J60 module without rebooting ?
I mean, you can set it, but the MAC does not update on the chip (verified with our DHCP server).
Then if I reboot, the MAC that I previously set will be remembered and works (it will get a new IP address because the MAC address changed).
Maybe I should make a different post about this particular problem.

I agree; waiting for someone at GHI to confirm that there’s a bug.

Yes, that’s what is happening.
I’m not sure it is the normal behavior, but if you test it with a DHCP server, it’s pretty clear that the MAC does not change until reboot. The interface accepts the new value, but the actual MAC in the chip won’t change until reboot (or maybe until the module is initialized).

In any case, the MAC address is not a big problem; the real problem is the disconnects and the hangs at socket.Accept().

This is on the list but this is not a simple answer. We are trying our best to get you more details.

Thanks! I understand that you guys must have hundreds of different issues to address, but this one is pretty serious because it is a reliability problem, with something as relatively basic as a wired network connection.

Just checking if there is any news on the problem ?

ExecutionConstraint.Install(timeout, 0)
from https://www.ghielectronics.com/community/forum/topic?id=12396

Seemed to work for me. I have a device that would hang daily. Now it has been running for several days straight since I put that in.

@ mjrogers99 - Thanks!

@ PhilM - Please try mjrogers99’s suggestion and share your results with us.

Well, I’m very surprised … it seems to work!
I’ve set a 5-second timeout (in ticks) for ExecutionConstraint.Install:


private void listen()
{
    bool doSleep = false;

    // Workaround to unblock server.Accept
    ExecutionConstraint.Install((int)(new TimeSpan(0, 0, 5)).Ticks, 0);
    while (true)
    {
        try {
            // Wait for a client to connect.
            Socket newClient = mServer.Accept();

            // Process the client request. The second argument selects
            // asynchronous processing when true; false is passed here.
            ProcessClientRequest(newClient, false);
            doSleep = false;
        } catch {
            doSleep = true;
        }
        if (doSleep) {
            Thread.Sleep(5000);
        }
    }
}

It works, for now! I need to test it for longer, and as a client, not just as a server.
Thanks for the hint, and I too am interested in the voodoo behind all this!

I’ve already read that MSDN article, but it is not clear what counts as an ‘operation’ in: “Creates a subthread within the calling thread, containing a constraint that requires the calling thread to complete an operation within a specified time period and at a specified priority level.”
I’ve tried putting in a Thread.Sleep(15000), and the 5-second timeout does NOT interrupt it…
So I’m not really sure how it works.

Well, I tried this too … and the constraint does not interrupt it either:


int k =0;
for (int i=0;i<100000000;i++) { k++;}

So again, I fail to see what defines an operation … and why it works for the socket problem.
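
One detail worth checking, offered as a possibility rather than a confirmed explanation: the value passed to Install() in the snippet above is a tick count, and TimeSpan ticks are 100-nanosecond units. If the first argument of ExecutionConstraint.Install is read as milliseconds (an assumption here, not verified on the G120), the constraint would be far longer than 5 seconds, which would be consistent with neither the 15-second sleep nor the busy loop tripping it. The arithmetic, in plain desktop C# with a hypothetical class name:

```csharp
using System;

class TimeoutUnits
{
    // The value the posted code passes to Install(): 5 seconds expressed in ticks.
    public static int FiveSecondsInTicks()
    {
        return (int)new TimeSpan(0, 0, 5).Ticks; // 1 tick = 100 ns
    }

    static void Main()
    {
        int passed = FiveSecondsInTicks();
        Console.WriteLine(passed); // 50,000,000 ticks in 5 seconds

        // ASSUMPTION: if that number were interpreted as milliseconds,
        // it would be 50,000 seconds, i.e. roughly 13.9 hours.
        Console.WriteLine(passed / 1000.0 / 3600.0);

        // Five seconds expressed in milliseconds instead:
        Console.WriteLine(5 * 1000);
    }
}
```

Under that assumption, repeating the Thread.Sleep(15000) test with a value of 5000 would show whether the constraint can interrupt a sleeping or busy thread at all.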