Network problems with G120 and ENC28

@ ChrisK - With the code listed I got errors if I tried to send requests too fast.

If a single client.send() is used per request then it worked.

What was the reason to use multiple sends?

The only guess I could make is that the G120 can’t keep up with the request when using multiple sends.

@ skeller: Sending in chunks is very important for me. The original solution is also compiled with .NET Micro Framework 4.1 for some FEZ Panda 2 using a FEZ Connect shield. The socket objects are wrapped using interfaces.
Because of the memory limitations of the Panda 2 I need to send the data in chunks to avoid a large string and a large buffer byte array.
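A minimal sketch of the chunked sending described above, assuming the response is already available as separate string parts. `ChunkedSender`, `EncodeSlice`, `SendInChunks` and `maxChars` are hypothetical names for illustration, not code from the original solution. Only one small byte array exists at a time, and each `Send` call is looped because it may transmit fewer bytes than requested.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

public static class ChunkedSender
{
    // Encodes at most maxChars characters of text, starting at offset.
    // Slicing by characters is only safe for text without surrogate pairs
    // (plain ASCII HTTP headers and simple HTML are fine).
    public static byte[] EncodeSlice(string text, int offset, int maxChars)
    {
        if (maxChars <= 0) throw new ArgumentException("maxChars must be positive");
        int remaining = text.Length - offset;
        int length = remaining < maxChars ? remaining : maxChars;
        return Encoding.UTF8.GetBytes(text.Substring(offset, length));
    }

    // Sends every part in slices of at most maxChars characters.
    // Returns the total number of bytes handed to the socket.
    public static int SendInChunks(Socket client, string[] parts, int maxChars)
    {
        int total = 0;
        foreach (string part in parts)
        {
            for (int offset = 0; offset < part.Length; offset += maxChars)
            {
                byte[] chunk = EncodeSlice(part, offset, maxChars);
                int sent = 0;
                // Send may accept fewer bytes than requested, so loop.
                while (sent < chunk.Length)
                    sent += client.Send(chunk, sent, chunk.Length - sent, SocketFlags.None);
                total += sent;
            }
        }
        return total;
    }
}
```

Each slice still results in a separate `Send` call, so this keeps the memory benefit but does not avoid the multiple-send behavior discussed in this thread.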

Or you have to fork your code. As unpleasant as I know that is, it might be a necessary workaround until something else happens (like someone identifies the bug, if there is one, and logs it in either/both netmf/lwip’s tracking…

@ Brett: I will change the code and test it. If it works it’s a good workaround but it should be possible to use the send-method multiple times anyway.

Sorry but it doesn’t work!

The server socket always crashed after 200-300 connections. Only once did the socket accept over 1,000 connections. It is very strange.

I also have tried different browsers but the result is the same.

Is there any other way to get more debug information, a memory dump, traces or is there any way to detect the broken socket state?

Use MFDeploy for debugging instead of VS… mine prints extra lwIP errors that are not visible in VS…

@ ChrisK - I played with your code a little more but with no success. Looking at Wireshark, it seems the G120 is getting overwhelmed by requests. I am using Chrome and holding F5 down. The web browser opens multiple threads when updating. In Wireshark some of those threads go up to 25 seconds with no response from the G120; when an ACK is finally sent, the browser resets the connection. Even when the G120 gets the first group of data (“HTTP/2.1 200 OK\r\n”) out, the browser responds with a Reset, causing the 2nd client.send() to fail.

The “#### SocketException ErrorCode = 10054” reflects that.
10054 = Connection reset by peer.
(Windows Sockets Error Codes (Winsock2.h) - Win32 apps | Microsoft Learn)
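A small sketch of how a reset between two sends could be detected rather than left to surface as an unhandled exception. `SafeSend`, `IsPeerReset` and `TrySend` are hypothetical helper names. The sketch only assumes that `SocketException.ErrorCode` carries the Winsock code, as the log line above suggests.

```csharp
using System.Net.Sockets;

public static class SafeSend
{
    // WSAECONNRESET: connection reset by peer.
    public const int ConnectionResetByPeer = 10054;

    public static bool IsPeerReset(int errorCode)
    {
        return errorCode == ConnectionResetByPeer;
    }

    // Returns true if all bytes were handed to the stack, false if the
    // peer reset the connection. Any other socket error is rethrown.
    public static bool TrySend(Socket client, byte[] data)
    {
        try
        {
            int sent = 0;
            while (sent < data.Length)
                sent += client.Send(data, sent, data.Length - sent, SocketFlags.None);
            return true;
        }
        catch (SocketException ex)
        {
            if (IsPeerReset(ex.ErrorCode)) return false;
            throw;
        }
    }
}
```

When `TrySend` returns false the remaining chunks of the response can simply be skipped and the client socket closed, since the browser has already abandoned that connection.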

I don’t know why 4.1 works and 4.3 doesn’t, except that 4.1 appears to have always blocked on socket.Send(), while 4.3 apparently does not block. That suggests there was a fairly big change in the stack between 4.1 and 4.3.

@ Jay Jay: Great idea! I have tested it with the attached MFDeploy tool and get the following error directly before the server socket hangs at the accept-method forever.

I get this error two times.

I don’t get this error with Visual Studio. Is there anyone who can investigate this error?

Another thing I found is that the server accepts 4,257 connections if I change Program._serverSocket.Listen(10); to Program._serverSocket.Listen(0);

@ ChrisK: thanks for the hint. For me it is not an acceptable solution to provide a webserver without caching sockets, because every typical website uses a favicon, CSS files, images … which are loaded afterwards, and these requests fail because browsers request them quickly and in parallel.
But I tested it anyway. And it doesn’t work as expected.
As you can see in the test code, there is an additional thread to test whether the listener is still working, knowing that it requires an additional socket.

First test:
not using the monitor thread, i.e. it’s the simplest (!) webserver I could imagine and I would expect it to run. Otherwise NETMF is not an option for me, but I cannot believe that we are the first ones building a webserver and that GHI will not find a solution. Does nobody have a simple webserver up and running (then please post the code)?
Result: After some (not very fast) refreshes it hangs, maybe because the browser tries to get the icon…

Second test: If the monitor is active (interval = 1000ms), it seems to be working, but after some requests (~20) it hangs, too.

Third test: increasing the interval to 3000ms, no changes, especially if there are additional requests from the real browser (using chrome).

@ GHI: I have invested so much time and have a project which absolutely requires a normal webserver. I tried some samples, tried Gadgeteer, bought new hardware twice (!) (Panda2 -> G120 -> Raptor) and I cannot use it. Sorry, but I’m at a point where I want to throw it all away and start with some other hardware, but I have already invested so much time and money (my wife will kill me :frowning: , and hopefully she is not doing the next post :wink: )

Please, please help to get such a basic functionality up and running.

using System;
using System.Collections;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Presentation;
using Microsoft.SPOT.Presentation.Controls;
using Microsoft.SPOT.Presentation.Media;
using Microsoft.SPOT.Presentation.Shapes;
using Microsoft.SPOT.Touch;

using Gadgeteer.Networking;
using GT = Gadgeteer;
using GTM = Gadgeteer.Modules;
using Gadgeteer.Modules.GHIElectronics;
using Microsoft.SPOT.Hardware;
using GHI.Pins;
using System.Net.Sockets;
using System.Net;

namespace HC
{
    public partial class Program
    {
        private DateTime StartTime;
        private int timeout = 5;
        private int count = 0;

        void ProgramStarted()
        {
            StartTime = DateTime.Now;

            Debug.Print("Program Started");
            ethernetENC28.NetworkInterface.EnableStaticIP("192.168.1.14", "255.255.255.0", "192.168.1.1");
            ethernetENC28.NetworkUp += ethernetENC28_NetworkUp;

            GT.Timer timer = new GT.Timer(1000);
            timer.Tick += MainThread;
            timer.Start();
        }

        void ethernetENC28_NetworkUp(GT.Modules.Module.NetworkModule sender, GT.Modules.Module.NetworkModule.NetworkState state)
        {
            Debug.Print("IP:" + ethernetENC28.NetworkSettings.IPAddress.ToString());

            StartTime = DateTime.Now;

            var workerThread = new Thread(this.Listen);
            workerThread.Start();

            //var monitorThread = new Thread(this.Monitor);
            //monitorThread.Start();
        }

        void MainThread(Gadgeteer.Timer timer)
        {
            //...
        }


        private void Listen()
        {
            const Int32 c_port = 80;
            Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            IPEndPoint localEndPoint = new IPEndPoint(IPAddress.Any, c_port);
            server.Bind(localEndPoint);
            server.Listen(0); // <---- changed to 0

            while (true)
            {
                Socket clientSocket = server.Accept();
                Debug.Print("Webserver Request (" + this.count++.ToString() + "): " + clientSocket.RemoteEndPoint);
                using (clientSocket)
                {
                    Byte[] buffer = new Byte[1024];
                    if (clientSocket.Poll(5000 * 1000, SelectMode.SelectRead))
                    {
                        if (clientSocket.Available != 0)
                        {
                            Int32 bytesRead = clientSocket.Receive(buffer, clientSocket.Available, SocketFlags.None);
                            TimeSpan ts = DateTime.Now - StartTime;
                            String s =
                                "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n<html><head><title>.NET Micro Framework Web Server</title></head>" +
                               "<body><bold><h1>REAPTOR:<br>running since: " + ts.ToString() + "<br>Count: " + this.count.ToString() +"</h1></bold></body></html>";
                            clientSocket.Send(System.Text.Encoding.UTF8.GetBytes(s));
                        }
                        clientSocket.Close();
                    }
                }
                Debug.GC(true);
            }
        }


        private void Monitor()
        {
            TimeSpan timeout = new TimeSpan(0, 0, this.timeout);
            while (true)
            {
                Thread.Sleep(3000);
                var pingThread = new Thread(this.Ping);
                pingThread.Start();
                DateTime start = DateTime.Now;
                while ((DateTime.Now - start) < timeout && pingThread.IsAlive)
                {
                    Thread.Sleep(1000);
                }
                if (pingThread.IsAlive)
                {
                    Debug.Print(">>>>> REBOOT <<<<<");
                    //Microsoft.SPOT.Hardware.PowerState.RebootDevice(true);
                }
                else
                {
                    Debug.Print("NO PING TIMEOUT - " + this.count.ToString());
                }
            }
        }


        private void Ping()
        {
            string hostName = "127.0.0.1";
            int hostPort = 80;
            IPAddress host = IPAddress.Parse(hostName);
            IPEndPoint hostep = new IPEndPoint(host, hostPort);
            using (Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                string GETrequest = "ping";
                socket.Connect(hostep);
                try
                {
                    socket.Send(System.Text.Encoding.UTF8.GetBytes(GETrequest));
                    // NOTE: Poll takes microseconds, so 3000 here is only 3 ms; use 3000 * 1000 for 3 s.
                    if (socket.Poll(3000, SelectMode.SelectRead))
                    {
                        int requiredBufferSize = socket.Available;
                        if (requiredBufferSize != 0)
                        {
                            byte[] Buffer = new byte[requiredBufferSize];
                            int requestLength = socket.Receive(Buffer);
                            if (requestLength > 0)
                            {
                                char[] c = System.Text.Encoding.UTF8.GetChars(Buffer);
                                string text = new string(c);
                                Debug.Print(text);
                            }
                        }
                    }
                }
                catch
                {
                    Thread.Sleep(timeout * 2000);
                }
                socket.Close();
                Debug.GC(true);
            }
        }
    }
}

It has been very quiet around this post; is anybody still there :slight_smile: ? I’m still interested in a solution.

Code update with a much better but still not satisfying result.

After comparing my code with NETMF sample socket server and ChrisK’s code I made some small changes:

  1. additional socket configuration (ChrisK)
  2. using a separate thread (as in the NETMF sample)
  3. changing the response content to use browser side auto refresh to eliminate monitor socket influences.

Detailed Tests and Result:

Test case 1: server.Listen(5);
After opening three browsers and calling the page, the listener hangs after about 60 calls.

Test case 2: server.Listen(0);
If Visual Studio is debugging the program and four browsers are running, the listener hangs quite fast (after ~50 calls).
If I close Visual Studio and press reset to restart the program without debugging, I can start 25 browsers and the webserver still responds, not fast, but the counter increases and the listener doesn’t hang. After a while the first timeouts occur for some browsers, but the webserver is still running. After 7 minutes, only 4 browsers remain without a timeout. After 9 minutes there is only one browser left.
When I do the same test with only 10 active browsers, after 1 minute one browser has a timeout, after 2 minutes only 8 browsers are without a timeout, after 3 minutes 4 browsers are online, after 4 minutes 3 browsers; afterwards there are no changes and it still responds.
Last, a long-running test: 5 browsers overnight. The next day more than 200,000 responses had been counted and the webserver was still running, wow. Then I started an additional browser and made some additional requests. The listener hung again :frowning: .

Conclusion

  1. You should not do online debugging while testing. I expect that ChrisK isn’t doing so. But that alone is not enough for a good webserver, because it has to process many parallel requests, especially when a new webpage (including img, css, js and so on) is opened.
  2. Socket.Listen has a quite bad implementation, because many parallel requests should not be able to kill it. If one of my colleagues wrote such code, I would kill him :slight_smile: . No, but I expect that it could be handled, especially because there is a debug output, as ChrisK wrote.

Sorry for the honest feedback; it is meant constructively, not destructively. GHI provides cool hardware, and hopefully support will provide a good solution so that the NETMF sockets can be used as expected.

Thanks for support!


        private void Listen()
        {
            Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            server.Bind(new IPEndPoint(IPAddress.Any, 80));
            server.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
            server.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            server.Listen(0);


            while (true)
            {
                Socket clientSocket = server.Accept();
                Debug.Print("Webserver Request (" + Program.count++.ToString() + "): " + clientSocket.RemoteEndPoint);
                new Worker(clientSocket, true);
                
            }
        }


        internal sealed class Worker
        {
            private Socket clientSocket;

            public Worker(Socket clientSocket, Boolean asynchronously)
            {
                this.clientSocket = clientSocket;
                if (asynchronously)
                    new Thread(ProcessRequest).Start();
                else ProcessRequest();
            }

            public void ProcessRequest()
            {
                const Int32 c_microsecondsPerSecond = 1000000;

                using (clientSocket)
                {
                    Byte[] buffer = new Byte[1024];
                    if (clientSocket.Poll(5 * c_microsecondsPerSecond, SelectMode.SelectRead))
                    {
                        if (clientSocket.Available != 0)
                        {
                            Int32 bytesRead = clientSocket.Receive(buffer, clientSocket.Available, SocketFlags.None);
                            TimeSpan ts = DateTime.Now - StartTime;
                            String s =
                                "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n<html><head><title>.NET Micro Framework Web Server</title></head>" +
                                "<script type=\"text/JavaScript\">function timedRefresh() {window.location.reload();}</script>" +
                                "<body onload=\"JavaScript:window.setTimeout(timedRefresh, 1000);\"><bold><h1>RAPTOR:<br>running since: " + ts.ToString() + "<br>Count: " + Program.count.ToString() + "</h1></bold></body></html>";
                            clientSocket.Send(System.Text.Encoding.UTF8.GetBytes(s));
                        }
                        clientSocket.Close();
                    }
                }
                Debug.GC(true);
            }
        }



It seems to “work”. My server handled 20,000 connections over the last few days. The log was full of thousands of “tcp_pcb_purge” errors. I don’t know whether this is a problem or not!?

It crashed only once, while testing it with two different devices at the same time. The refresh interval was one second, using 10 browser tabs on each device.

But a new problem is that not every connection is accepted. Without having a backlog I need to implement a retry behavior for the web apps. This is annoying.

@ bin-blank: I will also try to use a separate client thread and check whether more connections are being accepted.

I can confirm the behavior of your “Test case 2” (before disabling the backlog). It works for a whole day (Notebook polls every 15 seconds with one tab) but crashes directly after 3-5 connections from another device (WebApp using VPN) while the notebook is still polling.

Is there anyone here from GHI who can help us?

@ ChrisK and bin-blank - We are working to improve network reliability but we don’t have anything just yet.

@ John: Is there anything we can do to support you?
Do you need more information about our network hardware and configuration? The hardware revisions of ENC28 etc. we use? Or anything else?

@ ChrisK - Are the tcp_pcb_purge errors the only ones you saw in the log?

@ John: I don’t know.
The log was full of these messages and I only looked at it periodically. I had to clear it every 30 minutes because MFDeploy was running slower and slower.

I can test it again later and search for other messages.

Status Update:
After two days it has happened twice that the listener hung again after some hundreds of calls. That is frustrating. It does not even run for one whole day. Thus, setting serverSocket.Listen(0); does not solve the issue…

I was working on other projects for the last couple of weeks. Now I am working on the affected solution again and found that the problem still occurs. It is very rare (every ~30,000 connections, sometimes after only 12,000).

I updated to the latest version of the SDK (2014 R3) but this did not solve the problem.

I started to look at the NETMF source code myself. The socket implementation polls with an infinite timeout within the accept method.

https://netmf.codeplex.com/SourceControl/latest#client_v4_3/Framework/Core/System/System/Net/Sockets/Socket.cs

It is possible to create an overload for the Accept method which supports adding a timeout value?



This would make it possible to restart the network shield or do a full reboot of the G120.
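Until such an overload exists, an accept with a timeout could perhaps be approximated in user code by polling the listening socket with a finite timeout before calling Accept. This is only a sketch: `ListenerHelper` and `AcceptWithTimeout` are hypothetical names, and whether a finite Poll returns when the native stack itself is stuck has not been verified here.

```csharp
using System.Net.Sockets;

public static class ListenerHelper
{
    // Waits up to timeoutMs for a pending connection on a listening socket.
    // Returns the accepted client socket, or null if the timeout elapsed.
    public static Socket AcceptWithTimeout(Socket server, int timeoutMs)
    {
        // Poll expects microseconds; keep timeoutMs below about 2,147,000
        // to avoid overflowing the int multiplication.
        if (!server.Poll(timeoutMs * 1000, SelectMode.SelectRead))
            return null;

        // A connection is pending, so Accept should return promptly.
        return server.Accept();
    }
}
```

The accept loop could then count consecutive null results and, after some threshold, close and recreate the listener or reboot the device.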

Question or statement ?

People have said to use a separate thread on the incoming connection and a timer to kill the thread at your desired timeout.

@ ChrisK - I used the approach which @ Brett described.
It works for me.
Create a thread, which establishes the connection and sets a wait event when finished.
In the main thread wait for the event with a timeout.
If the timeout occurred, abort the thread.
Don’t forget to add try/catch blocks to catch the ThreadAbortException.

I don’t have the code at hand, so I cannot post it now.
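The steps above might look roughly like the following. This is a reconstruction from the description, not the poster's actual code; `TimedAccept` and its members are hypothetical names. It relies on `Thread.Abort`, which NETMF and the classic .NET Framework provide but newer .NET runtimes do not support.

```csharp
using System.Net.Sockets;
using System.Threading;

public sealed class TimedAccept
{
    private readonly Socket server;
    private readonly ManualResetEvent done = new ManualResetEvent(false);
    private Socket accepted;

    public TimedAccept(Socket server)
    {
        this.server = server;
    }

    // Runs on the worker thread: blocks in Accept, then signals the event.
    private void AcceptWorker()
    {
        try
        {
            accepted = server.Accept();
            done.Set();
        }
        catch (ThreadAbortException)
        {
            // Raised by Abort() after a timeout; nothing to clean up here.
        }
        catch (SocketException)
        {
            // Listener failed: wake the caller early with a null result.
            done.Set();
        }
    }

    // Returns the accepted socket, or null on timeout or listener error.
    public Socket Accept(int timeoutMs)
    {
        accepted = null;
        done.Reset();
        Thread worker = new Thread(AcceptWorker);
        worker.Start();

        if (done.WaitOne(timeoutMs, false))
            return accepted;

        // Timed out: abort the worker so it does not stay blocked forever.
        worker.Abort();
        return null;
    }
}
```

One caveat of this sketch: a connection that arrives between the timeout and the abort can be accepted and then lost, so a production version would need to close any socket left in `accepted` after a timeout.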