I upgraded my Raptor board from MF 4.2 to MF 4.3 and now have a new problem: my application runs fine for about 12 hours, then freezes. From reading other posts here and at http://netmf.codeplex.com/workitem/list/basic, I think it is caused by the NETMF network software locking up. I keep a number of TCP socket connections open indefinitely, continually receiving data and sending it to a PC. I also have a socket server that accepts connections on five TCP port numbers. After the lock-up, nothing responds: not requests to the web server running on the Raptor board, not TCP port connection requests, not T43 touch screen presses, not KP16 keypad presses. Even the watchdog timer did not reboot the board. After I disconnect FEZConfig, the Raptor does not respond to a ping from FEZConfig and FEZConfig cannot reconnect, but FEZConfig can still reboot the board.
In the release notes for NETMF and Gadgeteer Package 2014 R1 - G400 v4.2.11.2, TinyBooter v4.2.11.2, there are these known issues:
Connect() and Accept() can sometimes hang forever. Use a wrapper class to check for this and act accordingly.
In v4.3, it looks to me as though this issue is not resolved. When my program freezes, the Visual Studio threads report shows that Socket.Accept() is blocked. Is this problem supposed to be fixed in MF 4.3? If not, can you tell me more about what you mean by using a wrapper class to check for this and act accordingly? Can you give me an example of such a wrapper class?
PhilM concluded that ExecutionConstraint.Install() never throws the ConstraintException, and that Socket.Accept() blocks and never accepts a connection. The problem was finally solved for PhilM when DAT of GHI sent him a firmware update for his Cobra board.
Apparently this firmware fix never got into firmware 4.3 because I am now having the same problem as PhilM using 4.3 and a Raptor board. Does anyone know what happened to this fix mentioned on page 17 of the thread?
Is there a NETMF function to get the number of sockets and open file handles inside the framework? Or are you suggesting that I introduce a counter variable to count them? I know I have five server sockets open on which I call Accept(). Accept returns a client socket on which I send and receive data to the client. I am not opening any file handle in my code. I will post my socket server code if you want to see it.
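The counter-variable idea mentioned above could be sketched roughly as follows. This is only an illustration: SocketCounter and its members are hypothetical names, not part of NETMF, and the count is only as accurate as the calls placed around every socket creation, Accept() and Close() in the application code.

```csharp
// Hypothetical helper (not a NETMF API): a manually maintained count of open sockets.
public static class SocketCounter
{
    private static readonly object sync = new object();
    private static int openCount = 0;

    // Call right after a socket is created or returned by Accept().
    public static void Opened()
    {
        lock (sync) { openCount++; }
    }

    // Call right after a socket is closed; the count never goes below zero.
    public static void Closed()
    {
        lock (sync) { if (openCount > 0) openCount--; }
    }

    // Current number of sockets believed to be open.
    public static int OpenCount
    {
        get { lock (sync) { return openCount; } }
    }
}
```

Printing SocketCounter.OpenCount periodically (for example with Debug.Print) would show whether the number of open sockets keeps growing before a freeze.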
When it freezes, the Visual Studio threads report shows in the call stack that five threads are in the Accept() function, and sometimes on thread is inside the client processing thread. Only the name of the client thread function is given in the call stack, not which MF function was called last. It could be Socket.Poll() or Socket.Receive()
Sorry for the poor English grammar; I meant to say:
When it freezes, the Visual Studio threads report shows in the call stack that five server threads are in the Accept() function, and sometimes the client processing thread is running. Only the name of the client thread function is given in the call stack, not which MF function was called last. It could be Socket.Poll() or Socket.Receive().
Does anyone know whether the firmware fix mentioned in https://www.ghielectronics.com/community/forum/topic?id=12342&page=17 made it into the MF 4.3 firmware for Raptor? Or is GHI working on putting in the fix? The 4.3 release notes say to use a wrapper class, but all of the wrappers around Socket.Accept() that were tried in this discussion failed to solve the problem.
@ dspacek - That fix should have made it into 4.3, yes. As for the known issue referencing connect and accept, it means that there is no timeout for connect and accept. So if you never get a remote endpoint connecting to you, accept will never return so you need to wrap that call in a thread that you can abort. It’s more of a design limitation of NETMF sockets than a bug. Can you try to have only one thread calling accept instead of five?
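One way to give Accept() a timeout without an abortable thread is to poll the listening socket first. Note that this is a different technique from the thread wrapper described in the reply above. The sketch below assumes a socket that has already been bound and put into the listening state; AcceptHelper and AcceptWithTimeout are hypothetical names. Whether this avoids the lock-up discussed in this thread is not established here.

```csharp
using System.Net.Sockets;

// Hypothetical helper (not a NETMF API): Accept() with a timeout, based on Socket.Poll().
public static class AcceptHelper
{
    // Returns the accepted client socket, or null if no connection
    // was pending within timeoutMs milliseconds.
    public static Socket AcceptWithTimeout(Socket listener, int timeoutMs)
    {
        // Poll() takes microseconds. On a listening socket, SelectRead
        // becomes true when a connection is waiting to be accepted.
        // Keep timeoutMs small enough that timeoutMs * 1000 fits in an int.
        if (listener.Poll(timeoutMs * 1000, SelectMode.SelectRead))
        {
            return listener.Accept();
        }
        return null;
    }
}
```

A server loop would call AcceptHelper.AcceptWithTimeout(listener, 15000) repeatedly and treat a null result as "no client yet".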
I think this means that the fix for the Socket.Accept() hanging problem is not in 4.3 yet and GHI is working on it.
I did try putting a wrapper thread around Socket.Accept(), and it does not work: it throws a series of exceptions. Every time I call Accept(), an exception is thrown immediately.
Here is my wrapper function. It aborts Accept() every 15 seconds and calls it again. When a client computer makes a connection, TimedListen should return the client socket. But when no client is making a connection, it aborts after 15 seconds and never works again.
I tried another version of the wrapper function in which I don’t try to restart the server socket. It throws a ThreadAbortException every 15 seconds. But when Accept() locks up, the ThreadAbortException is thrown, and TimedListen() is called again, the web browser can no longer make a connection.
2014/09/05 19:51:55.785 Web server started at: 192.168.8.239
2014/09/05 19:51:55.787 Web Server started
2014/09/05 19:51:55.789 Waiting for valid DeviceHive IP address set in flash memory
The thread ‘’ (0x3) has exited with code 0 (0x0).
#### Exception System.Threading.ThreadAbortException - 0x00000000 (20) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::poll [IP: 0000] ####
#### System.Net.Sockets.Socket::Poll [IP: 0011] ####
#### System.Net.Sockets.Socket::Accept [IP: 0017] ####
#### XGadgeteer.Networking.WebServerManager+Server+<>c__DisplayClass1::b__0 [IP: 000a] ####
A first chance exception of type ‘System.Threading.ThreadAbortException’ occurred in Microsoft.SPOT.Net.dll
TimedListen: System.Threading.ThreadAbortException
#### Exception System.Threading.ThreadAbortException - 0x00000000 (20) ####
#### Message:
#### XGadgeteer.Networking.WebServerManager+Server+<>c__DisplayClass1::b__0 [IP: 001f] ####
A first chance exception of type ‘System.Threading.ThreadAbortException’ occurred in XGadgeteer.WebServer.dll
I tried another work-around idea posted in this thread: https://www.ghielectronics.com/community/forum/topic?id=12342&page=4
This does not work for me. Accept() never returns until a web browser connects, and ExecutionConstraint.Install() does not cause an exception either. A few HTML pages later, Accept() stops returning and no more connections are accepted. Accept() is frozen, and ExecutionConstraint.Install() does not fix the problem.
I ran PhilM’s code (with some modifications, of course, because his code targets 4.2).
No problem was found, but we only tested for a short time.
Below is a simple example that provides three different ways to call socket.Accept(). Can you please give it a try and let us know which way works better for you?
Program.cs
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Net;
using System.Net.Sockets;
using System.Security;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using Microsoft.SPOT.Net;
using Microsoft.SPOT.Net.NetworkInformation;
using GHI.Networking;
namespace Networking
{
public class Program
{
static EthernetENC28J60 netif;
const int RAPTOR_PIN6_SOCKET1 = (1 * 32) + 5; // PB5
const int RAPTOR_PIN3_SOCKET1 = (1 * 32) + 0; // PB0
const int RAPTOR_PIN4_SOCKET1 = (0 * 32) + 7; // PA7
const int BUFFER_SIZE = 1 * 1024;
static OutputPort led = new OutputPort((Cpu.Pin)(3 * 32 + 3), true); // led pd3
static InputPort ldr1 = new InputPort((Cpu.Pin)(4), false, Port.ResistorMode.PullUp);
static bool isNetworkReady = false;
static byte[] buffer1 = new byte[2*1024];
static byte[] buffer2;
public static void Main()
{
while (ldr1.Read() == true)
{
led.Write(!led.Read());
Thread.Sleep(500);
}
//netif = new EthernetENC28J60(SPI.SPI_module.SPI2, (Cpu.Pin)86, (Cpu.Pin)95, (Cpu.Pin)5); // socket 6 g400HDR
netif = new EthernetENC28J60(SPI.SPI_module.SPI2, (Cpu.Pin)RAPTOR_PIN6_SOCKET1, (Cpu.Pin)RAPTOR_PIN3_SOCKET1, (Cpu.Pin)RAPTOR_PIN4_SOCKET1); // socket 1 Fez Raptor
NetworkChange.NetworkAvailabilityChanged += new NetworkAvailabilityChangedEventHandler(NetworkChange_NetworkAvailabilityChanged);
NetworkChange.NetworkAddressChanged += new NetworkAddressChangedEventHandler(NetworkChange_NetworkAddressChanged);
if (netif.IsDhcpEnabled == false)
{
netif.EnableDhcp();
netif.EnableDynamicDns();
}
netif.Open();
Debug.Print("ENC28 Opened.");
int cnt = 0;
while (isNetworkReady == false)
{
led.Write(!led.Read());
Thread.Sleep(100);
Debug.Print("Wait for network ready. " + (cnt++));
}
Debug.Print("Start local server");
new Thread(LedThread).Start();
StartSever();
Thread.Sleep(Timeout.Infinite);
}
static void LedThread()
{
while (true)
{
led.Write(!led.Read());
Thread.Sleep(50);
}
}
static void NetworkChange_NetworkAvailabilityChanged(object sender, NetworkAvailabilityEventArgs e)
{
Debug.Print("Network has changed! ");
}
static void NetworkChange_NetworkAddressChanged(object sender, EventArgs e)
{
Debug.Print("New address for the Network Interface ");
Debug.Print("Is DhCp enabled: " + netif.NetworkInterface.IsDhcpEnabled);
Debug.Print("Is DynamicDnsEnabled enabled: " + netif.NetworkInterface.IsDynamicDnsEnabled);
Debug.Print("NetworkInterfaceType " + netif.NetworkInterface.NetworkInterfaceType);
Debug.Print("Network settings:");
Debug.Print("IP Address: " + netif.NetworkInterface.IPAddress);
Debug.Print("Subnet Mask: " + netif.NetworkInterface.SubnetMask);
Debug.Print("Default Gateway: " + netif.NetworkInterface.GatewayAddress);
for (int i = 0; i < netif.NetworkInterface.PhysicalAddress.Length; i++)
{
Debug.Print(" " + netif.NetworkInterface.PhysicalAddress[i].ToString());
}
Debug.Print("Number of DNS servers:" + netif.NetworkInterface.DnsAddresses.Length);
for (int i = 0; i < netif.NetworkInterface.DnsAddresses.Length; i++)
Debug.Print("DNS Server " + i.ToString() + ":" + netif.NetworkInterface.DnsAddresses[i]);
Debug.Print("------------------------------------------------------");
if (netif.IPAddress != "0.0.0.0")
{
isNetworkReady = true;
}
}
static void StartSever()
{
IPEndPoint localEndPoint = new IPEndPoint(IPAddress.Any, 80);
MySocket mysocket = new MySocket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
mysocket.TimeOutConnection = 15000;
mysocket.Bind(localEndPoint);
mysocket.Listen(Int32.MaxValue);
while (true)
{
//if (mysocket.Accept(localEndPoint) == true) // by thread
//if (mysocket.Accept(localEndPoint, true) == true) // by ExecutionConstraint
if (mysocket.Accept(localEndPoint, mysocket.TimeOutConnection) == true) // by poll
{
ProcessClientRequest(mysocket.GetSocketAccept);
}
else
{
Debug.Print("No request, start new thread!");
}
}
}
static void ProcessClientRequest(Socket m_clientSocket)
{
const Int32 c_microsecondsPerSecond = 1000000;
// 'using' ensures that the client's socket gets closed.
using (m_clientSocket)
{
// Wait for the client request to start to arrive.
if (m_clientSocket.Poll(5 * c_microsecondsPerSecond,
SelectMode.SelectRead))
{
// If 0 bytes in buffer,
// then the connection has been closed,
// reset, or terminated.
if (m_clientSocket.Available == 0)
return;
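// Read the first chunk of the request
// (we don't actually do anything with it).
// Never request more bytes than buffer1 can hold.
int toRead = m_clientSocket.Available;
if (toRead > buffer1.Length)
{
toRead = buffer1.Length;
}
Int32 bytesRead = m_clientSocket.Receive(buffer1, toRead, SocketFlags.None);
String s = "Hello world! Now is " + DateTime.Now;
s = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: " + s.Length + "\r\n\r\n" + s;
buffer2 = System.Text.Encoding.UTF8.GetBytes(s);
int offset = 0;
int ret = 0;
int len = buffer2.Length;
// Send may transmit only part of the buffer, so loop until everything is sent.
while (len > 0)
{
ret = m_clientSocket.Send(buffer2, offset, len, SocketFlags.None);
if (ret <= 0)
{
break; // nothing was sent; give up instead of looping forever
}
len -= ret;
offset += ret;
}
// No explicit Close() here: the enclosing 'using' closes the socket.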
}
}
}
}
}
I tried running the code suggested here by GHI user Dat, item #11, and shown below: https://www.ghielectronics.com/community/forum/inbox/message?id=16586&page=2
I call this function: Accept(EndPoint ep). This function tries to work around the problem that Socket.Accept() freezes, by putting Socket.Accept() in a thread and aborting the thread after a time-out. A forum member claims that this solves the problem, but it does not work for me. It appears to solve the problem if you run it under limited traffic for a short time. I used the network stress tester client found on the Codeshare https://www.ghielectronics.com/community/codeshare/entry/780. To simulate my needed usage, I started eight instances of the stress tester client, and made connections to server ports 8080, 8081, … 8087. It ran for about two hours, then connections could not be made.
I see that the R4 SDK did not have a fix for this problem. I am concerned that the problem is not even acknowledged by GHI, because everyone tells me to use the “wrapper” work-around, and so the problem does not exist. But the wrapper work-around does not work in real-world usage.
Could someone at GHI please acknowledge that there is a network Socket lock-up problem, and tell me if GHI or Microsoft is trying to fix it?
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Net;
using System.Net.Sockets;
using System.Security;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using Microsoft.SPOT.Net;
using Microsoft.SPOT.Net.NetworkInformation;
using GHI.Networking;
namespace Networking
{
public class MySocket
{
Socket socket;
public Socket clientSocket;
private Thread threadAccept;
private EndPoint remoteEP;
private Boolean isSocketAccepted = false;
private int connectTimeOut = 15000; // 15 second
public MySocket(AddressFamily addressFamily, SocketType socketType, ProtocolType protocolType)
{
try
{
socket = new Socket(addressFamily, socketType, protocolType);
}
catch
{
Debug.Print("No more socket available!");
}
}
public void Dispose()
{
try
{
if (clientSocket != null)
{
clientSocket.Close();
clientSocket = null;
}
if (socket != null)
{
socket.Close();
socket = null;
}
}
catch
{
Debug.Print("Error while closing these sockets. Trying to set null");
clientSocket = null;
socket = null;
}
}
~MySocket()
{
this.Dispose();
}
public int TimeOutConnection
{
get { return connectTimeOut; }
set { connectTimeOut = value; }
}
private void CancelSocketAccept()
{
if (threadAccept != null)
{
try
{
if (threadAccept.IsAlive || threadAccept.ThreadState == ThreadState.Running)
{
try
{
threadAccept.Suspend();
threadAccept.Abort();
}
catch
{
}
}
threadAccept = null;
}
catch
{
// Not important msg
}
}
}
public Socket GetSocketAccept
{
get
{
return clientSocket;
}
}
private void CloseClientSocket()
{
if (clientSocket != null)
{
try
{
clientSocket.Close();
clientSocket = null;
}
catch
{
Debug.Print("Error while closing client socket. Trying to set null");
clientSocket = null;
}
}
}
public Boolean Accept(EndPoint ep, int polltimeout, bool keepsocketclientopen)
{
if (clientSocket != null)
{
try
{
clientSocket.Poll(polltimeout * 1000, SelectMode.SelectRead);
if (clientSocket.Available == 0)
{
return Accept(ep, polltimeout);
}
else
{
return true;
}
}
catch
{
return Accept(ep, polltimeout);
}
}
else
{
return Accept(ep, polltimeout);
}
}
public Boolean Accept(EndPoint ep, int polltimeout)
{
remoteEP = ep;
isSocketAccepted = false;
CloseClientSocket();
try
{
if (socket.Poll(polltimeout * 1000, SelectMode.SelectRead) == true)
{
Accept();
}
}
catch
{
}
return isSocketAccepted;
}
public Boolean Accept(EndPoint ep, Boolean setupexecutionconstraint)
{
remoteEP = ep;
isSocketAccepted = false;
CloseClientSocket();
try
{
ExecutionConstraint.Install(connectTimeOut, 0);
Accept();
ExecutionConstraint.Install(-1, 0);
}
catch
{
ExecutionConstraint.Install(-1, 0);
}
return isSocketAccepted;
}
public Boolean Accept(EndPoint ep)
{
remoteEP = ep;
isSocketAccepted = false;
CloseClientSocket();
threadAccept = new Thread(Accept);
threadAccept.Start();
int timeout = connectTimeOut;
while (timeout > 0)
{
Thread.Sleep(10);
timeout -= 10;
if (isSocketAccepted == true)
break;
}
if (!isSocketAccepted) // expired without a connection, even if the timeout is not a multiple of 10
{
CancelSocketAccept();
}
return isSocketAccepted;
}
public void Bind(EndPoint ep)
{
remoteEP = ep;
socket.Bind(remoteEP);
}
public void Listen(int listen)
{
socket.Listen(listen);
}
private void Accept()
{
try
{
clientSocket = socket.Accept();
}
catch
{
clientSocket = null;
}
isSocketAccepted = (clientSocket != null);
}
// Connect
}
}
@ dspacek - I am not sure if this is related, but there is a known internal issue causing the network to slow down internally, probably an overflow of events. We are working on this issue full time as we speak.
@ dspacek - which is why I said I am not sure it is related. Networking things are all intertwined internally, so everything affects everything else! There should be a testing firmware released in a couple of weeks, maybe sooner.
I posted complete instructions and sample code for you to run on your Raptor board with the network stress tester, used the same way I did. Isn't that preferable? You can make the problem happen in your lab. Let it run with eight instances of the stress tester, each connected to one of the TCP ports 8080 to 8087. It will freeze in a few hours. Use a Layer 3 Ethernet switch. Please see my post item #16.