EMX freezes after Garbage Collection

Hi

I have changed my code slightly so that it loops continuously and sleeps on each iteration. I also no longer call the GC explicitly; it gets called automatically when the socket exception occurs (after each block of 41 iterations).

Previously (without the sleep) the device would freeze every time on the socket exception (after 41 iterations). After adding the sleep it no longer freezes at that point. However, when I let it keep running, I found that it still eventually freezes after approximately 1000 iterations. It does not freeze at the same iteration as before, but it does eventually freeze. See my code below.


int i = 1;

            while (true)
            {
                Thread.Sleep(10);

                try
                {
                    Debug.Print("********** Socket Test " + i++ + " **********");

                    ExecutionConstraint.Install(500, 0);

                    ConnectSocket("192.168.0.18", 1234, false);
                    Thread.Sleep(10);

                    ExecutionConstraint.Install(-1, 0);
                }
                catch (ConstraintException)
                {
                    Debug.Print("********** Constraint Exception on Test " + i + " **********");
                }
               
            }
        }

What is stranger is that when I changed my debug message to include the date and time, it ALWAYS FREEZES on the socket exception (after 41 iterations), as it did previously. See the code below and run it to verify that the sleep does not fix the problem in this situation.


 catch (ConstraintException)
                {
                    Debug.Print("********** Constraint Exception on Test " + i + " **********" + DateTime.Now.ToString("h:mm:ss.ff t"));
                }

When this occurs is quite random, which shows that the root cause of the problem has not yet been established.

Please note that we are preparing to deploy thousands of these devices on a very critical project and this issue is holding us up.

I’d be thinking about getting on the phone to GHI and paying for some assistance here, rather than hoping someone here can find a nugget…

@ Evan Khizkial

If you get any answers, please post updates here. We have over a hundred EMX devices out at a customer site and it would be great to sort out the freezing issue. Currently the watchdog kicks in and resets the devices, but we would prefer that to be the exception rather than the normal process.

Hi,

I have been testing the same code on 4.1 and found that the problem does not occur.

To recap, the code attempts a socket connection to an IP address that does not exist on the network. There is a constraint which aborts the connection attempt after 500 ms. The code repeats the connection attempt, and hits the constraint exception, in a continuous loop.

Here is a summary of the differences.

  1. In version 4.2 a socket exception occurs on every 41st attempt, i.e. attempts 41, 82, 123, etc. If there is no sleep in the code, the socket exception causes the system to freeze.

  2. In version 4.1 a socket exception occurs on every 127th attempt, but the code never freezes even if there is no call to Thread.Sleep().

Did your

private static Socket ConnectSocket(String server, Int32 port, bool lin

still have the

catch (Exception ex)
            {
                Debug.Print	("Exception !" +ex.Message);

                Debug.GC	( true );

                return (null);

            }


code inside?
What happens when the ConstraintException is thrown just while Debug.GC is running? Maybe something is happening to the GC.

This question is more for the GHI guys now: what are the next steps? There obviously is an issue with the GC in 4.2.9, with other customers implementing a watchdog to reset the device when this happens.

Obviously, placing a sleep “somewhere” in our code is not a long-term or even short-term fix. We are trying to get a commercial product to market using GHI and .NET MF, with a large investment in both engineering and EMX product orders.

So I guess my next question is: who do we need to contact, and what do we need to do to get this resolved?

Sorry for trying to help and for being interested in finding the problem.

This is already being taken care of. GHI keeps an eye on this very forum.

Hi Gus,

So when do you think you will be able to give us an update on the GC issues?

@ VB-Daniel sorry, I think you misread my post. I wasn’t referring to any previous posts, just my own. I appreciate the community’s input and help, especially on an issue like this.

Updates always come regularly from GHI but we do not set dates. We would rather get things done right than rush to meet preset dates.

Hi CureNET, welcome to the forum (if you are actually a forum user; your post sounds somewhat spam-ish, and I didn’t really understand the relevance of the trips to the US).

If you have an issue with EMX, and it sounds like you’re creating custom products, then I suggest you could reach out to GHI directly. Look up, the “Company” menu bar has a “Contact Us” link.

My personal view is that Gus is providing the level of detail and commitment that he can at the point in time he made his comment. As with all software processes, trying to meet a firm deadline or release date jeopardises software quality, and he is taking the position that no firm date will be announced and that they will release the latest SDK when it meets their quality criteria.

Also, what’s important in this particular issue is that it was only a week ago when the first sample app that could reproduce this was made available, and GHI have been looking at this since. I am not faced with the same pressures you are, but I don’t think it’s unreasonable to take that long to properly diagnose something like this issue that involves networking and the core of the NetMF.


Hi Gus/GHI

I just downloaded and tested the latest version of the EMX firmware (4.2.10) with the same sample application and I still got the same result. The release notes say it has a fix for the GC crash, but was that fix meant for this error?

And how about if you remove Debug.GC(true) when you catch its exception?

I don’t think you really need to call GC() in there.

Hi Dat,

The issue is that this program replicates the underlying problem in our commercial program: when a GC occurs, the EMX module crashes/freezes. This has only happened since 4.2.9, but because of other issues in 4.1, and because you can't upgrade to 4.2 over the wire, we cannot roll back. Basically we are now stuck with an unreliable and unstable GHI product.

We have to meet our customer's requirements and install the hardware, but we are going to be unable to do so reliably.

To put it bluntly, I need an answer within the next 2 days as to when/if this issue can be resolved, and this will determine our next course of action.

@ TheScruba - this issue involves Microsoft so we can’t give you a definite date just yet. The release notes are updated with this known issue http://www.ghielectronics.com/docs/40/netmf-4.2-developer

We have reported this to Microsoft with a possible fix: https://netmf.codeplex.com/workitem/2002

Feel free to vote and reply to the above post. We will continue to work on this on our own as well. Note that adding a sleep seems to work as a workaround.

@ TheScruba - you have my vote :wink:

Edit:
I just ran one last test here and found that the maximum number of sockets the EMX can have open is 32.
Normally a local variable is destroyed once its function returns.
But following the link https://netmf.codeplex.com/workitem/2002,
we can see that ConnectSocket gets interrupted (you cannot even debug inside the try/catch), which also means the socket objects are still there.
While we are waiting for Microsoft, here are 2 solutions that I hope will work for you:

  • Do not create more than 32 socket objects, for example by using a global variable. I highly recommend this solution.
  • Use the code below. It is the same as the code I posted yesterday, except that I added Debug.GC(true) at the beginning of the function. Because the leftover socket objects are not destroyed automatically, calling the GC at the beginning of the function destroys them. I am testing this one and it has passed more than 2000 iterations so far (it is still running).

private static Socket ConnectSocket(String server, Int32 port, bool linger)
{
	Debug.GC(true);
	Thread.Sleep(10); // needed after Debug.GC(true) as a workaround
	Socket socket = null;
	try
	{
		// Get server's IP address.
		DateTime dtTest = DateTime.Now;
		IPHostEntry hostEntry = Dns.GetHostEntry(server);
		Debug.Print("Resolved Host Address " + server + " " + DateTime.Now.Subtract(dtTest));
		// Create socket and connect to the server's IP address and port
		socket = new Socket(AddressFamily.InterNetwork,
								   SocketType.Stream, ProtocolType.Tcp); 
		dtTest = DateTime.Now;
		socket.Connect(new IPEndPoint(hostEntry.AddressList[0], port));
		Debug.Print("Socket Connected " + DateTime.Now.Subtract(dtTest));
		if ( linger )
		{
			socket.SetSocketOption(SocketOptionLevel.Socket
				   , SocketOptionName.DontLinger, true);
 
			socket.SetSocketOption(SocketOptionLevel.Tcp
				, SocketOptionName.NoDelay, true); 			
		}
		return socket;
	}
	catch (Exception ex)
	{
		Debug.Print	("Exception !" +ex.Message);
		socket = null;
		Debug.GC	( true );
		return		(null);
	}
}
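
For the first suggestion (keeping the socket in a global variable so that no more than 32 socket objects pile up), a minimal sketch could look like the following. The class name SharedSocket and its Connect and Release methods are made up for illustration and are not part of NETMF or the GHI SDK. It assumes that calling Close() on the previous socket releases it, which has not been confirmed against the issue in the CodePlex work item, so treat it as an idea to test rather than a verified fix.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using Microsoft.SPOT;

// Hypothetical helper (not a NETMF/GHI API): keeps at most one socket alive.
public static class SharedSocket
{
    // The single "global" reference to the most recently created socket.
    private static Socket current;

    public static Socket Connect(string server, int port)
    {
        // Close whatever was left over from the previous attempt,
        // including an attempt that was aborted part-way through.
        Release();

        try
        {
            IPHostEntry hostEntry = Dns.GetHostEntry(server);

            // Store the socket in the static field BEFORE connecting, so the
            // reference survives if Connect() is interrupted by an exception
            // that this method does not get to handle.
            current = new Socket(AddressFamily.InterNetwork,
                                 SocketType.Stream, ProtocolType.Tcp);
            current.Connect(new IPEndPoint(hostEntry.AddressList[0], port));
            return current;
        }
        catch (Exception ex)
        {
            Debug.Print("Exception! " + ex.Message);
            Release();
            return null;
        }
    }

    public static void Release()
    {
        if (current == null) return;

        try
        {
            current.Close();
        }
        catch (Exception)
        {
            // Ignore errors while closing; the reference is dropped anyway.
        }
        current = null;
    }
}
```

In the test loop, ConnectSocket("192.168.0.18", 1234, false) would then become SharedSocket.Connect("192.168.0.18", 1234). Because the old socket is closed at the start of every call, an attempt cut short by the ConstraintException should get cleaned up on the next iteration instead of leaving another socket object behind.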

These are the best we can do for you now.

Is there any update on this issue? We are also experiencing this problem with a live installation. The patience of our customer is wearing thin, to say the least, and one of the main problems is that we cannot provide any sort of estimate for a resolution date. This is having a significant negative impact on our reputation, and a swift resolution is crucial to us.

We are also starting to see this problem, without knowing the reason or how to reproduce it. It will certainly be a disaster if it occurs more often!

@ leforban -

Did you try the suggestion in my post #38?

Basically, do not create more than 32 socket objects and it will be fine.