Last attempt at solving a socket.Write error

Hello all, I have been stymied by this problem for way too long and need to resolve it or abandon the platform. I am going to be very verbose in this post so if you are not up for the ride…
The problem I am experiencing happens on a Cerberus, Hydra and Spyder. I have them all using 4.2.

I have an application that is reading a serial port using the rs232 module. The messages are 8 bytes long and they come every 200 ms.
I have an object that aggregates those messages and generates an XML string out of the messages.
I have a timer that every second calls for the XML and writes it to a socket connection.

The program runs for a random amount of time, about 1 or 2 hours, and then stops in the middle of writing the XML to the socket: the client receives a random-length fragment of the XML document. There are no errors and no trappable exceptions. The entire board is hung until I reset it. I cannot connect with MFDeploy, I cannot open a new socket connection, and I cannot deploy or debug without a hard (push the button) reset.

I have isolated and run the RS232 module reading the data for 24 hours and it did not fail.
I have isolated the XML generation code and run it for 48 hours and it did not fail.
I have isolated the socket and sent data over it for 24 hours and it did not fail.

When I combine the RS232 and the socket I get a failure after about 4 hours.

Things I have tried with no effect:
- multiple timers (function specific)
- a single timer (counting ticks for longer things)
- multiple threads with while/sleep combinations
- a single thread with a while/sleep combination

I have thinned the code down to a single page that still fails. I have removed all of the code that handles anything extraneous (heartbeat, socket state etc.). I do not know what else to isolate nor do I know what else to try.

Here is the code:

using System;
using System.IO.Ports;
using System.Net;
using System.Net.Sockets; 
using System.Text;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Net.NetworkInformation;
using GT = Gadgeteer;
using Serial = Gadgeteer.Interfaces.Serial;
using Gadgeteer.Modules.GHIElectronics;

namespace NetworkAppliance.Hardware.Cerberus
{
    public partial class Program
    { 
        private static RS232 _rs232;
        private int _byteIndex;
        private byte[] _currentMessage; 
        private readonly NetworkInterface _networkInterface = NetworkInterface.GetAllNetworkInterfaces()[0];
        private readonly Gadgeteer.Timer _shortTimer = new GT.Timer(1000);
        
        private Socket _hostSocket;
        private Socket _clientSocket;
        private Thread _listener; 

        void ProgramStarted()
        {
            Debug.Print("Program Started"); 
            _rs232 = new RS232(2);
            _rs232.Initialize(19200, Serial.SerialParity.None, Serial.SerialStopBits.None, 8, Serial.HardwareFlowControl.NotRequired);
            _rs232.serialPort.Open();
            _rs232.serialPort.DataReceived += new Serial.DataReceivedEventHandler(SerialPort_DataReceived);
            
            _networkInterface.EnableDhcp();
            while (_networkInterface.IPAddress == "0.0.0.0")
            {
                Debug.Print("Awaiting IP Address");
                Thread.Sleep(1000);
            }

            Debug.Print(
                "DHCP - IP Address = " + _networkInterface.IPAddress + " ... Net Mask = " +
                _networkInterface.SubnetMask + " ... Gateway = " + _networkInterface.GatewayAddress);
             
            var localEndPoint = new IPEndPoint(IPAddress.Any, 60504);
            _hostSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            _hostSocket.Bind(localEndPoint);
            _hostSocket.Listen(1);

            _listener = new Thread(ListenerLoop);
            _listener.Start();

            _shortTimer.Tick += new GT.Timer.TickEventHandler(ShortTimer_Tick);
        }

        #region Timer Loop
        private void ShortTimer_Tick(GT.Timer timer)
        {
            Debug.Print("Tick: " + DateTime.Now.ToString("%h:mm:ss.fff"));
            OneSecondTick();
        }

        private void OneSecondTick()
        {
            try
            { 
                if (_clientSocket != null)
                {
                    Debug.Print("Calling Socket Send: " + DateTime.Now.ToString("%h:mm:ss.fff"));
                    //this string is representative of the xml that is being generated
                    var message = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<AccutechMessaging xmlns=\"http://www.Accutech.com\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" MessageTime=\"2011-06-01T00:25:48\">\r\n   <SecurityEvents>\r\n      <TagEvents>\r\n         <TagEvent>\r\n            <Tag>\r\n               <TagId>12</TagId>\r\n               <BatteryStrength>LowBatteryStrength</BatteryStrength>\r\n            </Tag>\r\n            <Zone>\r\n               <ZoneId>4</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>ExitPointAlarm</EventType>\r\n               <EventType>Loiter</EventType>\r\n            </EventTypes>\r\n         </TagEvent>\r\n         <TagEvent>\r\n            <Tag>\r\n               <TagId>2</TagId>\r\n               <BatteryStrength>LowBatteryStrength</BatteryStrength>\r\n            </Tag>\r\n            <Zone>\r\n               <ZoneId>13</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>ExitPointAlarm</EventType>\r\n               <EventType>Loiter</EventType>\r\n            </EventTypes>\r\n         </TagEvent>\r\n      </TagEvents>\r\n      <ZoneEvents>\r\n         <ZoneEvent>\r\n            <Zone>\r\n               <ZoneId>4</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>DoorAjar</EventType>\r\n         </EventTypes>\r\n         </ZoneEvent>\r\n         <ZoneEvent>\r\n            <Zone>\r\n               <ZoneId>13</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>DoorAjar</EventType>\r\n         </EventTypes>\r\n         </ZoneEvent>\r\n      </ZoneEvents>\r\n   </SecurityEvents>\r\n</AccutechMessaging>\r\n\r\n";
                    //******************fails in the middle of the next call****************************
                    _clientSocket.Send(Encoding.UTF8.GetBytes(message));
                    Debug.Print("Completed Socket Send: " + DateTime.Now.ToString("%h:mm:ss.fff"));
                }  
            }
            catch (Exception ex) // this never gets called even in failure
            {
                Debug.Print(ex.Message);
                throw;
            }
        }
        #endregion

        #region Socket Methods
        private void ListenerLoop()
        {
            while (_clientSocket == null) // exit method after socket is found
            {
                    Debug.Print("listening...");
                    _clientSocket = _hostSocket.Accept();
                    Debug.Print("Accepted Port 60504 connection from " + _clientSocket.RemoteEndPoint); 
                    _shortTimer.Start();
            }
        }
        #endregion

        #region SerialPort Methods
        void SerialPort_DataReceived(Serial sender, SerialData data)
        {
            var bytesToReceive = sender.BytesToRead;
            var bytes = new byte[bytesToReceive];
            sender.Read(bytes, 0, bytesToReceive);
            ProcessByte(bytes);
        }

        private void ProcessByte(byte[] data)
        {
            foreach (var currentByte in data)
            {
                if (currentByte == 0X0A) // dump any in flight message and start new
                {
                    _byteIndex = 0;
                    _currentMessage = new byte[32];
                }
                if (_currentMessage != null)
                {
                    _currentMessage[_byteIndex] = currentByte;
                    _byteIndex++;
                    if (currentByte == 0X0D)
                    {
                        var passingMessage = new byte[_byteIndex];
                        Array.Copy(_currentMessage, passingMessage, _byteIndex);
                        HandleMessage(passingMessage, DateTime.Now); 
                        _currentMessage = null;
                    }
                }
            }
        }
        
        private static void HandleMessage(byte[] bytes, DateTime expireTime)
        {
            // do nothing -- this would be a call to put the bytes into an object that converts and aggregates them into an xml stream.
        }
        #endregion
         
    }
}

Any help would be greatly appreciated

Tal

Hi,
could it be that the problem occurs when the SerialPort_DataReceived event fires again before ProcessByte has finished handling the previous data? In that case you would be writing to the variables _byteIndex and _currentMessage from different places at the same time. I would use a counter variable to see whether the error only occurs when multiple events are being handled at the same time. Perhaps you should block further SerialPort_DataReceived events until the other work is done.
Roland

Thanks for the suggestion. However, I believe that the serial port already has internal blocking that stops the event from firing again until the buffer is cleared. If it did not do this under the covers, there would be an event for every single byte that arrived. Please correct me if this is not the case.

Tal

I just wrote a PC application in C# that receives data from a Spider over Bluetooth and a virtual serial port on the PC. There are indeed multiple events. You can try it out with a global integer variable that you increment when you enter the event routine, write to Debug.Print, and decrement when you leave the routine.
Roland

Edit: You are right, there will be no events before the buffer is cleared, but there may be new events for incoming data before the last data are processed.
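
A minimal sketch of the counter idea described above, written against the full .NET `System.Threading.Interlocked` class. The class name `ReentrancyProbe` and its members are hypothetical, and whether every call behaves identically on the Micro Framework build in use is an assumption. The counter is updated atomically, and it also records the highest overlap ever seen, so a brief overlap is not missed between two Debug.Print calls.

```csharp
using System;
using System.Threading;

// Hypothetical helper: counts how many handler invocations are active at once.
public static class ReentrancyProbe
{
    private static int _active;     // handlers currently inside the guarded region
    private static int _maxActive;  // highest overlap observed so far

    public static int MaxActive { get { return _maxActive; } }

    // Call at the top of the event handler; returns the current depth.
    public static int Enter()
    {
        int now = Interlocked.Increment(ref _active);
        int seen;
        // Raise the recorded maximum if this entry exceeded it; retry if another
        // thread changed the maximum between the read and the exchange.
        while (now > (seen = _maxActive) &&
               Interlocked.CompareExchange(ref _maxActive, now, seen) != seen)
        {
        }
        return now;
    }

    // Call in a finally block so the count is restored even if processing throws.
    public static void Exit()
    {
        Interlocked.Decrement(ref _active);
    }
}
```

In the handler this would be used as `ReentrancyProbe.Enter();` followed by `try { /* read and process */ } finally { ReentrancyProbe.Exit(); }`. If `MaxActive` ever exceeds 1, two handlers overlapped.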

I have made these changes as you suggested:

        void SerialPort_DataReceived(Serial sender, SerialData data)
        {
            _receivedCounter++;
            Debug.Print(_receivedCounter.ToString());
            var bytesToReceive = sender.BytesToRead;
            var bytes = new byte[bytesToReceive];
            sender.Read(bytes, 0, bytesToReceive);
            ProcessByte(bytes);
        }

        private void ProcessByte(byte[] data)
        {
            foreach (var currentByte in data)
            {
                if (currentByte == 0X0A) // dump any in flight message and start new
                {
                    _byteIndex = 0;
                    _currentMessage = new byte[32];
                }
                if (_currentMessage != null)
                {
                    _currentMessage[_byteIndex] = currentByte;
                    _byteIndex++;
                    if (currentByte == 0X0D)
                    {
                        var passingMessage = new byte[_byteIndex];
                        Array.Copy(_currentMessage, passingMessage, _byteIndex);
                        HandleMessage(passingMessage, DateTime.Now); 
                        _currentMessage = null;
                    }
                }
            }
            _receivedCounter--;
        }

Oddly, it now fails in only a minute or two. The counter remained at 1.

Also, I am seeing a new line of:
"The thread '<No Name>' (0x37a0) has exited with code 0 (0x0)." (different ID each time)
in my debug window every 30-60 seconds, so some part of the system is not dead, just not the main thread.

When I tried to put a breakpoint in the Data_Received method to see if anything was happening I got this:

Any more ideas?

tal

Not really, but it's interesting. Perhaps I will try the code on my Spider.
Roland

I am currently trying the serial reading with a while loop on a separate thread instead of the DataReceived event.

Here is a simple console app that feeds a com port the serial data:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO.Ports;
using System.Threading;
using System.Timers;
using Timer = System.Timers.Timer;

namespace TestSerialPort
{
    class Program
    {

        private static SerialPort _comPort;

        private static readonly Timer ShortTimer = new Timer(200);
        private static readonly Timer LongTimer = new Timer(5000);
        private static readonly string HubPulseText = "\nHP0H\r";
        private static byte[] _hubPulseBytes;
        private readonly static string TagMessageText = "\n00050K\r";
        private readonly static string StatusMessegeText = "\n10066C\r";
        private static byte[] _tagMessageBytes;
        private static byte[] _statusMessegeBytes;
        private static bool Toggle;



        static void Main(string[] args)
        {
            _comPort = new SerialPort("COM24", 19200, Parity.None, 8, StopBits.One);
            _comPort.Open();
            ShortTimer.AutoReset = true;
            ShortTimer.Elapsed += new ElapsedEventHandler(ShortTimer_Elapsed);

            LongTimer.AutoReset = true;
            LongTimer.Elapsed += new ElapsedEventHandler(LongTimer_Elapsed);

            ShortTimer.Enabled = true;
            LongTimer.Enabled = true;


            _hubPulseBytes = Encoding.ASCII.GetBytes(HubPulseText);
            _tagMessageBytes = Encoding.ASCII.GetBytes(TagMessageText);
            _statusMessegeBytes = Encoding.ASCII.GetBytes(StatusMessegeText);


            while (true)
            { }
        }

        private static void LongTimer_Elapsed(object sender, ElapsedEventArgs e)
        {
            WriteBytes(_hubPulseBytes);
        }

        private static void ShortTimer_Elapsed(object sender, ElapsedEventArgs e)
        {
            WriteBytes(Toggle ? _tagMessageBytes : _statusMessegeBytes);
            Toggle = !Toggle;
        }


        private static void WriteBytes(byte[] bytes)
        {
            _comPort.Write(bytes, 0, bytes.Length);
           Console.WriteLine(Encoding.ASCII.GetString(bytes));
        }
    }
}

Thanks

Using serial port events the way you have usually isn’t the best way to handle this. If you use an event, just get in and read the data and put it in a buffer. Then, exit the handler as fast as possible. You should then have a different part of the code that isn’t influenced by the handler deal with the de-queue and processing. Your while loops scenario might actually work out to be better than the original.
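
A minimal sketch of that hand-off, assuming a non-generic `System.Collections.Queue` and an `AutoResetEvent` are available on the target. The class name `ByteChunkQueue` and its methods are hypothetical and not part of the Gadgeteer or GHI libraries. The event handler only copies the bytes into the queue, and a separate worker thread pulls them out and does the parsing.

```csharp
using System;
using System.Collections;
using System.Threading;

// Hypothetical hand-off buffer between a serial event handler and a worker thread.
public class ByteChunkQueue
{
    private readonly Queue _chunks = new Queue();  // holds byte[] chunks in arrival order
    private readonly AutoResetEvent _dataReady = new AutoResetEvent(false);

    // Producer side: call from the DataReceived handler, then return immediately.
    public void Enqueue(byte[] chunk)
    {
        lock (_chunks)
        {
            _chunks.Enqueue(chunk);
        }
        _dataReady.Set();
    }

    // Consumer side: returns the next chunk, or null if none arrives in time.
    public byte[] Dequeue(int timeoutMs)
    {
        while (true)
        {
            lock (_chunks)
            {
                if (_chunks.Count > 0)
                {
                    return (byte[])_chunks.Dequeue();
                }
            }
            // Nothing queued: wait for the producer to signal, or give up.
            if (!_dataReady.WaitOne(timeoutMs, false))
            {
                return null;
            }
        }
    }
}
```

The handler would then shrink to reading the bytes and calling `Enqueue(bytes)`, while a worker thread loops on `Dequeue(1000)` and passes each non-null chunk to ProcessByte. Only the worker touches _byteIndex and _currentMessage, so those fields need no locking.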

Also, I’d be explicit; instead of

while (true)
            { }

I’d use

Thread.Sleep(Timeout.Infinite);

@ Brett, you are right about the Timeout.Infinite…I was in a hurry.

The spinning data read failed as well. It was:


        private void SerialPortSpinner()
        {
            while (_rs232 != null && _rs232.serialPort != null)
            {
                lock (_rs232)
                {
                    var bytesToReceive = _rs232.serialPort.BytesToRead;
                    if (bytesToReceive > 0)
                    {
                        var bytes = new byte[bytesToReceive];
                        _rs232.serialPort.Read(bytes, 0, bytesToReceive);
                        ProcessByte(bytes);
                    }
                }
                Thread.Sleep(100);
            }
        }

I do not think the problem lies with the Serial port.

Other Ideas? Bueller…Bueller

I do not know if this is important, but in the initialization of your port you use

_rs232.Initialize(19200, Serial.SerialParity.None, Serial.SerialStopBits.None, 8, Serial.HardwareFlowControl.NotRequired);

Are you sure that StopBits.None is OK?
If I try this, the RS232 module throws an exception.
Roland

“COM24”?

@ RoSchmi are you using the Gadgeteer.Interfaces.Serial.SerialStopBits ?

@ Mike COM24 is the port my USB to Serial converter is on. You would need to change that bit to whatever you were outputting on.

@ Tal_McMahon - I tried to run your code last night but it got too late to get it ready. I installed the RS232 module via the Designer UI. When I initialized the RS232 module with your settings I got an exception, which I think was due to StopBits.None. So I thought that this could perhaps be the reason for your issue.
Roland

@ RoSchmi I did not use the designer for anything. I was eliminating any magic code that may have been under the covers.

            _rs232 = new RS232(2);
            _rs232.Initialize(19200, Serial.SerialParity.None, Serial.SerialStopBits.None, 8, Serial.HardwareFlowControl.NotRequired);
            _rs232.serialPort.Open();

As you can see the RS232 is instantiated with the Param of “2” which is the socket it is plugged into.

I am not an expert with RS232; perhaps one of the other forum members could give some advice.
But in your PC application that feeds the serial port you use a setting with one stop bit:

static void Main(string[] args)
{
    _comPort = new SerialPort("COM24", 19200, Parity.None, 8, StopBits.One);
    _comPort.Open();

Why not in the receiver? I would always use the same settings in sender and receiver, but maybe it works otherwise too.
Roland

Thanks for the catch. I believe it is a copy and paste error introduced in the sample code I gave. I do not think that it matters to my grander problem of the system failing.

Edit: Yes, it was introduced in the sample. In my original code it was 1.

O.K. I can rule out the RS232 code: I got the system to fail with only the socket running. So one more time, here is the code as it is running, and failing.

using System;
using System.IO.Ports;
using System.Net;
using System.Net.Sockets; 
using System.Text;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Net.NetworkInformation;
using GT = Gadgeteer;
using Serial = Gadgeteer.Interfaces.Serial;
using Gadgeteer.Modules.GHIElectronics;

namespace NetworkAppliance.Hardware.Cerberus
{
    public partial class Program
    { 
        private readonly NetworkInterface _networkInterface = NetworkInterface.GetAllNetworkInterfaces()[0];
        private readonly Gadgeteer.Timer _shortTimer = new GT.Timer(1000);
        
        private Socket _hostSocket;
        private Socket _clientSocket;
        private Thread _listener;

        void ProgramStarted()
        {
            Debug.Print("Program Started"); 
            _networkInterface.EnableDhcp();
            while (_networkInterface.IPAddress == "0.0.0.0")
            {
                Debug.Print("Awaiting IP Address");
                Thread.Sleep(1000);
            }

            Debug.Print(
                "DHCP - IP Address = " + _networkInterface.IPAddress + " ... Net Mask = " +
                _networkInterface.SubnetMask + " ... Gateway = " + _networkInterface.GatewayAddress);
             
            var localEndPoint = new IPEndPoint(IPAddress.Any, 60504);
            _hostSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            _hostSocket.Bind(localEndPoint);
            _hostSocket.Listen(1);

            _listener = new Thread(ListenerLoop);
            _listener.Start();

            _shortTimer.Tick += new GT.Timer.TickEventHandler(ShortTimer_Tick);
        }

        #region Timer Loop
        private void ShortTimer_Tick(GT.Timer timer)
        {
            Debug.Print("Tick: " + DateTime.Now.ToString("%h:mm:ss.fff"));
            OneSecondTick();
        }

        private void OneSecondTick()
        {
            try
            { 
                if (_clientSocket != null)
                {
                    Debug.Print("Calling Socket Send: " + DateTime.Now.ToString("%h:mm:ss.fff"));
                    //this string is representative of the xml that is being generated
                    var message = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<AccutechMessaging xmlns=\"http://www.Accutech.com\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" MessageTime=\"2011-06-01T00:25:48\">\r\n   <SecurityEvents>\r\n      <TagEvents>\r\n         <TagEvent>\r\n            <Tag>\r\n               <TagId>12</TagId>\r\n               <BatteryStrength>LowBatteryStrength</BatteryStrength>\r\n            </Tag>\r\n            <Zone>\r\n               <ZoneId>4</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>ExitPointAlarm</EventType>\r\n               <EventType>Loiter</EventType>\r\n            </EventTypes>\r\n         </TagEvent>\r\n         <TagEvent>\r\n            <Tag>\r\n               <TagId>2</TagId>\r\n               <BatteryStrength>LowBatteryStrength</BatteryStrength>\r\n            </Tag>\r\n            <Zone>\r\n               <ZoneId>13</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>ExitPointAlarm</EventType>\r\n               <EventType>Loiter</EventType>\r\n            </EventTypes>\r\n         </TagEvent>\r\n      </TagEvents>\r\n      <ZoneEvents>\r\n         <ZoneEvent>\r\n            <Zone>\r\n               <ZoneId>4</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>DoorAjar</EventType>\r\n         </EventTypes>\r\n         </ZoneEvent>\r\n         <ZoneEvent>\r\n            <Zone>\r\n               <ZoneId>13</ZoneId>\r\n            </Zone>\r\n            <EventTypes>\r\n               <EventType>DoorAjar</EventType>\r\n         </EventTypes>\r\n         </ZoneEvent>\r\n      </ZoneEvents>\r\n   </SecurityEvents>\r\n</AccutechMessaging>\r\n\r\n";
                    //******************fails in the middle of the next call****************************
                    _clientSocket.Send(Encoding.UTF8.GetBytes(message));
                    Debug.Print("Completed Socket Send: " + DateTime.Now.ToString("%h:mm:ss.fff"));
                }  
            }
            catch (Exception ex) // this never gets called even in failure
            {
                Debug.Print(ex.Message);
                throw;
            }
        }
        #endregion

        #region Socket Methods
        private void ListenerLoop()
        {
            while (_clientSocket == null) // exit method after socket is found
            {
                    Debug.Print("listening...");
                    _clientSocket = _hostSocket.Accept();
                    Debug.Print("Accepted Port 60504 connection from " + _clientSocket.RemoteEndPoint); 
                    _shortTimer.Start();
            }
        }
        #endregion

    }
}

To recap: I am running on a Cerberus with 4.2. I get an IP address from DHCP and open a listening socket on port 60504. I write the XML string above to the connected client every second. After about 2 hours and 48 minutes the debug window showed "Calling Socket Send" and the XML the client received was truncated like this:

[quote]

<?xml version="1.0" encoding="utf-8"?> 12 LowBatteryStrength 4 ExitPointAlarm Loiter 2 LowBatteryStrength 13 ExitPointAlarm Loiter </EventTyp[/quote]

What is it about my socket that I am doing wrong, or what can I do to get around whatever the flaw is?

Tal

I am going to throw an idea out there just to see if it helps spark any thoughts.

I was thinking that I cannot be the only one to experience this without posting to the forums and commenting. However, I see most implementations and samples using the Socket are for Http and for making calls to other systems…relatively short term and often open then close connection.

I am making a socket with a client that is supposed to be open…“forever”.

If there is a problem with memory or something else…it would show up in my “forever” scenario, but never would happen in the call and response mode.

Thoughts?

Hi Tal,
You mention that your socket is supposed to be open ‘forever’. I always consider physical interfaces and transports (e.g. TCP connection over Ethernet) to be unreliable connections. That is, always check before you use a socket that it is still connected, handle the error conditions when you are disconnected mid-send, allow re-connection, etc. I understand this is just prototype code, but the reason most people open-send-close is to make the communications transactional, and to ensure errors only last one transaction.
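
A minimal sketch of that "check, send, recover" pattern, written against the full .NET `System.Net.Sockets` API. The helper name `GuardedSender` and its methods are hypothetical, and whether `Poll` and `Available` behave the same way on the Micro Framework is an assumption. It detects a closed peer and handles exceptions during the send. It would not by itself recover from a call that hangs inside `Send` without throwing.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper illustrating "check, send, and recover" for a long-lived socket.
public static class GuardedSender
{
    // Returns false when the peer has closed or reset the connection.
    public static bool IsConnected(Socket socket)
    {
        try
        {
            // Readable with zero bytes available means the peer closed the connection.
            bool readable = socket.Poll(0, SelectMode.SelectRead);
            return !(readable && socket.Available == 0);
        }
        catch (SocketException) { return false; }
        catch (ObjectDisposedException) { return false; }
    }

    // Sends the whole payload; returns false (and closes the socket) on any failure.
    public static bool TrySendAll(Socket socket, byte[] payload)
    {
        try
        {
            if (!IsConnected(socket))
            {
                socket.Close();
                return false;
            }
            int sent = 0;
            while (sent < payload.Length)
            {
                // Send may accept fewer bytes than requested, so loop until done.
                sent += socket.Send(payload, sent, payload.Length - sent, SocketFlags.None);
            }
            return true;
        }
        catch (SocketException)
        {
            socket.Close();
            return false;
        }
    }
}
```

In the timer tick, a false return from `TrySendAll(_clientSocket, bytes)` would be the cue to set _clientSocket back to null and let the listener thread call Accept again, so a failure only costs one message.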

Also, as a rule there shouldn’t be blocking code in ProgramStarted. Can you move your DHCP connection into the listener thread?

I know this doesn't help your core problem :)

@ byron Actually this is not prototype code, it is find-the-problem code. I have all of the socket sniffers etc. in place in the original code.
I believe there is a problem with the framework.

This article http://www.dotnetsolutions.co.uk/blog/socket-connect---a-word-of-warning talks about something similar to what I am experiencing. It is about v3, but it is enough to give pause.

Anyone care to comment?