G400 hangs due to possible bug in CAN

After two days of headaches trying to find out why my new G400 board silently hangs after running for a while, I may have found something.

Preamble:

  1. This is a custom G400 board.
  2. Externally powered.
  3. There are ~1000 CAN messages received per second.
  4. Posting ~100 CAN messages in rapid succession hangs the G400 (no exceptions, no errors, it simply freezes).

This is a snippet that gives me 100% failure within the first minute (debugger attached):

using System.Threading;
using GHI.Premium.Hardware;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;

namespace G400CanTester {
    public class Program {
        private static Thread _blinkingThread;
        private static CAN _can;
        private static CAN.Message[] _msgList;

        public static void Main() {
            //A blinker indicates if the MCU is hanging or not
            var led = new OutputPort(GHI.Hardware.G400.Pin.PD18, false);
            _blinkingThread = new Thread(() => {
                while (true) {
                    led.Write(false);
                    Thread.Sleep(100);
                    led.Write(true);
                    Thread.Sleep(100);
                }
            });
            _blinkingThread.Priority = ThreadPriority.Lowest;
            _blinkingThread.Start();

            //Initializing CAN'n'stuff
            _msgList = new CAN.Message[1000];
            for (int i = 0; i < 1000; i++) {
                _msgList[i] = new CAN.Message();
            }

            var brp = 6;
            var sjw = 1;
            var propag = 1;
            var phase1 = 7;
            var phase2 = 7;
            _can = new CAN(CAN.Channel.Channel_1, (uint)((brp << 16) + (sjw << 12) + (propag << 8) + (phase1 << 4) + (phase2 << 0)), 1000); //1Mbit

            _can.DataReceivedEvent += CanDataReceivedEventHandler;
        
            //Creating an array for sending
            var sendList = new CAN.Message[100];
            for (int i = 0; i < 100; i++) {
                sendList[i] = new CAN.Message();
                sendList[i].ArbID = (uint) i;
                sendList[i].Data[0] = (byte) i;
            }

            //Send CAN messages and crash G400!
            while (true) {
                for (int i = 0; i < 100; i++) {
                    _can.PostMessages(sendList, i, 1);
                }
                Thread.Sleep(50); //<-smaller value gives higher probability
            }
        }

        private static void CanDataReceivedEventHandler(CAN sender, CANDataReceivedEventArgs args) {
            int count = sender.GetMessages(_msgList, 0, 1000);
            Debug.Print("CAN: " + count + " received;");
        }
    }
}
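
For anyone checking the bit-timing word used in the snippet above, here is a minimal sketch that packs the five fields the same way and estimates the resulting bit rate. It assumes the G400's CAN controller follows the Atmel AT91 baud rate register convention (each field is stored as its value minus one, plus one fixed synchronization quantum per bit) and that the CAN peripheral clock is 133 MHz. Neither assumption is stated in this thread, and `CanTimingCheck`, `Pack` and `BitRate` are illustrative names, not part of the GHI API.

```csharp
public static class CanTimingCheck
{
    // Pack the fields exactly as the snippet above does.
    public static uint Pack(int brp, int sjw, int propag, int phase1, int phase2)
    {
        return (uint)((brp << 16) + (sjw << 12) + (propag << 8) + (phase1 << 4) + (phase2 << 0));
    }

    // ASSUMPTION: AT91-style timing, where each field is stored as (value - 1)
    // and one bit consists of a sync quantum plus the three segments.
    // ASSUMPTION: clockHz is the CAN peripheral clock (133 MHz is a guess here).
    public static long BitRate(long clockHz, int brp, int propag, int phase1, int phase2)
    {
        int quantaPerBit = 1 + (propag + 1) + (phase1 + 1) + (phase2 + 1);
        return clockHz / ((brp + 1) * quantaPerBit);
    }
}
```

With the values from the snippet (brp = 6, sjw = 1, propag = 1, phase1 = 7, phase2 = 7), `Pack` returns 0x61177, and `BitRate(133000000, 6, 1, 7, 7)` returns 1000000, which matches the "1Mbit" comment under the assumptions above.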

Important notes:

  1. High traffic only makes the G400 fail faster. It eventually fails even with only 130 incoming messages per second and 1 outgoing message per second, but then one has to wait half a day.
  2. The problem is somewhere in the PostMessages function; if nothing is sent, the G400 does not freeze even with higher traffic.

Guys at GHI, please take a look at this, my entire career now depends on this bug :frowning:

It would be easier if you told us what baud rate you are using, then we can test it.

1 Mbit.

Any news? Can you reproduce the crash?

Next SDK is due shortly so the fix will come soon. We are working on it.

Ok, let's hope “soon” will be really soon…

@ Simon from Vilnius -

I still cannot reproduce that bug.

Case 1: I used two G400s, one sending and one receiving, at 1 Mbit over socket 7. It has run well for more than 3000 messages and is still running.

Case 2: I used one G400 sending and receiving at the same time over sockets 6 and 7. It is still running well.

Case 3: I used one G400 and a LAWICEL CANUSB (running on a PC) at 1 Mbit. It works well.

Cases 1 and 2 run automatically. I only occasionally attach USB debugging (MFDeploy) to make sure it is still running, monitor it with an LED, and then disconnect USB. In other words, I am using external power.

In case 3 the software runs on the PC, so I have to click the mouse more than 1000 times by hand. :))

I used your configuration to set up 1 Mbit, with a 50 ms sleep for every PostMessage (as in your code).

Let me know if you have another suggestion to reproduce this bug.

Edit: I have now changed the sleep to 1 ms (to send 1000 messages per second).

@ Simon from Vilnius -

Here is the code for two G400s sending and receiving ~1000 messages per second at 1 Mbit. Let me know if you want me to modify something to reproduce that bug.


using System;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using GHI.Premium.Hardware;

namespace G400_TestCAN
{
    public class Program
    {
     
        static InputPort button = new InputPort((Cpu.Pin)(4), false, Port.ResistorMode.PullUp);
        static OutputPort led = new OutputPort((Cpu.Pin)(3 * 32 + 3), true); // led pd3       
        static CAN.Message[] msgList;
        public static void Main()
        {
          
            // Create a message list of 1000 messages
            msgList = new CAN.Message[1000];
            for (int i = 0; i < msgList.Length; i++)
                msgList[i] = new CAN.Message();
            

            var brp = 6;
            var sjw = 1;
            var propag = 1;
            var phase1 = 7;
            var phase2 = 7;
            CAN can = new CAN(CAN.Channel.Channel_1, (uint)((brp << 16) + (sjw << 12) + (propag << 8) + (phase1 << 4) + (phase2 << 0)), 1000); //1Mbit
            // Subscribe to events
            can.DataReceivedEvent += new CANDataReceivedEventHandler(can_DataReceivedEvent);
            can.ErrorReceivedEvent += new CANErrorReceivedEventHandler(can_ErrorReceivedEvent);            
            int numberOfMessagesPosted;
            // G400 can receive message from now, without pressing button
            while (button.Read())  
            {
                led.Write(true);
                Thread.Sleep(500);
                led.Write(false);
                Thread.Sleep(500);
            }
            Debug.Print("Start PostMessage");
            int cnt = 0;
            while (true)
            {
                Thread.Sleep(1);                
                // Send one message
                msgList[cnt].ArbID = (uint)(1 + cnt);
                // Send the 8 bytes for example
                msgList[cnt].Data[0] = 1;
                msgList[cnt].Data[1] = 1;
                msgList[cnt].Data[2] = 1;
                msgList[cnt].Data[3] = 1;
                msgList[cnt].Data[4] = 1;
                msgList[cnt].Data[5] = 1;
                msgList[cnt].Data[6] = 1;
                msgList[cnt].Data[7] = 1;

                msgList[cnt].DLC = 8;
                msgList[cnt].IsEID = false;
                msgList[cnt].IsRTR = false;
                numberOfMessagesPosted = can.PostMessages(msgList, cnt, 1);
                cnt++;
                
                if (cnt == 999) 
                    cnt = 0;
            }
            // Sleep forever
            //Thread.Sleep(Timeout.Infinite);
        }
        static int countReceive = 0;
        static Boolean error = false;
        static void can_DataReceivedEvent(CAN sender, CANDataReceivedEventArgs args)
        {
            countReceive++;
            // Toggle the LED every 50 receive events (stays off once an error was reported)
            led.Write((countReceive % 100 >= 50) && (error == false));
            int count = sender.GetMessages(msgList, 0, msgList.Length);
            // Print a progress line every 1000 receive events
            if (error == false && countReceive % 1000 == 0)
            {
                Debug.Print("Still running " + countReceive);
            }
        }

        static void can_ErrorReceivedEvent(CAN sender, CANErrorReceivedEventArgs args)
        {
            error = true;
        }

    }
}


Ok, thanks. Will try that tomorrow.

Surely you can make something to click the button for you, oh you know, a USB Mouse app on a device?

@ Brett -

It only took a few minutes to finish, but next time I will :))

@ Dat - I’ve tried your code on one G400HDR and one custom board. And indeed it works!

So the problem is probably not the message count itself but the distribution of the messages. In my system they arrive in batches, a few hundred in rapid succession. Maybe this is the problem? To test this assumption, I’ve modified your code; if I replace

 while (true)
            {
                Thread.Sleep(1);                
                // Send one message
                msgList[cnt].ArbID = (uint)(1 + cnt);
                // Send the 8 bytes for example
                msgList[cnt].Data[0] = 1;
                msgList[cnt].Data[1] = 1;
                msgList[cnt].Data[2] = 1;
                msgList[cnt].Data[3] = 1;
                msgList[cnt].Data[4] = 1;
                msgList[cnt].Data[5] = 1;
                msgList[cnt].Data[6] = 1;
                msgList[cnt].Data[7] = 1;
 
                msgList[cnt].DLC = 8;
                msgList[cnt].IsEID = false;
                msgList[cnt].IsRTR = false;
                numberOfMessagesPosted = can.PostMessages(msgList, cnt, 1);
                cnt++;
 
                if (cnt == 999) 
                    cnt = 0;
            }

with

            for (int i = 0; i < 1000; i++) {
                msgList[i].ArbID = 12;
                msgList[i].DLC = 8;
                msgList[i].Data[2] = (byte)(i & 0xFF);
            }

            while (true) {
                for (int i = 0; i < 100; i++) {
                    can.PostMessages(msgList, 0, 1);
                }
                Thread.Sleep(100);
            }

The receiving G400 silently locks up almost immediately after I click the button! Could you try this?

So, to sum things up:

  1. G400HDR (your code) + custom board (your code) -> both boards work;
  2. G400HDR (your code) + custom board (modified code) -> the G400HDR locks up almost the moment the custom board starts sending messages.
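
A rough back-of-the-envelope sketch of why the two tests in this summary can differ even though their average message rates are similar: when frames are posted back to back, the receiver may see them at close to the bus limit for the length of the burst. The sketch assumes standard 11-bit frames and ignores bit stuffing, so real frames are somewhat longer. `CanLoadEstimate` and its members are illustrative names, not part of any SDK.

```csharp
public static class CanLoadEstimate
{
    // Bits in a standard (11-bit ID) data frame, excluding stuff bits:
    // SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + data 8*n + CRC 15
    // + CRC delimiter 1 + ACK 1 + ACK delimiter 1 + EOF 7 + interframe space 3.
    public static int FrameBits(int dataBytes)
    {
        return 1 + 11 + 1 + 1 + 1 + 4 + 8 * dataBytes + 15 + 1 + 1 + 1 + 7 + 3;
    }

    // Highest frame rate the bus can carry when frames are sent back to back.
    public static int PeakFramesPerSecond(int bitRate, int dataBytes)
    {
        return bitRate / FrameBits(dataBytes);
    }

    // Time a back-to-back burst occupies the bus, in microseconds.
    public static long BurstMicroseconds(int bitRate, int dataBytes, int frames)
    {
        return (long)frames * FrameBits(dataBytes) * 1000000L / bitRate;
    }
}
```

At 1 Mbit with 8 data bytes this gives 111 bits per frame, so a 100-message burst could arrive at up to roughly 9000 frames per second for about 11 ms, compared with roughly 1000 frames per second in the paced test. Whether the sender really achieves back-to-back transmission depends on how fast PostMessages runs, which this sketch does not model.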

@ Simon from Vilnius -

I see it and have fixed it.

Thanks for helping us reproduce it.

Sweet! Will the fix make it into the next SDK?

Yep, the full CAN feature set is implemented too.

Cool! Thanks a lot. I hope the new SDK is out tomorrow :slight_smile:

It would have been yesterday, but you found a bug and others found a few issues. We hope for early next week, but we are not in a hurry. We need to make sure it is all good.

alright, early next week is less cool, but may still save my head :slight_smile:

Alright, new SDK, so how does the CAN work?

Better, but still not good :frowning: I still experience random silent crashes of my production application :frowning:

At least the firmware is not corrupted, resetting the board helps…

It passed all our tests. Is there anything you can show us so we can see this problem?