In-field update bricks our boards to bootloader mode

Hey,

Using the in-field updater causes our boards to be bricked to bootloader mode at random times. That is, with the same firmware and application files, the update might succeed four times and then brick the same board on the fifth attempt.

We can see in our logs that the IFU verifies both the application and firmware as okay, but after the FlashAndReset handle is called, the board is bricked to bootloader mode and has to be reflashed with firmware using the TinyCLR config tool.

We are flashing both the firmware and application every time we perform an in-field update as recommended.

Our boards are custom boards using the SCM20260E. The issue has been observed on different firmware versions, including the newest, v2.2.2.1000.

image
image

public void LoadChunks(ChunkType type, byte[] chunkData)
{
    if(_updater == null)
    {
        _updater = new InFieldUpdate();
        _updater.ResetChunks();
    }

    if(type == ChunkType.APP)
    {
        _updater.LoadApplicationChunk(chunkData, 0, chunkData.Length);
    }
    else
    {
        _updater.LoadFirmwareChunk(chunkData, 0, chunkData.Length);
    }
}

public void FlashSoftware(byte[] appKey, string applicationVersion)
{
    if(_updater == null)
    {
        Logger.Logger.Error("FirmwareFlasher | InFieldUpdate is not instantiated");
        return;
    }

    //Load key
    _updater.LoadApplicationKey(appKey);

    Logger.Logger.Debug("Verifying application.... ");
    string appVer = _updater.VerifyApplication();
    if (appVer != applicationVersion)
    {
        Logger.Logger.Error($"FirmwareFlasher | Application version is: {appVer} but expected version: {applicationVersion}");
        Logger.Logger.Error("FirmwareFlasher | Firmware update abort!");
        _updater.Dispose();
        _updater = null; // so LoadChunks creates a fresh updater instead of reusing the disposed one
        return;
    }
    Logger.Logger.Debug($"FirmwareFlasher | Application version: {appVer}");


    Logger.Logger.Debug("FirmwareFlasher | Verify firmware.... ");
    Logger.Logger.Debug("FirmwareFlasher | Firmware version: " + _updater.VerifyFirmware());

    // Flashing
    Thread.Sleep(2000);
    _updater.FlashAndReset();
}

Best regards,
Frederik

Hi, can you send us a simple application that reproduces the issue?

We ask because if it happens with a simple application, we can fix it. If you cannot reproduce it with a simple application, you can compare that with your current project and tell us what difference causes the issue.

Hey,

Can you use an application file (.tca) to boot a controller and connect it to our backend so we can start an upgrade, or do you require a project solution?

Do the VerifyApplication and VerifyFirmware handles only ensure that the loaded chunks are correct, without detecting missing or swapped chunks? Should we implement our own validation steps?

  • What happens if a chunk is missing?
  • What happens if a chunk is loaded multiple times?
  • Are there any issues for the IFU when our CPU load is at 70-80%?
  • Should we load chunks to RAM or use the external flash?
    – Currently, we have been loading to RAM.
  • Do a lot of events and threads affect the IFU stability?
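
A minimal sketch of what such an application-side validation step could look like. It assumes the transfer protocol sends a zero-based chunk index with each chunk and the total chunk count up front; neither is part of the IFU API, and `ChunkSequenceTracker` is a hypothetical name. Generics are avoided on purpose.

```csharp
using System;

// Hypothetical helper, not part of the TinyCLR API.
// Accepts chunks only in strict order so that missing, duplicated,
// or swapped chunks are rejected before they reach the updater.
public sealed class ChunkSequenceTracker
{
    private readonly int _expectedChunks;
    private int _nextIndex;

    public ChunkSequenceTracker(int expectedChunks)
    {
        if (expectedChunks <= 0) throw new ArgumentOutOfRangeException("expectedChunks");
        _expectedChunks = expectedChunks;
    }

    // Returns true only if chunkIndex is the next expected index.
    public bool Accept(int chunkIndex)
    {
        if (_nextIndex >= _expectedChunks || chunkIndex != _nextIndex) return false;
        _nextIndex++;
        return true;
    }

    // True once every expected chunk has been accepted exactly once.
    public bool IsComplete
    {
        get { return _nextIndex == _expectedChunks; }
    }
}
```

In LoadChunks, the chunk would be handed to LoadApplicationChunk or LoadFirmwareChunk only when `Accept` returns true, and FlashSoftware would refuse to continue unless `IsComplete` is true for both files.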

We need to know:

  • How big the tca is
  • What the application config is (external RAM, deployment)
  • Where the firmware/application is loaded from: USB, SD, network…
  • Do you use the indicator pin? If yes, do you use that pin somewhere else?
  • When an update fails, before using the bootloader to reflash the device, can you pull the APP pin (PB7) low to see whether the firmware is still good? This prevents the application from running and lets only the firmware run, which will tell us whether the firmware or the application is corrupted.
    …
    Anything else that can help us reproduce the issue. We don’t want to guess what the user did, try it here, and say “could not reproduce” unless we have to.

It checks the CRC once all chunks are loaded. That means that if even one byte of the firmware or app changes, verification should fail.

I think it is normal.

We recommend RAM, but you can try external flash as an investigation.

I think they do not. Once FlashAndReset() is called, all interrupts are off, the watchdog is off, and no other threads will be executed. Try it on a simple project and you will see. To give the simple project a similar size to your real project, you can add jpg or bin data as resources. It doesn’t need to be exactly the same size; for example, if your project is 2.5MB, then a simple project of 2.4 or 2.6MB is probably OK.

That is why we asked for the simple project, or the way you reproduce it, so we can look deeper into it. We don’t want your whole project if it is sensitive, you know.

Are you using watchdog?
Watchdog does not turn off when you run the IFU updater. We have learned this the hard way with a bricked module in a different country.

This of course depends on the time your watchdog is set to. Ours is extremely strict at 500ms. It could explain why you’re seeing it only sometimes.

We will check; the watchdog needs to be off. I think it is a bug if it doesn’t turn off.

Yeah, we’re using the watchdog at its maximum time. That might explain why it’s so random and inconsistent when the boards get bricked by the IFU.

Hey

  • tca file
    image
  • The application config, external RAM. Look at the device configuration picture.
  • Both firmware and application are chunk-loaded through an SSL connection.
  • We don’t use the indicator pin.
  • Typically, the devices brick in the field and are fixed on site. However, I do know that when connecting to TinyCLR config, the devices don’t display their chip name but only COM, which is why I assume the firmware has been corrupted.

As @LucaP pointed out, I will have a go at disabling the watchdog.

Our update tool, which uploads the files over TCP, nowadays checks whether the watchdog is turned on before even starting the update procedure.

Might be worth trying to update without watchdog.
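
A minimal sketch of that kind of pre-update gate. `UpdateGate` and `UpdateAction` are hypothetical names, and how the caller learns whether the watchdog is enabled is deliberately left out, since that depends on the device API in use.

```csharp
using System;

// Hypothetical delegate for "start the in-field update".
public delegate void UpdateAction();

// Hypothetical helper, not part of the TinyCLR API.
public static class UpdateGate
{
    // Starts the update only when the watchdog is reported as off.
    // Returns true if the update was started, false if it was refused.
    public static bool TryStart(bool watchdogEnabled, UpdateAction startUpdate)
    {
        if (startUpdate == null) throw new ArgumentNullException("startUpdate");

        if (watchdogEnabled)
        {
            // Refuse: a running watchdog could reset the board mid-flash.
            return false;
        }

        startUpdate();
        return true;
    }
}
```

On the device side, `startUpdate` would be whatever ends in FlashAndReset(), so a refused update leaves the board running its current firmware.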

Hi, because we don’t have your test application to reproduce the issue, we had to guess and made one.

The tca is ~500KB, plus firmware. Both are loaded from the SD card and put into RAM, with external RAM enabled.
We set up the system to do an IFU every time I press the button. We have done it more than 20 times so far, and “could not reproduce”.

This is why we keep asking for the simple application to reproduce the issue.

Below is our test application; you can try it to see if you can reproduce the issue. If not, look at your application to see whether anything else needs to be checked. The difference here is that we load the source from the SD card and you load it from the network. But this does not matter, because the files are finally cached in RAM and any one-byte difference will cause an exception. You can open the tca or firmware and modify them to see the error being thrown.

using GHIElectronics.TinyCLR.Devices.Gpio;
using GHIElectronics.TinyCLR.IO;
using GHIElectronics.TinyCLR.Native;
using GHIElectronics.TinyCLR.Pins;
using GHIElectronics.TinyCLR.Update;
using System;
using System.Collections;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;

namespace IFU_stop_after_few
{
    internal class Program
    {
        const int LED1 = SC20260.GpioPin.PH11; // PH11
        const int LDR1 = SC20260.GpioPin.PB7; // PB7;

        static GpioPin led1;
        static GpioPin ldr1;

        static void Main()
        {
            var controller = GpioController.GetDefault();

            led1 = controller.OpenPin(LED1);
            ldr1 = controller.OpenPin(LDR1);

            led1.SetDriveMode(GpioPinDriveMode.Output);

            ldr1.SetDriveMode(GpioPinDriveMode.InputPullUp);

            int cnt = 0;
            while (ldr1.Read() == GpioPinValue.High)
            {

                led1.Write(led1.Read() == GpioPinValue.Low ? GpioPinValue.High : GpioPinValue.Low);

                var dt = DateTime.Now;

                Thread.Sleep(100);

                cnt++;


                if (cnt % 10 == 0)
                {
                    GC.Collect();

                    Debug.WriteLine("I am IFU master: " + Memory.ManagedMemory.FreeBytes / 1024);

                }
            }

         
            DoTestIFU_Memory();

        }

        static void DoTestIFU_Memory()
        {
            var media = GHIElectronics.TinyCLR.Devices.Storage.StorageController.FromName(SC20260.StorageController.SdCard);
            const int SD_CLOCK_ADDRESS_REG = 0x00000004;
            var clock_divider = 2;

            // 1: 24 / 1 = 24
            // 2: 24 / 2 = 12
            // 3: 24 / 3 = 8
            // 4: 24 / 4 = 6
            // 5: 24 / 5 = 5

            // Setting clock need to be before initialize/mount SDcard
            Marshal.WriteInt32((IntPtr)SD_CLOCK_ADDRESS_REG, clock_divider);

            var drive = FileSystem.Mount(media.Hdc);

            DriveInfo driveInfo = new DriveInfo(drive.Name);

            Debug.WriteLine("====This is IFU");

            Debug.WriteLine("Free: " + driveInfo.TotalFreeSpace);
            Debug.WriteLine("TotalSize: " + driveInfo.TotalSize);
            Debug.WriteLine("VolumeLabel:" + driveInfo.VolumeLabel);
            Debug.WriteLine("RootDirectory: " + driveInfo.RootDirectory);
            Debug.WriteLine("DriveFormat: " + driveInfo.DriveFormat);

            var appKey = new byte[] { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11 };

            var indicatorPin = GpioController.GetDefault().OpenPin(SC20260.GpioPin.PB5);

            InFieldUpdate updater;

            updater = new InFieldUpdate()
            {
                ActivityPin = indicatorPin
            };

            var filestreamApp = new FileStream(@"A:\20_app_internal.tca", FileMode.Open);
            var filestreamFw = new FileStream(@"A:\20_firmware.ghi", FileMode.Open);

            var idxApp = 0;
            var data = new byte[1024];
            var countDebug = 0;

            while (idxApp < filestreamApp.Length)
            {
                if (countDebug % 100 == 0)
                    Debug.WriteLine("Loading application. Idx = 0x" + idxApp.ToString("x8"));
                var count = filestreamApp.Read(data, 0, data.Length);

                idxApp += updater.LoadApplicationChunk(data, 0, count);
                countDebug++;
            }

            updater.LoadApplicationKey(appKey);


            Debug.WriteLine("Verify App.... ");
            var ApplicationVersion = updater.VerifyApplication();

            Debug.WriteLine("App Version: " + ApplicationVersion);

            var idxFirmware = 0;
            while (idxFirmware < filestreamFw.Length)
            {
                if (countDebug % 100 == 0)
                    Debug.WriteLine("Loading Firmware. Idx = 0x" + idxFirmware.ToString("x8"));
                var count = filestreamFw.Read(data, 0, data.Length);

                idxFirmware += updater.LoadFirmwareChunk(data, 0, count);
                countDebug++;
            }

            Debug.WriteLine("VerifyFirmware.... ");
            var firmwareVersion = updater.VerifyFirmware();

            Debug.WriteLine("FW Version: " + firmwareVersion);

            Thread.Sleep(1000);
            Debug.WriteLine("Flashing.... ");
            updater.FlashAndReset();
        }
    }
}

Could there be some race condition in the power circuitry which momentarily drops power on reset?

Try enabling watchdog before starting the update and I think you’ll find the issue.

I think you mean disable?

In order to reproduce the bricking issue, you need to turn on watchdog. If you don’t want to brick it, which is the normal situation, you want to turn off the watchdog.

Ah! I understand now…

Alright, we added a 5-second watchdog and reset it every 4 seconds, and we still could not reproduce it.

Once the watchdog is enabled, it can’t be disabled. In native code, we set it to the maximum of 32 seconds if it is enabled.

If the tca is less than 640K (here it is 540K), plus firmware, then the IFU should take ~20 seconds.

But we are seeing an issue if the tca is a few MB, where the IFU takes more than 32 seconds, and we are correcting it.
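
To make the timing concern concrete, here is a rough sketch of the arithmetic. It assumes flash time scales linearly with the total number of bytes written, which nobody in this thread has confirmed, and the helper names are hypothetical. The caller supplies a measured reference (size and duration) and the watchdog timeout.

```csharp
using System;

// Hypothetical helper for a back-of-the-envelope estimate only.
public static class IfuTimingEstimate
{
    // Linear extrapolation from one measured update.
    // ASSUMPTION: duration is proportional to total bytes flashed.
    public static double EstimateSeconds(long referenceBytes, double referenceSeconds, long targetBytes)
    {
        if (referenceBytes <= 0) throw new ArgumentOutOfRangeException("referenceBytes");
        return referenceSeconds * targetBytes / referenceBytes;
    }

    // True if the estimated flash time would not finish inside the watchdog window.
    public static bool ExceedsWatchdog(double estimatedSeconds, double watchdogTimeoutSeconds)
    {
        return estimatedSeconds >= watchdogTimeoutSeconds;
    }
}
```

Under that linear assumption, with the ~20 second reference and the 32 second limit mentioned above, an image set more than about 1.6 times the size of the reference one would not finish before the watchdog fires.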

Yes, we see an issue, but it seems we are still not on the same page.

Hi, Frederik is on vacation.
Our application is 537kB and the firmware is 1064kB.
We always update both application and firmware.

Why not just reset the watchdog counter throughout the loop that does the flashing?

Or reboot the chip so watchdog is off, right before the flashing?

I just received 3 SM20260D’s that got bricked when the customer attempted an in-field update. All three boot up in Loader mode.

The new tca and firmware load from USB drive into RAM. Both get verified before the reflash process.

The tca file is 3.46MB. Before sending to the customer it worked here at my office. The customer tried updating 4 devices. One was successful, the other 3 failed.

Not sure where to go from here. Any suggestions?

We just fixed the issue we found anyway; hopefully that also corrects the other issue @SoftcontrolFrederik is seeing.

@skeller

Is watchdog enabled?