In-field update bricks our boards to bootloader mode

Hey,

The in-field updater (IFU) intermittently bricks our boards into bootloader mode. With the same firmware and application files, an update might succeed four times on a board and then brick that same board on the fifth attempt.

We can see in our logs that the IFU verifies both the application and the firmware as okay, but after FlashAndReset is called, the board is stuck in bootloader mode and the firmware has to be reflashed using the TinyCLR Config tool.

We are flashing both the firmware and application every time we perform an in-field update as recommended.

Our boards are custom boards using the SCM20260E. The issue has been observed on several firmware versions, including the newest, v2.2.2.1000.

image
image

public void LoadChunks(ChunkType type, byte[] chunkData)
{
    if(_updater == null)
    {
        _updater = new InFieldUpdate();
        _updater.ResetChunks();
    }

    if(type == ChunkType.APP)
    {
        _updater.LoadApplicationChunk(chunkData, 0, chunkData.Length);
    }
    else
    {
        _updater.LoadFirmwareChunk(chunkData, 0, chunkData.Length);
    }
}

public void FlashSoftware(byte[] appKey, string applicationVersion)
{
    if(_updater == null)
    {
        Logger.Logger.Error("FirmwareFlasher | InFieldUpdate is not instantiated");
        return;
    }

    //Load key
    _updater.LoadApplicationKey(appKey);

    Logger.Logger.Debug("FirmwareFlasher | Verifying application.... ");
    string appVer = _updater.VerifyApplication();
    if (appVer != applicationVersion)
    {
        Logger.Logger.Error($"FirmwareFlasher | Application version is: {appVer} but expected version: {applicationVersion}");
        Logger.Logger.Error("FirmwareFlasher | Firmware update abort!");
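        _updater.Dispose();
        _updater = null; // so the next LoadChunks call creates a fresh updater instead of reusing a disposed one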
        return;
    }
    Logger.Logger.Debug($"FirmwareFlasher | Application version: {appVer}");


    Logger.Logger.Debug("FirmwareFlasher | Verify firmware.... ");
    Logger.Logger.Debug("FirmwareFlasher | Firmware version: " + _updater.VerifyFirmware());

    // Flashing
    Thread.Sleep(2000);
    _updater.FlashAndReset();
}

Best regards,
Frederik

Hi, can you send us a simple application that reproduces the issue?

We ask because if it happens with a simple application, we can fix it. If you cannot reproduce it with a simple application, you can compare that application with your current project and tell us which difference causes the issue.

Hey,

Can you use an application file (.tca) to boot a controller and connect it to our backend so we can start an upgrade, or do you require a project solution?

Do VerifyApplication and VerifyFirmware only ensure that the loaded chunks are correct, without detecting missing or swapped chunks? Should we implement our own validation steps?

  • What happens if a chunk is missing?
  • What happens if a chunk is loaded multiple times?
  • Are there any issues for the IFU when our CPU load is at 70-80%?
  • Should we load chunks to RAM or use the external flash?
    – Currently, we have been loading to RAM.
  • Do a lot of events and threads affect the IFU stability?

We need to know:

  • How big the .tca file is.
  • What the application configuration is (external RAM, deployment).
  • Where the firmware and application are loaded from: USB, SD, network…
  • Whether you use the indicator pin. If yes, do you use the indicator pin anywhere else?
  • When an update fails, before using the bootloader to reflash the device, can you pull the APP pin (PB7) low and check whether the firmware is still good? This prevents the application from running and lets only the firmware run, which tells us whether the firmware or the application is corrupted.

Anything that helps us reproduce the issue is useful. We would rather not guess what the user did, try it here, and report “could not reproduce” unless we have to.

It checks the CRC once all chunks have been loaded. That means that if even one byte in the firmware or application has changed, verification should fail.
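
If you still want an application-side guard against missing, repeated, or out-of-order chunks before the data reaches the updater, a minimal sketch could look like the one below. ChunkSequence is a hypothetical helper, not part of the TinyCLR API, and it assumes your backend sends a zero-based index with every chunk and tells you the total chunk count up front. It also assumes chunks must reach the updater in file order.

```csharp
using System;

// Hypothetical helper (not part of TinyCLR): accepts chunks only in strict
// ascending order, so a missing, repeated, or swapped chunk is rejected
// before it is passed on to the updater.
public sealed class ChunkSequence
{
    private readonly int _expectedCount;
    private int _next; // index of the next chunk we are willing to accept

    public ChunkSequence(int expectedCount)
    {
        if (expectedCount <= 0)
            throw new ArgumentException("expectedCount must be positive");

        _expectedCount = expectedCount;
    }

    // Returns true only when 'index' is exactly the next expected chunk.
    public bool TryAccept(int index)
    {
        if (_next >= _expectedCount || index != _next)
            return false;

        _next++;
        return true;
    }

    // True once every expected chunk has been accepted exactly once.
    public bool IsComplete
    {
        get { return _next == _expectedCount; }
    }
}
```

You would keep one tracker per image (application and firmware), call TryAccept before each LoadApplicationChunk or LoadFirmwareChunk, and only go on to verification and FlashAndReset when IsComplete is true for both.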

I think it is normal.

We recommend RAM, but you can try external flash as an investigation.

I think not. Once FlashAndReset() is called, all interrupts are off, the watchdog is off, and no other threads are executed. Try it on a simple project and you will see. To give the simple project a size similar to your real project, you can add JPG or binary data as resources. The size does not need to match exactly: if your project is 2.5 MB, a simple project of 2.4 or 2.6 MB is probably OK.

That is why we asked for the simple project, or for the way you reproduce the issue, so we can look into it more deeply. We don’t want your whole project if it is sensitive.

Are you using the watchdog?
The watchdog does not turn off when you run the IFU updater. We learned this the hard way with a bricked module in a different country.

This of course depends on the time your watchdog is set to. Ours is extremely strict at 500ms. It could explain why you’re seeing it only sometimes.

We will check. The watchdog needs to be off, and I think it is a bug if it doesn’t turn off.

Yeah, we’re using the watchdog at its maximum timeout. That might explain why the bricking by the IFU is so random and inconsistent.

Hey

  • tca file
    image
  • The application config, external RAM. Look at the device configuration picture.
  • Both firmware and application are chunk-loaded through an SSL connection.
  • We don’t use the indicator pin.
  • Typically, the devices brick in the field and are fixed on site. However, I do know that when connecting to TinyCLR config, the devices don’t display their chip name but only COM, which is why I assume the firmware has been corrupted.

As @LucaP suggested, I will try disabling the watchdog.

Our update tool, which uploads the files over TCP, now checks whether the watchdog is enabled before even starting the update procedure.

Might be worth trying to update without watchdog.