Firmware/Application Corruption

Hi all,
I’m still learning some of the hardware side of this. I have a product that uses the SC20260D SITcore module, and between 4 and 5% of them will fail after a few months due to corrupted firmware or application, necessitating a module replacement. I usually have to start it in loader mode and erase everything, and redeploy the application to recover the module, but I’ve had about 3 or 4 that are completely bricked and do not respond at all, at least to the config tool.

Where do I even start? Is this just the nature of the beast, or are there poor electrical design choices that can affect the deployment region of the flash? I have only ever seen issues when the unit is powered off during a read or write, but that never happens to this memory, if I understand it properly. This failure rate just seems high, so I want to dig into my board design to make sure that I’m not causing any problems, but I don’t even really know what to look for unless I know the mechanism by which this flash can fail.

Do you extend deployment to external flash? Do you write to external flash at run time?

I do extend deployment to external flash. There is really no way around that as the program is too large.

What firmware is installed, please?

This is a slightly older version; we are waiting to fully vet our production application, which uses 2.2.

That is fine but do you write to flash at runtime? Like to store data?

Yes I do - I’m going through SecureStorage. Is this physically the same flash that is used for deployment?

Secure storage uses internal flash memory, not external. It is the same flash used for firmware and for deployment, but it lives in an independent sector and should not cause firmware corruption.

Do you regularly write to flash at runtime, like to secure storage? Is it an option to try to run without writing to flash to see if the problem goes away?

Maybe with more info we can try to find a reason. You can also send us a failed device for analysis, especially one of those that don’t come back to life. This should never ever happen!

There are a few things you can do on your side first, because we need to be clearer about whether it is the firmware or the application that is corrupted.

We recommend TeraTerm: enter bootloader mode, type “R”, and press Enter.

  • If the bootloader says the firmware is corrupted, then the firmware is corrupted. Try re-deploying the firmware only and see whether it works again.
  • If the bootloader doesn’t say anything but the application doesn’t start, the application is corrupted.

If the application is corrupted, please extract the application with key 00-00…

Then take the encrypted file with key 00-00… and redeploy it to see if it works.

There is a new feature in the latest TinyCLR Extension that allows you to extract to binary. Compare this file to the working one and tell us the result, if you have time.
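For the comparison step, here is a minimal host-side sketch in Python. It assumes you already have two binary files on your PC, one extracted from a working module and one from the failed module; the function names and the `gap` parameter are made up for illustration and are not part of any GHI tool. It only reports which byte ranges differ, which may help show whether the damage is confined to one region.

```python
from pathlib import Path


def diff_regions(a: bytes, b: bytes, gap: int = 16):
    """Return (start, end) byte ranges, end exclusive, where a and b differ.

    Differences that are at most `gap` bytes apart are merged into one range.
    If the lengths differ, the extra tail is reported as a final range.
    """
    n = min(len(a), len(b))
    regions = []
    start = None  # start of the range currently being built
    last = None   # offset of the most recent differing byte
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i
            elif i - last > gap:
                # Too far from the previous difference: close that range.
                regions.append((start, last + 1))
                start = i
            last = i
    if start is not None:
        regions.append((start, last + 1))
    if len(a) != len(b):
        regions.append((n, max(len(a), len(b))))
    return regions


def compare_files(good_path: str, bad_path: str):
    """Read both files and print the differing ranges as hex offsets."""
    good = Path(good_path).read_bytes()
    bad = Path(bad_path).read_bytes()
    regions = diff_regions(good, bad)
    for start, end in regions:
        print(f"0x{start:08X} - 0x{end - 1:08X}  ({end - start} bytes)")
    return regions
```

Calling `compare_files("working.bin", "failed.bin")` (hypothetical file names) prints one line per differing range. An empty result means the two images are byte-for-byte identical.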

Every few minutes the application checks whether a configuration setting in the mirror (stored in volatile RAM at runtime) has changed and, if so, backs it up to secure storage. This is the only writing to flash that happens at runtime. I’ve had issues with bad secure storage sectors in the past, but that was due to us overworking it, and it only corrupted a single sector at a time. Unfortunately, this is such an uncommon problem that it would be nearly impossible to run this test in house with the resources available to me, and the feature is required for production.
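The check-then-back-up pattern described above can be sketched as follows. This is a language-neutral illustration written in Python, not TinyCLR code: `ConfigBackup` and `write_fn` are hypothetical names, and `write_fn` stands in for whatever call actually persists the data to secure storage. The point is only that a write happens when the serialized settings differ from the last copy written.

```python
import hashlib
import json


class ConfigBackup:
    """Persist a config mirror only when its contents have changed."""

    def __init__(self, write_fn):
        # write_fn is a hypothetical callable that stores bytes in
        # non-volatile memory; it is an assumption of this sketch.
        self._write = write_fn
        self._last_digest = None
        self.writes = 0  # number of flash writes actually issued

    def sync(self, config: dict) -> bool:
        """Write config if it differs from the last saved copy.

        Returns True if a write was issued, False if it was skipped.
        """
        # sort_keys makes the serialization stable for equal dicts.
        blob = json.dumps(config, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(blob).digest()
        if digest == self._last_digest:
            return False  # unchanged: skip the write
        self._write(blob)
        self._last_digest = digest
        self.writes += 1
        return True
```

With a list standing in for storage, `backup = ConfigBackup(store.append)` followed by repeated `backup.sync(settings)` calls appends to `store` only when `settings` has changed. Tracking `writes` in the field could also give a rough idea of how hard the storage sector is being worked.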

I will dig through my pile and find a bricked one to send to you - can you message me with a good shipping address?
We also have a fried unit coming back in the next few days - I will run the tests that @Dat_Tran mentioned and if they are inconclusive I can forward that module to you as well.

The 2 boards were received:

First board: This is a Rev A engineering sample, probably an insider sample. You should not be using this board in production, and it will surely not work properly with production software.

Second board: The micro on the board is alive, but its config/security bits were modified. Two things can cause this: unauthorized use or static discharge. Due to firewalls, we are prevented from digging further to get more info on what exactly happened.

Hey Gus,
Thanks for taking a look at that. Can you confirm the Rev A board is the one with the “dead” label on the back? I can’t remember which I had sent you.

I also confirmed that we have no Rev A or Rev B modules in our production - this is from my pile of ones that have failed while I was messing with them.