I have a very unusual problem for a GHI product, the G120 SOM. They’re rock-solid. They’re so reliable that we assume they’ll work fine and use the SOM to run our board diagnostics.
Several G120 SOM modules were installed by a third party on our custom circuit board. They are loading fine, but failing to run. Some modules won’t talk to the USB debugger at all. Some will boot, but deploying the application fails. Some will boot, take the application deployment, and the basic features work fine up until the application starts running; then they either crash without throwing an exception, or throw an exception and then crash.
I’ve done very careful checking to make absolutely sure that the loader and firmware are correct on these boards.
So, I’m suspecting the SOMs were damaged in some way during installation. Are there any low-level diagnostics that will progressively test the SOM to failure, so we can see what went wrong? And, if not, is there something GHI can do with one of these SOMs that has been removed from a board?
What’s the exception? Is it possible you could determine from a software perspective what is failing?
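One way to narrow it down from software is to run checks in a fixed order, simplest first, and stop at the first one that fails. Here is a rough sketch of that idea in Python, just to show the structure. The stage names and the checks themselves are placeholders I made up, not GHI tools or real G120 tests; on the module the same pattern would have to be written in the application's own language.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run checks in order and stop at the first failure or exception."""
    results: List[Tuple[str, str]] = []
    for name, check in stages:
        try:
            ok = check()
        except Exception as exc:
            # An exception counts as a failure; record it and stop.
            results.append((name, f"error: {exc!r}"))
            break
        results.append((name, "pass" if ok else "fail"))
        if not ok:
            break
    return results


if __name__ == "__main__":
    # Hypothetical stages: each lambda stands in for a real check.
    demo_stages: List[Stage] = [
        ("debugger responds", lambda: True),
        ("deployment accepted", lambda: True),
        ("application starts", lambda: False),
        ("peripherals respond", lambda: True),
    ]
    for stage_name, outcome in run_stages(demo_stages):
        print(f"{stage_name}: {outcome}")
```

The last stage listed in the output is the first one that failed, which at least tells you which layer to look at on each bad module.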
Recently I thought an outside company may have been mishandling our GHI products. The true error went unidentified for way too long. The board had external pulldown resistors I didn’t know about. When I measured the input impedance I thought the boards were damaged. I was able to get just enough working pins when I replaced some chips by hand. That was only because when I soldered the chips myself I broke some of the connections with the resistors…
Soldering chips with so many pins by hand is possible, and some people are very good at it I’m sure, but this kind of manual work is likely to be shoddy. It’s just too difficult. So my point is that soldering something by hand could simply introduce a different “error” that makes it even more difficult to track down the original problem.
I hope someone can answer this question for you as well. I contacted STMicro engineers about my problems since they produce the actual MCU. They basically said that in my case a damaged pin or port could look like anything… i.e., it was too complicated an issue to address over email. And I only wasted time trying to prove it was the type of error I initially assumed (and probably aggravated the third party by asking them to be more careful) rather than searching for another cause.
I doubt there’s any meaningful “standard failure mode” for this kind of potential mistreatment. It would totally depend on how much heat, how poorly applied, and for how long. I just don’t think people will be able to give you that level of information. You may find some people who have had actual experience and can share it with you, but there’s nothing to say that a failure like theirs would be indicative of what you would see.
Think of it like this. Will a car tyre manufacturer tell you what kind of failure to expect when you use a standard street car tyre on a hot racetrack? No, they’re going to say that’s outside spec, don’t do it.
Assuming that there is code that is tested and validated:
My understanding is that when a package like the ARM Cortex-M3 (brain of the G120) is overheated, it can cause internal package damage that is difficult to detect from the exterior. I had a similar problem with another chip and it was caused by the oven reflow curve exposing the package to peak temperature for too long.
If inspection shows that there isn’t physical damage to the chip, solder bridging, or melted components, and the board works correctly after replacing the G120, I think it would be safe to assume that the original G120 was damaged from overheating. (This could be caused by an oven or by an iron held too long against the chip. I have had both.)
Please correct me if I have said something incorrect.
We always erase the whole memory space (that we control) and reload TinyBooter and the firmware. If there was test firmware, it would have been erased. On the other hand, if the “special sauce” we don’t control was “extra-special” I would think the chip would just refuse to talk to us.
So, I’m asking the question mhardy suggests. TO GHI: (Gus?) is it possible that a G120 module might not have the correct low-level firmware when it ships from GHI? Would the symptom be that the bootloader, firmware, and application will load fine but the application won’t run? Or, is this more likely a problem with installation? We already ruled out a problem with the board because it works when another G120 module is installed.
I’m not Gus nor someone from GHI, but I would think it’s unlikely you’d get a unit in that state if you do a full erase like you’re saying. Is it possible to now put that suspect/failed G120 module on another board and reflash it, to see if there’s a behavioural difference?