I’ve been working on an issue that has baffled me for more than a month. It is a complicated problem that may be too difficult to explain here on the forum but I’ve run dozens of tests and haven’t been able to see a pattern I can use to fix the issue so maybe someone here can help.
I use a Raptor board in one of my go-to-sea systems, and it reboots at random times. I’ve run earlier versions of the software that didn’t appear to have the reboot problem, and those versions now reboot too. I’ve hooked up a scope to the battery bus (28VDC - 36VDC) and the 12, 5 and 3.3 VDC buses and don’t see any voltage sags just before a reboot (or any voltage sag at all), so I’ve been thinking it is some kind of electrical interference issue. I’ve disconnected all the external hardware from the Raptor and hooked up the external devices one at a time and in various combinations. Depending on the combination of external hardware, I can get reboots more or less often, but I still get them anywhere from 4 minutes to 30 hours after I start my program. My program restarts from the beginning after a reboot, and there is no obvious pattern to how often or when the reboots happen. The output in the debug window is as follows after a reboot.
I’ve been running VS2012 Express and NETMF 4.3 QFE1 with Gadgeteer Package 2014 R3. However, I’m updating to VS2013 Community and NETMF 4.3 SDK 2015 release 1 pre-release 1 as I type. I don’t use the watchdog in any way I know of.
The hardware setup is kind of complicated. I started off using the USB debug port but now use the serial port. I switched because all the electronics sit inside a sealed pressure housing and I have to run the debug cable (serial or USB) through a waterproof bulkhead connector that is not shielded or impedance controlled, and I had issues with USB debugging that don’t seem to happen with serial debugging. I have 3 oceanographic instruments connected to serial ports, an Iridium satellite modem connected to a serial port and an Elmo servo motor controller hooked up to the last serial port. I have a GPS and a pressure/temperature/humidity sensor hooked up to I2C. I have a precision multi-channel ADC hooked up to SPI. The whole mess runs off Li-ion batteries. I use the D-USB board to generate the 3.3 and 5VDC for the Gadgeteer hardware. I used to use a micro SD card and then switched to a full-size SD card when the micro card was discontinued. Right now I’m using a USB flash drive since there was some evidence the SD card was involved in the reboot issue. I had an external watch battery connected to the RTC VBAT on X5, but I’ve removed that since the processor reset line (NRST) is also on X5 and I was worried about noise being picked up on VBAT and cross-talking into NRST. I’ve even tried stiffening up the 10K ohm pullup resistor on NRST with a 470 ohm resistor in parallel on a G-Plug. Based on internal temperature measurements, there is no significant general temperature rise inside the pressure housing.
Again, I know this is a complicated problem that will be difficult to address on this forum but any help I can get will be greatly appreciated.
I’ll be happy to post some pictures after I beat this problem down. With the current setup using serial port debugging, I can power up the system and all my Debug.Print statements come across the serial port, which I have connected to a terminal emulator (uCon, to be specific). I’ve set up uCon to log everything it sees. If I start uCon logging before I power up my system, I’ll see and log everything that shows up on the debug port. I can see the normal stuff that shows up on the debug port when the processor powers up. Then I see everything I expect to see from my program’s Debug.Print statements until, at some random time, the statements stop and I see what I posted in my earlier post, which is just what I see when I power up the system. Here’s a snippet of the uCon log starting just before I see a random reset. The first lines that start with a date and time are what my program outputs when it is running. Then you can see the RomBOOT. Everything from that line until the line “Program started” is what the processor puts out when it resets. The “Program started” line is the first line of my program.
Does your project use any serial ports besides the one used for debug?
When did the issue start happening? Before you switched from USB debug to the serial port, or after that?
And finally, can you give the pre-release, 4.3.7.7, a try?
As I mentioned in the first post, I’m using all 6 of the serial ports on the Raptor. However, they are never all sending data at the same time; rarely are more than 2 devices and the serial debug port sending at the same time. It definitely happened before I switched to serial debug and is still happening.
As far as when it happened, that is a little hard to say. We went through a long round of lab and at-sea testing bringing the system up for the first time and had everything working at the end of last year, including several multi-day tests with no obvious signs of reboots that I remember. Then I went in and fixed some sketchy mechanical issues and “cleaned up” the cabling. In the process of more rigorous testing since then, we either discovered this problem that had been there all along or my relatively minor mechanical/cabling changes caused the problem.
I just updated everything on the go to sea computer to 2015 pre-release 1 and am starting a new test. We’ll see what new and wonderful things happen this time.
@ Gene
I see something similar in a project that also involves oceanography. I have three serial streams plus the debug, in a large program. I have seen this in the current release version and in the current pre-release version. I too use a Raptor, but do not have an internet connection.
Usually, before the spontaneous reboot, one of the COM ports stops generating interrupts when data arrives. Several minutes later the system reboots and the program starts executing from the start. The interval ranges from tens of minutes to tens of hours.
One time the program ran for several hours until it threw an out of memory error. When I looked at the debug output and the GC garbage collection output, everything was stable for the first few hours, then the Type OF(string) went from a few thousand bytes to 387528 bytes when it crashed. The exact same code ran fine for twice as long on the next run and I have not seen this error since.
You can follow what I am experiencing at this forum entry: https://www.ghielectronics.com/community/forum/topic?id=19084&page=4#msg189955
@ rockybooth - Thanks for the input. I’ll track your thread and post what I learn on my thread. I’ve spent the last month assuming I screwed something up electrically somehow (flaky cable, interference, power supply issues, etc.) and debugging the problem under that assumption. With your input, I’m wondering if it is some very subtle software or firmware bug. I really hope I/we can figure it out; otherwise I have a major problem.
@ rockybooth - The more I read your thread the more it sounds like we may have related issues. However, I don’t see any error message of any kind before a reboot. I’ve seen the reboot with the debugger connected so I would expect to see an out of memory exception but haven’t yet. I also have a try/catch block around most of my I/O operations including the serial port reads. My program runs entirely inside a while loop that is inside a try/catch block so I should be able to see and log any exceptions that get thrown inside the while loop, but in my case I’m not seeing any kind of exception before the reboot. Very mysterious and completely frustrating.
I have a benchtop version of my go-to-sea system and I see the same reboots on the benchtop. So at this point I’m going to restart my at-sea testing program with 8-ish hour long at-sea tests and hope the reboot doesn’t happen very often in the first 8 hours or so, while I continue running long-term tests on the benchtop prototype to see if I can isolate this problem.
Based on other threads on the forum, I’m beginning to believe there is some slight chance the issue is hiding in the software somewhere. Is there any reason to believe that a Raptor with one serial port running at something like a 30% duty cycle at 115200 baud and 5 other ports that run only very occasionally at 9600 baud will cause random reboots after many hours?
@ Gene - We don’t have any specific reason to believe it would cause a reboot. The best thing to do at this point is to try and reduce your program down as small as possible with as little external hardware as possible, ideally just something like shorting TX and RX, that still shows the issue for us to take a look at.
@ John - I’m on it. I’m in the process of stripping my program down to something that repeatably shows the reboot problem I’ve been experiencing so I could send it to GHI for you guys to look at. Thanks for volunteering, I really appreciate the support.
I am also experiencing unexplained reboots on a G400 custom board. It can reboot after a few minutes or a few hours with no apparent cause: no exception, no message in the output window… weird :-[ :-[
This problem still continues to plague us. I’ve rolled back everything to the last, kind of stable, environment I had: VS2012, .NET Micro 4.3 QFE1 and GHI SDK 2014 R3. At this point the second line of my program is
GHI.Processor.Watchdog.Disable();
I also read the processor’s Reset Controller Status Register (RSTC_SR) every time my program starts up and write the result to a file on the SD card with the following code.
private static Register resetControllerStatus = new Register(0xFFFFFE04);
and
uint resetControllerStatusValue = resetControllerStatus.Value;
In my most recent tests, the RSTC_SR values I’ve seen are
If I’m interpreting the RSTC_SR Value correctly (see the page from the data sheet in the attached image) 66049 indicates a watchdog reset even though I explicitly disabled it. 65537 and 66049 indicate something hit the NRST line. I don’t have anything connected to it or anything connected to X5 on the Raptor which is where NRST is.
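For anyone who wants to check that interpretation, here is a minimal sketch that splits an RSTC_SR value into its fields. The bit positions (URSTS in bit 0, RSTTYP in bits 10:8, NRSTL in bit 16) and the RSTTYP names are my reading of the SAM9-family layout, so treat them as assumptions and verify them against the data sheet page. The class and method names are hypothetical, not part of the GHI SDK.

```csharp
using System;

// Hypothetical helper; field positions are an assumed reading of the
// SAM9-family RSTC_SR layout and should be checked against the data sheet.
public static class ResetStatusSketch
{
    // Bit 0 (URSTS): a high-to-low transition was detected on NRST
    // since the register was last read.
    public static bool UserResetSeen(uint rstcSr)
    {
        return (rstcSr & 0x1u) != 0;
    }

    // Bits 10:8 (RSTTYP): cause of the last processor reset.
    public static uint ResetType(uint rstcSr)
    {
        return (rstcSr >> 8) & 0x7u;
    }

    // Bit 16 (NRSTL): level of the NRST pin when the register was sampled.
    public static bool NrstLevelHigh(uint rstcSr)
    {
        return (rstcSr & (1u << 16)) != 0;
    }

    public static string ResetTypeName(uint rstcSr)
    {
        switch (ResetType(rstcSr))
        {
            case 0: return "General (power-up)";
            case 1: return "Wake-up";
            case 2: return "Watchdog";
            case 3: return "Software";
            case 4: return "User (NRST)";
            default: return "Reserved";
        }
    }

    // One-line summary suitable for Debug.Print or a log file.
    public static string Describe(uint rstcSr)
    {
        return "RSTTYP=" + ResetTypeName(rstcSr)
            + ", URSTS=" + (UserResetSeen(rstcSr) ? "1" : "0")
            + ", NRSTL=" + (NrstLevelHigh(rstcSr) ? "1" : "0");
    }
}
```

Under that layout, ResetStatusSketch.Describe(66049) gives "RSTTYP=Watchdog, URSTS=1, NRSTL=1" and ResetStatusSketch.Describe(65537) gives "RSTTYP=General (power-up), URSTS=1, NRSTL=1". If the layout is right, 65537 would be a power-up style reset with the NRST flag set, rather than a reset typed as a user reset.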
Any ideas what might be going on and what I can do to fix it will be greatly appreciated.
If you look at the controller data sheet, the WDT_MR register value indicates that WDRSTEN is 1
• WDRSTEN: Watchdog Reset Enable
0: A Watchdog fault (underflow or error) has no effect on the resets.
1: A Watchdog fault (underflow or error) triggers a Watchdog reset.
and WDRPROC is 0
• WDRPROC: Watchdog Reset Processor
0: If WDRSTEN is 1, a Watchdog fault (underflow or error) activates all resets.
1: If WDRSTEN is 1, a Watchdog fault (underflow or error) activates the processor reset.
and
WDDIS is 0
• WDDIS: Watchdog Disable
0: Enables the Watchdog Timer.
1: Disables the Watchdog Timer.
All of which seems to suggest the Watchdog is enabled. Is this possible?
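The same check can be done in code rather than by eye. This is a rough sketch that pulls the three fields quoted above out of a raw WDT_MR value. The bit positions (WDV in bits 11:0, WDRSTEN in bit 13, WDRPROC in bit 14, WDDIS in bit 15) are my reading of the SAM9-family register layout and are assumptions to confirm against the data sheet; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper; bit positions are an assumed reading of the
// SAM9-family WDT_MR layout and should be checked against the data sheet.
public static class WatchdogModeSketch
{
    // Bits 11:0 (WDV): watchdog counter reload value.
    public static uint CounterValue(uint wdtMr)
    {
        return wdtMr & 0xFFFu;
    }

    // Bit 13 (WDRSTEN): a watchdog fault triggers a reset.
    public static bool ResetEnabled(uint wdtMr)
    {
        return (wdtMr & (1u << 13)) != 0;
    }

    // Bit 14 (WDRPROC): 0 = fault activates all resets, 1 = processor reset only.
    public static bool ProcessorResetOnly(uint wdtMr)
    {
        return (wdtMr & (1u << 14)) != 0;
    }

    // Bit 15 (WDDIS): 1 = watchdog timer disabled.
    public static bool Disabled(uint wdtMr)
    {
        return (wdtMr & (1u << 15)) != 0;
    }

    // True when the watchdog is running and allowed to reset the chip.
    public static bool WillResetOnTimeout(uint wdtMr)
    {
        return !Disabled(wdtMr) && ResetEnabled(wdtMr);
    }
}
```

As an illustration, 0x3FFF2FFF has WDRSTEN = 1, WDRPROC = 0 and WDDIS = 0, the same combination reported above, so WillResetOnTimeout returns true for it. If I remember the data sheet correctly that is also the register's reset default, which would be worth checking, since it would mean the Disable() call never changed the register.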
As far as I remember, you can only set watchdog register once on the SAM9x35 chip, which effectively means you can only disable it, without the possibility to enable it again. With that in mind, I assume GHI.Processor.Watchdog.Disable(); doesn’t actually disable it, because then GHI.Processor.Watchdog.Enable() method would be completely useless.
Maybe try disabling watchdog directly via Register class?..
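A rough sketch of what that could look like, with several assumptions: the WDT_MR address (0xFFFFFE44) and the WDDIS bit position (bit 15) are from my reading of the SAM9X35 data sheet and need checking, and the commented-out board code assumes the Register class allows writing through Value, which I have not verified. Since the register can apparently only be written once after reset, a write made after the firmware has already written it would be ignored, so the read-back check matters. The helper names are hypothetical.

```csharp
using System;

// Hypothetical helper; the address and bit position are assumptions to
// confirm against the SAM9X35 data sheet before using on hardware.
public static class WatchdogDisableSketch
{
    public const uint WdtMrAddress = 0xFFFFFE44; // assumed WDT_MR address
    public const uint WDDIS = 1u << 15;          // assumed Watchdog Disable bit

    // Value to write: keep every other field as read, set WDDIS.
    public static uint WithWatchdogDisabled(uint currentWdtMr)
    {
        return currentWdtMr | WDDIS;
    }

    // After writing, read WDT_MR back and pass it here; false means the
    // write was ignored (for example because the register was already
    // written once since the last reset).
    public static bool WriteTookEffect(uint readBackWdtMr)
    {
        return (readBackWdtMr & WDDIS) != 0;
    }
}

// On the board (untested; assumes GHI.Processor.Register.Value is writable):
//   Register wdtMr = new Register(WatchdogDisableSketch.WdtMrAddress);
//   wdtMr.Value = WatchdogDisableSketch.WithWatchdogDisabled(wdtMr.Value);
//   bool ok = WatchdogDisableSketch.WriteTookEffect(wdtMr.Value);
```

This would have to be the first thing the program does, and even then it only helps if nothing earlier in the boot sequence has already written WDT_MR.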
if (GHI.Processor.Watchdog.Enabled)
{
    Debug.Print("GHI says WDT is ENABLED!!!\r\n");
}
else
{
    Debug.Print("GHI says WDT is disabled\r\n");
}
GHI.Processor.Watchdog.Enabled is not the same as the enable bit in the WDT_MR register. This value only turns true when Enable(xxx) is called; otherwise it is always false.
3. Yes, it does. The WDT_MR register will be written directly once GHI.Processor.Watchdog.Disable() is called.
What wasn’t shown in Gene’s code post was the GHI.Processor.Watchdog.Disable() call made before accessing the registers and printing the info. What exactly does GHI.Processor.Watchdog.Disable() write to the WDT_MR register?
It seems the WDT on the SAM9x35 is still enabled. Is this different than the GHI.Processor.Watchdog by any chance?