Multiple Threads Interfering With Interrupts. How to Diagnose and Fix?

I’m running a few threads on my device per this thread:
https://www.ghielectronics.com/community/forum/topic?id=23551

When I rotate the encoder to drive a menu, it should scroll one menu item at a time. On occasion, the menu will jump several items. Here’s a video of the behavior (the jump starts around the 9-second mark):
https://www.youtube.com/watch?v=z_Ew_MWtQvY

When I eliminate the sensor threads, this behavior goes away.
How can I diagnose this? Any ideas on what is happening?

@ Gismofx - Could it be garbage collection occurring?

Multiple responses to a single interrupt could be caused by a number of things. You’re going to need to debug it either in the debugger or with Debug.Print, but unfortunately both of those can change the execution profile (timing) of your program.

There are actually so many ways this could go wrong that it’s hard to give much guidance. The two hardest classes of bugs to diagnose are memory corruption and concurrency bugs (and this is probably a concurrency bug). That’s why threads, for reasons of both runtime cost and maintenance complexity, are best left to those situations where you REALLY need them - not just as a mechanism for simplifying code structure.

You could add a limit so that no more than one move can occur in, say, 500 ms. Then if you get extra events, you just toss them out.

I recall your hardware design for this project: you had a capacitor/resistor acting as a sort of debounce. The occasional extra click could mean you are still seeing some bounces. What do the pulses look like on a scope (if you have one, that is)?

@ Dave McLaughlin -
The signal is clean. Here’s a shot where I spin clockwise and then counter-clockwise. The signal looks no different whether I enable or disable the sensor threads (see attached image).

Is it possible you’re just not reading interrupts fast enough?

@ mcalsyn -
That’s my fear… I won’t be able to do much with normal debug statements without interfering with the actual timing of the system. I’m still trying to find a way to figure out what’s going on.

Assuming it is not GC, as I mentioned earlier, can you increase the sleep times in your threads, one at a time, to see if you can isolate the thread causing the delay?

I would think that I should be capturing all the interrupts, but I suppose stranger things could occur. How could that happen?

Here’s a link to a post I made about the encoder with the code:
https://www.ghielectronics.com/community/forum/topic?id=20019&page=2#msg218355

@ Mike -

Here’s the code for my two sensor threads:

        private static void RunOdometer() //int waitMS
        {
            uint maxValue = 65535;//highest number before overflow/reset
            uint cStart;
            uint cPrev;
            uint diff;
            double distanceKM;
            cPrev = 0;

            while (true)
            {
                //ticks1 = System.DateTime.Now.Ticks;
                cStart = InternalSpeedSensor.Ticks;
                if (cStart < cPrev)//detected a roll over
                {
                    diff = (maxValue - cPrev) + cStart + 1;//+1 counts the step from maxValue back to 0
                }
                else
                {
                    diff = cStart - cPrev;
                }
                cPrev = cStart;
                distanceKM = (DeviceSetting.WheelCircumference.Value / DeviceSetting.NumberOfSpeedPickups.Value) * diff * .000001;
                DeviceSetting.Odometer1.IncrementValue(distanceKM);//update odometer
                DeviceSetting.Tripometer.IncrementValue(distanceKM);//update trip
                Thread.Sleep(500);
            }

        }
        
        /// <summary>
        /// Writes the current odometer value to EEPROM every 10 seconds.
        /// </summary>
        private static void UpdateOdometerEEprom()
        {
            while (true)
            {
                DeviceSetting.Odometer1.UpdateEEProm(DeviceSetting.Odometer1.Value);
                Thread.Sleep(10000);
            }
        }
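
As a side note on the rollover branch in RunOdometer: the distance across a wrap is (65535 - previous) + current + 1, because the step from 65535 back to 0 is itself one tick. A minimal sketch of that arithmetic as a standalone helper follows. It assumes InternalSpeedSensor.Ticks is a 16-bit counter that wraps from 65535 to 0 (as the maxValue comment suggests) and that at most one wrap happens between samples; the names CounterMath and ElapsedTicks are hypothetical.

```csharp
using System;

public static class CounterMath
{
    // Number of distinct values in a 16-bit counter (0..65535).
    // Assumption: the speed-sensor counter wraps from 65535 back to 0.
    private const uint Modulus = 65536;

    // Returns how many ticks elapsed between two readings, assuming the
    // counter wrapped at most once between them.
    public static uint ElapsedTicks(uint previous, uint current)
    {
        // Adding Modulus before subtracting keeps the intermediate value
        // non-negative; the modulo covers both the wrapped and normal case.
        return (current + Modulus - previous) % Modulus;
    }
}
```

For example, a previous reading of 65530 followed by a reading of 4 is ten ticks: five to reach 65535, one to wrap to 0, and four more. This is unlikely to be related to the extra menu events, but it keeps the odometer from losing a tick at every wrap.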

What could I modify?

@ Gismofx - How often does the issue appear?

You are showing snippets of code. It’s hard to figure out what is happening inside the methods being called in the code you posted.

Nothing in the code you posted seems to explain the problem, unless the methods you are calling take a long time to complete.

You still have not addressed the GC question. Is anything appearing in the debug window when the problem appears?

It occurs often enough to be noticeable. I would say about 15% of the time I rotate the encoder.

The IncrementValue method simply adds the input value to the property’s current value in the class. Fast code.
The UpdateEEProm method probably takes a little longer to execute, but it’s not a very intense operation: I convert a value to bytes, compute a CRC, combine the byte arrays and write the result. Then I read it back to verify and move along. The EEPROM SPI runs at 10 MHz. I timed the operation: the EEPROM update takes about 5 ms to complete.

I don’t see anything coming up about the GC in the debug window while this happens. Is there code I need to implement to debug GC?
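
On the question of seeing GC activity: in .NET Micro Framework the Microsoft.SPOT.Debug class has an EnableGCMessages method that makes the runtime report collections in the debug output. A minimal sketch, assuming the project targets NETMF and that it is acceptable to call this once at startup; the Program/Main shape here is only a placeholder for the existing entry point:

```csharp
using Microsoft.SPOT;

public class Program
{
    public static void Main()
    {
        // Ask the runtime to print a message to the debug output each
        // time a garbage collection runs.
        Debug.EnableGCMessages(true);

        // ... start the sensor threads and the menu as before ...
    }
}
```

If GC messages line up with the moments the menu jumps, that points toward garbage collection; if they do not, it can probably be ruled out. Note that the messages themselves go through the debug channel, so they may shift timing slightly.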

Just a suggestion, but instead of using Debug.Print to diagnose, why don’t you just create a global string and append the time, action, and any other relevant data to it each time the interrupts etc. are called? Periodically, you could write it all to a text file or display it via Debug.Print, but that way you’re only using a minimal amount of time in the handlers.
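
A variation on this idea, sketched under some assumptions: instead of growing a string (which allocates on every append and could itself trigger GC), record a timestamp and a small integer event code into preallocated arrays, and only build text when dumping. The class name EventTrace and the idea of numeric event codes are hypothetical, not something from the posted code.

```csharp
using System;

// Fixed-size in-memory trace. Recording allocates nothing, so it should
// disturb timing far less than Debug.Print. Oldest entries are overwritten.
public class EventTrace
{
    private readonly long[] _ticks;
    private readonly int[] _codes;
    private readonly object _sync = new object();
    private int _next;   // index of the next slot to write
    private int _count;  // number of valid entries (at most the capacity)

    public EventTrace(int capacity)
    {
        _ticks = new long[capacity];
        _codes = new int[capacity];
    }

    public int Count
    {
        get { lock (_sync) { return _count; } }
    }

    // Record an event code with a caller-supplied timestamp in ticks.
    public void Record(long ticks, int code)
    {
        lock (_sync)
        {
            _ticks[_next] = ticks;
            _codes[_next] = code;
            _next = (_next + 1) % _ticks.Length;
            if (_count < _ticks.Length) _count++;
        }
    }

    // Format the stored entries oldest-first, one "ticks code" pair per line.
    // This allocates strings, so call it outside the timing-critical path.
    public string Dump()
    {
        lock (_sync)
        {
            string text = "";
            int start = (_next - _count + _ticks.Length) % _ticks.Length;
            for (int i = 0; i < _count; i++)
            {
                int slot = (start + i) % _ticks.Length;
                text += _ticks[slot] + " " + _codes[slot] + "\n";
            }
            return text;
        }
    }
}
```

The interrupt handler and the menu-move code would each call something like trace.Record(DateTime.Now.Ticks, 1) with their own code, and a low-priority loop could occasionally pass trace.Dump() to Debug.Print.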

Here’s another video that shows a little more clearly how I get more events than expected.

If I put a breakpoint on the event subscriber, I can’t reproduce the result. When I remove the breakpoint, I get the repeated events.

I’m still at a loss as to how to continue diagnosing this. Any other ideas?

It’s hard without being more familiar with your code, but you could put integer counters in various places (at the head of the interrupt handler, and where you move the menu) and then throw an exception if they get out of sync with more moves than interrupts. When the debugger interferes with your code, you need to inject code that seeks to identify the problem and then brings the debugger in when the problem is detected.
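
A rough sketch of that counter idea, assuming one interrupt is supposed to produce at most one menu move. The class and method names (MoveAudit, OnInterrupt, OnMenuMove) are made up for illustration; they would be called from the head of the interrupt handler and from the place where the menu actually moves.

```csharp
using System;

public class MoveAudit
{
    private readonly object _sync = new object();
    private int _interrupts; // incremented at the head of the interrupt handler
    private int _moves;      // incremented where the menu selection changes

    public void OnInterrupt()
    {
        lock (_sync) { _interrupts++; }
    }

    public void OnMenuMove()
    {
        lock (_sync)
        {
            _moves++;
            if (_moves > _interrupts)
            {
                // The invariant is broken: stop here so the debugger can
                // show the call stack at the moment the extra move occurs.
                throw new InvalidOperationException(
                    "More menu moves than interrupts: " + _moves + " > " + _interrupts);
            }
        }
    }
}
```

If the exception never fires while the menu still jumps, the extra moves are matched by real interrupt events, which shifts suspicion back to the input side rather than the menu code.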

One brute-force way to work around the issue might be to grab the tick-count from DateTime and ignore menu actions that occur too close together. Sort of a software debounce mechanism.
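
A minimal sketch of such a software debounce, assuming menu actions are all handled from one place (the filter is not thread-safe) and that the caller passes in DateTime.Now.Ticks. The class name MinIntervalFilter and the 50 ms figure in the usage note are illustrative only; the right interval would need to be tuned on the real device.

```csharp
using System;

public class MinIntervalFilter
{
    private readonly long _minTicks;
    private long _lastAccepted;
    private bool _hasAccepted;

    public MinIntervalFilter(int minMilliseconds)
    {
        // One tick is 100 ns, so TicksPerMillisecond is 10,000.
        _minTicks = minMilliseconds * TimeSpan.TicksPerMillisecond;
    }

    // Returns true if the event at nowTicks should be acted on, or false
    // if it arrived too soon after the last accepted event.
    public bool Accept(long nowTicks)
    {
        if (_hasAccepted && nowTicks - _lastAccepted < _minTicks)
        {
            return false;
        }
        _lastAccepted = nowTicks;
        _hasAccepted = true;
        return true;
    }
}
```

The menu handler would create one filter, for example new MinIntervalFilter(50), and wrap each move in if (filter.Accept(DateTime.Now.Ticks)) { ... }. This hides the symptom rather than explaining it, so it works best alongside the diagnostics above, not instead of them.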

Anything more insightful would require being a lot more intimate with your hardware and software setup.