Sort of. Space-bound computers run in redundant arrays whose outputs are compared, because radiation can flip bits. The result of every operation is compared across the units and the majority wins. As you add peers, it becomes exceedingly unlikely that radiation would cause the same error in RAM or CPU operation on enough of them to outvote the rest.
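A minimal sketch of the voting idea, assuming each redundant unit reports one result per operation. This is a generic majority voter (as in triple modular redundancy), not the specific scheme any particular spacecraft uses.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of redundant units,
    or None if no value has more than half of the votes."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

# Three units compute the same value; the third had bit 3 flipped (42 -> 34).
print(majority_vote([42, 42, 34]))  # 42
print(majority_vote([1, 2, 3]))     # None: no majority, fault cannot be masked
```

Requiring a strict majority means a disagreement that cannot be resolved is reported rather than guessed at.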
What I need isn't that elaborate. I just need to determine "module A is no longer responding or working, so module B takes over." In clustered computing, high availability comes from the cluster itself. Every X period of time, all cluster nodes communicate over a cluster network with redundant paths. If a node fails to check in or participate for Y cycles, and no working path to it can be found, it is evicted, and another node is elected to take over its role.
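The check-in/eviction loop described above can be sketched roughly as follows. Assumptions: `tick()` is called once per heartbeat period X with the set of nodes heard from, `max_missed` plays the role of Y, the redundant-path check is not modeled, and the "lowest surviving ID becomes active" election rule is hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Cluster:
    node_ids: list[int]
    max_missed: int  # "Y": consecutive missed check-ins before eviction
    members: set[int] = field(init=False)
    missed: dict[int, int] = field(init=False)
    active: Optional[int] = field(init=False)

    def __post_init__(self) -> None:
        self.members = set(self.node_ids)
        self.missed = {n: 0 for n in self.node_ids}
        # Hypothetical election rule: lowest node ID starts as active.
        self.active = min(self.members) if self.members else None

    def tick(self, checked_in: set[int]) -> list[int]:
        """Process one heartbeat period ("X"); return the nodes evicted this cycle."""
        evicted = []
        for n in sorted(self.members):
            if n in checked_in:
                self.missed[n] = 0          # heard from it: reset its miss counter
            else:
                self.missed[n] += 1
                if self.missed[n] >= self.max_missed:
                    evicted.append(n)
        for n in evicted:
            self.members.discard(n)
        if self.active not in self.members:
            # Same hypothetical rule for re-election: lowest surviving ID wins.
            self.active = min(self.members) if self.members else None
        return evicted
```

For example, with `Cluster([1, 2, 3], max_missed=3)`, if node 1 goes silent it is evicted on the third missed tick and node 2 becomes active. Counting consecutive misses means a single dropped heartbeat does not trigger a failover.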
Now, that's overkill for what I need; I won't be failing over file shares, virtual machines, or the like, just compute operations. So if microcontroller #x fails to check in, microcontroller #y takes over, and the remaining nodes physically reset #x to try to bring it back online in a working state.
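The reset-and-rejoin step might look something like this sketch. `pulse_reset` is a hypothetical hook standing in for whatever toggles the failed MCU's RESET pin, and `max_attempts` is an assumed limit; a recovered node is readmitted as passive rather than handed back the active role.

```python
from __future__ import annotations
from typing import Callable

class RecoveryManager:
    """Tracks evicted MCUs, requests hardware resets, and readmits them as passive."""

    def __init__(self, pulse_reset: Callable[[int], None], max_attempts: int = 3) -> None:
        self.pulse_reset = pulse_reset      # hypothetical hook that toggles a node's RESET pin
        self.max_attempts = max_attempts    # assumed cap on reset attempts per node
        self.attempts: dict[int, int] = {}  # evicted node -> resets issued so far
        self.passive: set[int] = set()      # nodes that came back and rejoined as passive

    def on_evicted(self, node: int) -> None:
        self.attempts[node] = 0
        self._try_reset(node)

    def _try_reset(self, node: int) -> None:
        if self.attempts[node] < self.max_attempts:
            self.attempts[node] += 1
            self.pulse_reset(node)

    def on_tick(self, checked_in: set[int]) -> None:
        """Call once per heartbeat period with the nodes heard from."""
        for node in list(self.attempts):
            if node in checked_in:
                del self.attempts[node]
                self.passive.add(node)      # back online, but never straight to active
            else:
                self._try_reset(node)       # still silent: reset again if attempts remain
```

This version retries on every tick; a real design would probably wait long enough for the MCU to boot before pulsing reset again.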
To keep the same state, all I/Os would have to be wired to every MCU, and all MCUs would have to perform exactly the same operations, except that "passive" microcontrollers would not set I/O states, drive DAC output, PWM output, etc. They would still run the logic, but the lines of code that set outputs would be skipped until they are told they are "active".
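One way to express "skip the lines that set state" is to route every output write through a single helper that always records the intended value but only touches hardware when the node is active. In this sketch `write_pin`, the output names, and the control rules are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    node_id: int
    write_pin: Callable[[str, int], None]  # hypothetical HAL hook: (output name, value)
    active: bool = False
    state: dict[str, int] = field(default_factory=dict)

    def set_output(self, name: str, value: int) -> None:
        self.state[name] = value            # every node tracks the intended value
        if self.active:
            self.write_pin(name, value)     # only the active node touches hardware

    def step(self, inputs: dict[str, int]) -> None:
        # Identical control logic on every node (hypothetical example rules).
        self.set_output("pump_relay", 1 if inputs["tank_level"] < 20 else 0)
        self.set_output("fan_pwm", min(255, inputs["temp_c"] * 4))
```

If every node is fed the same shared inputs each cycle, all of them end up with identical `state`, but only the active one drives the pins. Keeping the gate in one helper avoids scattering `if active` checks through the logic.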
In this scenario, all MCUs do exactly the same thing all the time, but all except the active one are passive: they don't drive any output high or low, or anything else.
If the active MCU fails (goes into garbage collection for 15 minutes, or just quits), the remaining MCUs all hold the correct logic state, because they have been doing the same work as the primary, just without driving outputs. So when one is elected active, it runs a subroutine that sets every I/O, DAC, PWM channel, and so on to match its current state.
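The promotion subroutine could be as simple as replaying the tracked output values to hardware in one pass. This is a sketch under assumptions: `shadow_state` is the output map a passive MCU has been maintaining, `write_output` is a hypothetical HAL hook, and the optional `order` lets you choose a safe sequence (for example, setting a PWM duty before energizing a relay).

```python
from __future__ import annotations
from typing import Callable, Iterable, Optional

def promote_to_active(
    shadow_state: dict[str, int],
    write_output: Callable[[str, int], None],
    order: Optional[Iterable[str]] = None,
) -> None:
    """Push every output the passive MCU has been tracking out to hardware.

    shadow_state: output name -> value computed while passive (hypothetical names).
    write_output: hypothetical hook that drives one relay/DAC/PWM channel.
    order: optional write order; defaults to sorted output names.
    """
    names = list(order) if order is not None else sorted(shadow_state)
    for name in names:
        write_output(name, shadow_state[name])
```

Writing the full state at once, rather than waiting for the next change, is what makes the outputs correct immediately after takeover.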
Thus, if MCU #1 fails and MCU #2 takes over, relays, inputs, ADC, PWM, etc. are all immediately set to what they should be, restoring the system to a working state.