Help, my BINARY_BLOB_HEAD is exploding!

Can anyone give me some insight into what the “Type 1E (BINARY_BLOB_HEAD)” line in the GC report is about? I have just tried integrating my first serious FEZ Domino application from three software components that all work fine in isolation. Together, RAM consumption grows rapidly until the application locks up, issuing repeated GC “Failed allocation” reports. I get two successful GC reports before the first failure, and the only difference between them is a rapidly growing BINARY_BLOB_HEAD and a correspondingly shrinking FREEBLOCK. There is also a CACHEDBLOCK entry of 228 bytes in the first GC report; that line is missing altogether from the subsequent reports. BINARY_BLOB_HEAD goes from 23004 bytes in the first report to 40944 bytes in the second, and then sticks at 41364 bytes in the failed reports.

These are the three software components:

  1. A messaging interface using CDC on the USB client port to exchange short messages (up to 12 bytes) with a PC. The debug interface has been reassigned to COM1. The NETMF driver spawns a worker thread to handle reception with a state-machine protocol engine; transmission is handled by a simple packet assembler on the main thread.

  2. A radio transmitter and receiver interface. The receiver hardware drives a PinCapture object, and there is also an interrupt handler connected to the same input pin. The interrupt handler implements a digital noise filter which detects a valid signal and triggers the PinCapture to sample it for analysis (see the sketch after this list). In normal operation, this interrupt handler is continually triggered by radio noise. The transmitter is driven by another output pin connected to an OutputCompare object.

  3. A tri-colour LED driver using three PWM outputs and a Timer for flashing capability.
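
To make component 2 concrete, here is a stripped-down sketch of the kind of interrupt-driven noise filter I mean (not my actual code: the pin and the 200 µs threshold are placeholders, and the PinCapture trigger is reduced to a Debug.Print):

    using System;
    using Microsoft.SPOT;
    using Microsoft.SPOT.Hardware;

    public class RadioNoiseFilter
    {
        private readonly InterruptPort _rxPin;
        private DateTime _lastEdge = DateTime.MinValue;

        public RadioNoiseFilter(Cpu.Pin receiverPin)   // hypothetical pin parameter
        {
            // Fire on both edges so pulse widths can be measured from the timestamps.
            _rxPin = new InterruptPort(receiverPin, false,
                Port.ResistorMode.Disabled, Port.InterruptMode.InterruptEdgeBoth);
            _rxPin.OnInterrupt += new NativeEventHandler(OnEdge);
        }

        private void OnEdge(uint pin, uint state, DateTime time)
        {
            // Use the timestamp delivered with the event, not DateTime.Now,
            // so dispatch latency does not distort the measurement.
            long widthUs = (time.Ticks - _lastEdge.Ticks) / 10;   // a tick is 100 ns
            _lastEdge = time;

            if (widthUs > 200)
            {
                // Pulse long enough to be a plausible signal; the real code would
                // trigger the PinCapture sampling here.
                Debug.Print("Candidate signal, pulse " + widthUs.ToString() + " us");
            }
        }
    }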

When tested in isolation, 1, 2 and 3 each work fine. Combinations 1 & 2 and 1 & 3 also both work, but 2 & 3 and 1 & 2 & 3 both give the problem outlined in the first paragraph. I have found that the flashing capability of 3 makes no difference either way; it is the PWM itself that causes the problem. With all the PWM channels at either 0% or 100% duty cycle the problem does NOT occur. With the radio receiver of 2 disconnected the problem also does NOT occur.

My current theory is that the PWM is generating RFI which is picked up by the radio receiver, causing it to generate interrupts faster than they can be handled. Would this account for the BINARY_BLOB_HEAD growth? If so, is there anything I can do to limit it? Of course I’ll need to eliminate the noise before I can see a signal, but I need my device to cope gracefully with whatever random noise sources it encounters.

I have realised I do not really understand how NETMF handles interrupts. The description of Timer callbacks states that they run on a worker thread allocated from a pool, but pin interrupts do not seem to work the same way. Are they queued and run on the main thread? If so, how do they interleave with other code?
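
In case it helps anyone answer, this is the sort of quick probe I was planning to run to see where the delegates execute (a sketch only; I am assuming Thread.CurrentThread.ManagedThreadId is available in this NETMF version, and the pin is a placeholder):

    using System;
    using System.Threading;
    using Microsoft.SPOT;
    using Microsoft.SPOT.Hardware;

    public static class DispatchProbe
    {
        public static void Run(Cpu.Pin somePin)   // hypothetical pin
        {
            Debug.Print("Main thread: " + Thread.CurrentThread.ManagedThreadId.ToString());

            // Timer callback: where does this run?
            Timer t = new Timer(new TimerCallback(delegate(object o)
            {
                Debug.Print("Timer thread: " + Thread.CurrentThread.ManagedThreadId.ToString());
            }), null, 1000, 1000);

            // Pin interrupt delegate: and where does this run?
            InterruptPort p = new InterruptPort(somePin, true,
                Port.ResistorMode.PullUp, Port.InterruptMode.InterruptEdgeLow);
            p.OnInterrupt += new NativeEventHandler(delegate(uint d1, uint d2, DateTime time)
            {
                Debug.Print("Interrupt thread: " + Thread.CurrentThread.ManagedThreadId.ToString());
            });

            Thread.Sleep(Timeout.Infinite);
        }
    }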

:o (Is this a BINARY_BLOB_HEAD?)

I usually see this with UART, where a bug in the user’s code is causing data to be left buffered internally.

It is easy to test your theory: disable the PWMs and keep the rest. Does it work better?

Yes Gus, like I said: with the PWM taken out altogether, or set to 0 or 100 (so effectively disabled, with the pin held either low or high), the problem goes away.

Does NETMF allocate BINARY_BLOB_HEAD when dispatching pin interrupt delegates? If so, is it possible to stop it growing indefinitely? My app might work if I could prevent it responding to new interrupts while it is already handling one.
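
Something like this is what I have in mind, if it would help (a sketch only, not tested; it assumes DisableInterrupt/EnableInterrupt on InterruptPort stop new edges being raised, and ProcessEdge stands in for my real filter):

    using System;
    using Microsoft.SPOT;
    using Microsoft.SPOT.Hardware;

    public class GatedReceiver
    {
        private readonly InterruptPort _rxPin;

        public GatedReceiver(Cpu.Pin receiverPin)   // hypothetical pin parameter
        {
            _rxPin = new InterruptPort(receiverPin, false,
                Port.ResistorMode.Disabled, Port.InterruptMode.InterruptEdgeBoth);
            _rxPin.OnInterrupt += new NativeEventHandler(OnEdge);
        }

        private void OnEdge(uint pin, uint state, DateTime time)
        {
            // Stop further edges being raised while this one is processed.
            _rxPin.DisableInterrupt();
            try
            {
                ProcessEdge(time);
            }
            finally
            {
                // Edges that occur while disabled are lost, which is fine for
                // noise but would need care around a real signal.
                _rxPin.EnableInterrupt();
            }
        }

        private void ProcessEdge(DateTime time)
        {
            // Placeholder for the real filter / PinCapture trigger.
        }
    }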

Sorry, I missed that. Yes, interrupts are allocated and processed in order, so your theory could well be the case. To be 100% sure, maybe disable the interrupt pin and keep the rest running.

Yes, I think we have got to the bottom of it. My digital filter only works because I am using the timestamps on the interrupts, so latency in interrupt handling does not matter much as long as all the interrupts do eventually get handled. Maybe I can add code to switch the interrupts off automatically if the rate of noise transitions gets too high.
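
Something along these lines, perhaps (a rough sketch: the window, edge-count and mute-time constants are made-up numbers that would need tuning, and I have not worried about locking between the interrupt handler and the timer callback):

    using System;
    using System.Threading;
    using Microsoft.SPOT;
    using Microsoft.SPOT.Hardware;

    public class NoiseLimitedReceiver
    {
        // Hypothetical thresholds; they would need tuning on real hardware.
        private const int MaxEdgesPerWindow = 500;   // edges allowed per window
        private const int WindowMs = 100;            // measurement window
        private const int MuteMs = 1000;             // how long to stay switched off

        private readonly InterruptPort _rxPin;
        private readonly Timer _reenableTimer;
        private int _edgeCount;
        private DateTime _windowStart = DateTime.MinValue;

        public NoiseLimitedReceiver(Cpu.Pin receiverPin)   // hypothetical pin parameter
        {
            _rxPin = new InterruptPort(receiverPin, false,
                Port.ResistorMode.Disabled, Port.InterruptMode.InterruptEdgeBoth);
            _rxPin.OnInterrupt += new NativeEventHandler(OnEdge);

            // One-shot timer used to switch the receiver back on after a mute.
            _reenableTimer = new Timer(new TimerCallback(Reenable), null,
                Timeout.Infinite, Timeout.Infinite);
        }

        private void OnEdge(uint pin, uint state, DateTime time)
        {
            // Count edges within the current window, using the event timestamp.
            if ((time - _windowStart).Ticks > WindowMs * 10000L)   // 10,000 ticks per ms
            {
                _windowStart = time;
                _edgeCount = 0;
            }

            if (++_edgeCount > MaxEdgesPerWindow)
            {
                // Too much noise: switch the interrupts off and come back later,
                // so queued events cannot pile up faster than they are handled.
                _rxPin.DisableInterrupt();
                _reenableTimer.Change(MuteMs, Timeout.Infinite);
                Debug.Print("Receiver muted: noise rate too high");
                return;
            }

            // Normal path: timestamp-based filtering / PinCapture trigger goes here.
        }

        private void Reenable(object state)
        {
            _edgeCount = 0;
            _windowStart = DateTime.MinValue;
            _rxPin.EnableInterrupt();
        }
    }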

I’d still like to understand exactly how these interrupt delegates are handled. Can you point me to any documentation on this? Is this implicit multithreading or is execution interleaved with foreground processing in such a way that automatically avoids synchronisation issues (but potentially adds indeterminate latency)?

Understanding it exactly means looking at the source code. Download the NETMF PK and peek in there.