USB Host in RLP

I have a situation where I’m trying to produce a constant or sustained data rate out the USB Host port. With the G120 idle I can achieve ~30KB/s, which is OK, but I’m having a bit of trouble completely isolating the write function so that the data rate doesn’t slow down or stop while other operations take place. For example, if TCP data flows into a socket while data is going out the USB host port, the rate slows way down or stops for a period of time, which is not totally unexpected.

To give you an idea of the volume and frequency of the USB data I am sending: ~50kb of data every ~2 seconds. Ideally the data rate stays above 50KB/s.
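
To put rough numbers on that, here is a small arithmetic sketch in C. It assumes "50kb" means 50,000 bytes and that 1 KB/s is 1,000 bytes per second; the function names are made up for illustration. Under those assumptions the average requirement is about 25 KB/s, and one payload takes roughly 1.7 seconds at 30 KB/s or 1 second at 50 KB/s.

```c
#include <assert.h>
#include <stdint.h>

/* Average rate (bytes per second) needed to send one payload per period. */
uint32_t average_rate_bytes_per_s(uint32_t payload_bytes, uint32_t period_ms)
{
    assert(period_ms > 0);
    return (uint32_t)(((uint64_t)payload_bytes * 1000u) / period_ms);
}

/* Time (milliseconds, rounded down) to push one payload at a given rate. */
uint32_t burst_duration_ms(uint32_t payload_bytes, uint32_t rate_bytes_per_s)
{
    assert(rate_bytes_per_s > 0);
    return (uint32_t)(((uint64_t)payload_bytes * 1000u) / rate_bytes_per_s);
}
```

For example, `burst_duration_ms(50000, 30000)` leaves only about a third of a second of slack in each 2 second window, so any stall longer than that shows up at the printer.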

Would it be conceivable/advisable to write a USB Host in RLP to send the data? I found some examples from NXP for mass storage that I might be able to use to get started. The thinking is that sending the data out the USB port in RLP would halt .NET so I would have exclusive access to the port and a more reliable data rate.

You may be wondering why the heck it’s important to have a constant rate of data, well the reason is I’m streaming rasterized image data to a thermal printing device and as the data enters the printer’s small buffer it dumps the bitmap data directly to paper. If the data rate slows or stops the printing can become choppy while it waits for more data and quality suffers.

Thanks for any insight or suggestions.


@ andre.m - This is just one of many functions of the overall application and I agree that if this was all I was doing then I’d take a different approach.

It actually works quite well as it is, but I have to jump through some hoops to make sure the G120 is idle before I start sending data. I guess the real question is if there are any fundamental reasons RLP can’t be used to create a USB Host.


Could anyone from GHI comment on whether they think a USB host operating in RLP would have the potential to send data at a faster rate than the managed side?

@ Superpanda - It is certainly possible to get a faster speed through RLP, if a USB driver would even work there. That’s not a scenario directly supported by us and it could interact in unintended ways with our firmware. If you block in RLP long enough, it is possible you can lose data in other areas of your application, like networking. NETMF is not designed for real time needs like this.

Makes me wonder how would this work in 4.4 when real threads are available.

What do you mean by “real threads”?

Oh no! You let the secret out!

I would guess he’s referring to the rework of the lwip integration, which MS has said was poorly done in 4.3…

If I had to guess at what @ Gus was alluding to, it would probably be this: today multi-threading is implemented by the NETMF interpreter, which switches contexts only within the interpreter, as opposed to preemptively switching contexts in native code, i.e. regardless of whether the current execution context is running managed or native code.

I don’t know much about the plans for AOT but I guess this would use preemptive multi-threading that would preempt your native RLP code. Of course with AOT the scenarios requiring RLP should be fewer, but I could still see a few use cases that would benefit from RLP.

I would imagine that the AOT code might even run on top of an embedded OS, given the current direction, maybe even a CMSIS compliant embedded OS. But that is all uneducated speculation on my part as I have not even looked at what MS is doing with AOT other than knowing they are looking at using LLVM… So don’t quote me on any of the above :slight_smile:

I can’t talk for @ Gus, but that would be my guess.

NETMF threads are currently preemptive, which means only that a thread does not need to explicitly yield the CPU to other threads. The NETMF interpreter schedules threads, and while threads may not leave any CPU time free, it’s not possible for one thread to prevent other threads from running.

I’m pretty confident that Llilum doesn’t run on top of an OS. It runs on bare metal, but you don’t need an OS to implement threading, even at a lower level than the interpreter does.

While a NETMF thread is executing native code like an RLP function that thread will be preventing other threads from running because the thread is not going back to the interpreter.
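
A toy model in C may make that concrete. This is only a sketch under my own assumptions, not how the NETMF scheduler is actually written: each "thread" gets one turn per round-robin pass, a managed turn costs 1 tick, and a native (RLP-like) turn costs many ticks and cannot be interrupted. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per simulated thread in the toy model. */
typedef struct {
    unsigned step_cost_ticks;  /* ticks consumed each time the thread runs */
    unsigned last_run_tick;    /* tick at which the thread last started */
    unsigned worst_gap_ticks;  /* longest wait between two consecutive starts */
} toy_thread;

/* Runs `turns` round-robin passes. A thread's turn always runs to
   completion, so a long native turn delays every other thread. */
void toy_schedule(toy_thread *threads, size_t count, unsigned turns)
{
    unsigned now = 0;
    assert(threads != NULL && count > 0);
    for (unsigned t = 0; t < turns; t++) {
        for (size_t i = 0; i < count; i++) {
            if (t == 0) {
                /* First start: nothing to compare against yet. */
                threads[i].worst_gap_ticks = 0;
            } else {
                unsigned gap = now - threads[i].last_run_tick;
                if (gap > threads[i].worst_gap_ticks)
                    threads[i].worst_gap_ticks = gap;
            }
            threads[i].last_run_tick = now;
            now += threads[i].step_cost_ticks;
        }
    }
}
```

With two threads that each cost 1 tick, the worst gap between starts is 2 ticks. Make one of them a 100 tick native call and the other thread's worst gap grows to 101 ticks, which is the stall described above.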

Embedded OSes are relatively simple (IMHO), to the point that I would imagine any underlying framework they put in place to manage threads, synchronization (semaphores, mutex, critical sections), timers, events etc. would look and act like any one of the multitude of the embedded OSes for the Cortex-M scale MCUs. The question of whether that would be CMSIS compliant or would it just be a MS special is just a curiosity for me with pros and cons either way.

All speculation on my part of course, I have zero insight into what they are really doing.

@ taylorza - It’d be up to GHI to make the NETMF interpreter preempt your RLP threads. My guess is that it’d make RLP much more complex, but I’ve never implemented threading on ARM.

@ godefroi - Preemption of RLP is just not practical; in fact it defeats the purpose.

I am not promising anything, but one of the goals is to give the user threading in RLP. The RLP thread and NETMF can run happily together, with users setting priorities. We should get soft real time results. But this is way down the road…with video playback and more… :dance:

If we assume that Microsoft is working on making NETMF compatible with UWP then shouldn’t we assume also that Microsoft is working that problem? (these are my assumptions…)

The current NETMF interpreter is single threaded and provides managed threads using the completion queue implemented in the PAL, so a context switch only takes place when the interpreter gets time to service the completion queue. This is why even some native drivers block for the duration of a data transfer, blocking other threads.

My guess (Gus you need to share more details so I don’t need to take wild guesses) would be that if GHI supported RLP threads they would either need to yield to the interpreter (cooperative multitasking) or they have an implementation that runs a second “native” thread that preempts the interpreter thread and RLP code. These would potentially have some interesting challenges in my mind, for example, ensuring the GC does not run and move any memory buffers your currently executing RLP function is referencing. Today RLP can hold a reference to a managed array for the duration of the function execution because it blocks the interpreter and thereby ensures that no GCs would occur.
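
The cooperative option could look something like the C sketch below: the native side sends at most one chunk per call and then returns, so the caller (and with it the interpreter) gets control back between chunks. This is my own illustration, not GHI's design. `chunked_transfer`, `transfer_step`, and `counting_sink` are hypothetical names, and `counting_sink` is just a stand-in for whatever routine would really write to the port.

```c
#include <assert.h>
#include <stddef.h>

/* Progress of one outgoing payload. */
typedef struct {
    const unsigned char *data;
    size_t length;
    size_t offset;  /* bytes already handed to the writer */
} chunked_transfer;

/* Writer callback: returns how many bytes it accepted. */
typedef size_t (*write_fn)(const unsigned char *bytes, size_t count);

/* Stand-in writer for the sketch: accepts everything and counts it. */
size_t sink_total = 0;
size_t counting_sink(const unsigned char *bytes, size_t count)
{
    (void)bytes;
    sink_total += count;
    return count;
}

/* Sends at most max_chunk bytes, then returns the bytes still pending.
   The caller keeps calling until the result is 0. */
size_t transfer_step(chunked_transfer *xfer, size_t max_chunk, write_fn write)
{
    assert(xfer != NULL && write != NULL && max_chunk > 0);
    size_t remaining = xfer->length - xfer->offset;
    size_t want = remaining < max_chunk ? remaining : max_chunk;
    if (want > 0) {
        size_t sent = write(xfer->data + xfer->offset, want);
        if (sent > want)
            sent = want;  /* never advance past what was requested */
        xfer->offset += sent;
    }
    return xfer->length - xfer->offset;
}
```

The chunk size is the trade-off: small chunks return to the caller often but add call overhead, large chunks get closer to the blocking behavior discussed earlier in the thread.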

Again, just an interesting thought experiment, I have no insights into what anyone is really doing.

Has the team given any thought as to how the GC will be handled? Or will you just give the ability to block all threads (disable interrupts) if there is a requirement to process a managed buffer from RLP? Alternatively you could copy the buffers, but that might be interesting to see how that would impact existing use cases.
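
To illustrate the copy alternative, here is a minimal C sketch, assuming the native side owns a fixed private buffer and takes a snapshot of the caller's data up front. The buffer size and the function names are hypothetical, and the checksum is just placeholder work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NATIVE_BUF_CAPACITY 256u  /* hypothetical size for the sketch */

static unsigned char native_buf[NATIVE_BUF_CAPACITY];
static size_t native_len = 0;

/* Copies the caller's bytes into native-owned storage and returns the
   number of bytes kept (input is truncated to the buffer capacity). */
size_t snapshot_buffer(const unsigned char *managed, size_t len)
{
    assert(managed != NULL);
    if (len > NATIVE_BUF_CAPACITY)
        len = NATIVE_BUF_CAPACITY;
    memcpy(native_buf, managed, len);
    native_len = len;
    return len;
}

/* Placeholder for long-running work: it reads only the private copy,
   so it does not care what happens to the original buffer afterwards. */
unsigned checksum_snapshot(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < native_len; i++)
        sum += native_buf[i];
    return sum;
}
```

The cost is the extra memory and the copy time per call, which is the part that would be interesting to measure against existing use cases.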