STM32F4 Discovery with Cerberus firmware

Just to clear up any doubt: the STM32F4 build of the firmware includes the correct compiler flags for hardware floating point.



; generated by ARM C/C++ Compiler, 4.1 [Build 894]
; commandline armcc [--cpp --split_sections --dwarf2 --no_debug_macros -c --asm
--interleave -oC:\MicroFrameworkPK_v4_2\BuildOutput\THUMB2FP\MDK4.13\le\FLASH\release\MMB_PLUS\obj\Solutions\MMB_PLUS\TinyCLR\tinyclr.obj
--feedback=C:\MicroFrameworkPK_v4_2\tools\make\Feedback\MMB_PLUS_MDK4.13.feedback -cpu=Cortex-M4.fp
-Otime --inline --no_autoinline --diag_suppress=2874,111,161,550,C3011,66,161,230,1293
-IC:\MicroFrameworkPK_v4_2\DeviceCode -IC:\MicroFrameworkPK_v4_2\DeviceCode\pal\rtip -IC:\MicroFrameworkPK_v4_2\DeviceCode\pal\rtip\rtpcore -IC:\MicroFrameworkPK_v4_2\devicecode\pal\rtip\tinyclr -Ih:\keil\arm\RV31\INC -IC:\MicroFrameworkPK_v4_2\Solutions\MMB_PLUS\TinyCLR -IC:\MicroFrameworkPK_v4_2\DeviceCode\include -IC:\MicroFrameworkPK_v4_2\DeviceCode\Cores\arm -IC:\MicroFrameworkPK_v4_2\Support\Include -
...
...

The flag --cpu=Cortex-M4.fp does produce FP code.

The display is now running successfully with this parallel port implementation:
http://informatix.miloush.net/microframework/Source/UAM/InformatiX/SPOT/Hardware/ParallelOutputPort.cs

The problem is that switching the pins from managed code is too slow.
I saw there is a ParallelPort class in GHI.Premium.Hardware, but I couldn’t find one in GHI.OSHW.Hardware.
Is there a faster way to do this using libraries and managed code?

@ Mincho - I’ve written a native driver for the display, and it works fine now. It uses the interop feature.

It would be fairly simple to create an interop-native version of the ParallelPort class.
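As a rough sketch of what the native side of such a class might do, here is how one byte could be put on the bus with a single store to the port's BSRR register. The wiring (eight data lines on bits 0..7 of one GPIO port) and the names bsrr_word_for_byte and parallel_write are assumptions for illustration, not part of any GHI or NETMF API. The register is passed in as a pointer so the logic can be checked on a PC with an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical wiring: 8 data lines on bits 0..7 of one GPIO port. */
#define DATA_SHIFT 0u
#define DATA_MASK  (0xFFu << DATA_SHIFT)

/* Build a BSRR word for one data byte.
 * On STM32F4, BSRR bits 0..15 set pins and bits 16..31 reset pins,
 * so all eight data lines can be updated atomically in one write. */
static uint32_t bsrr_word_for_byte(uint8_t value)
{
    uint32_t set   = ((uint32_t)value << DATA_SHIFT) & DATA_MASK;
    uint32_t reset = (~set) & DATA_MASK;
    return set | (reset << 16);
}

/* Drive the data lines. On hardware, bsrr would point at the port's
 * BSRR register; on a host build it can be any uint32_t variable. */
static void parallel_write(volatile uint32_t *bsrr, uint8_t value)
{
    *bsrr = bsrr_word_for_byte(value);
}
```

A real driver would also pulse the display's write strobe after each byte; that part depends on the wiring and is left out here.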

Any advice on choosing a compiler?
I saw there are a few compilers that can handle this.
What’s the best free choice?

You’ll want to grab GCC ARM Embedded 4.6.

Has anyone tried the current 4.7 version?

I’ve heard 4.7 doesn’t work.

One performance question:

static OutputPort pinD5 = new OutputPort(Pin.PD5, true);
static Register regDSetLow = new Register(GPIOD.BSRRL);
static Register regDSetHigh = new Register(GPIOD.BSRRH);
static uint pinD5set1 = 1 << 5;

void blabla()
{
   var start = DateTime.Now;
   for (int i = 0; i !=  10000 ; i++)
   {
       pinD5.Write(false); pinD5.Write(true);  //Option 1
       //regDSetLow.SetBits(pinD5set1); regDSetHigh.SetBits(pinD5set1); //Option 2
   }
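   var diff = DateTime.Now - start;
   // Print total elapsed milliseconds; Seconds + "." + Milliseconds drops leading zeros (1 s 5 ms would show as "1.5")
   Debug.Print("Time: " + (diff.Ticks / TimeSpan.TicksPerMillisecond) + " ms");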
}

Option 1 and Option 2 complete in almost the same time.
I thought a register write would be faster than a pin write?

Also

for (int i = 0; i !=  10000 ; i++)

is faster than

for (int i = 0; i <  10000 ; i++)

but I think I can explain that one to myself.

My assumption here would be that Write() is also writing to a register. So, the code is basically identical.

Equality is easier for hardware to test than < or >. There are simply more clock cycles involved.

Because the register write looks up the register address in a managed struct, it is quite slow. Meanwhile, OutputPort.Write is very fast: it immediately calls a native function that looks up the address at the HAL level (written in C++) and sets the pin state. If you’re coming from Arduino, you’re probably used to huge performance differences between register writes and the GPIO API. Not the case here. (Of course, if you wrote directly to the register in C, the function would execute faster than the GPIO API.)
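For comparison, a minimal sketch of the same toggle loop written directly against the register in C. The function name toggle_pin5 is made up for illustration; on an STM32F4 the pointer would be the GPIOD BSRR register (0x40020C18 by my reading of the reference manual, so verify it before use), and on a PC it can be any variable.

```c
#include <stdint.h>

#define PIN5_SET   (1u << 5)         /* BSRR low half: drive pin 5 high */
#define PIN5_RESET (1u << (5 + 16))  /* BSRR high half: drive pin 5 low */

/* Toggle pin 5 low then high `count` times and return the number of
 * register writes performed. Each iteration is just two stores, with
 * no address lookup and no managed method dispatch. */
static uint32_t toggle_pin5(volatile uint32_t *bsrr, uint32_t count)
{
    uint32_t writes = 0;
    for (uint32_t i = 0; i != count; i++) {
        *bsrr = PIN5_RESET;
        *bsrr = PIN5_SET;
        writes += 2;
    }
    return writes;
}
```

Called from managed code through an interop, the whole loop would run natively and the managed overhead would be paid once per call rather than once per pin write.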

You’re missing the point though. The real slowdown comes from the managed execution environment looking up all those method calls and chugging through them… If you’re trying to speed up your program with direct register writes and != for loops, you’re going to end up with a nasty-looking program that’s not measurably faster. Use interops! Or RLP! That’s what it’s there for.