STM32F4 Discovery with Cerberus firmware

dobova · May 20, 2013, 10:20am

Just to clear up any doubt: the STM32F4 build of the firmware includes the correct flags for hardware floating point at compile time.



; generated by ARM C/C++ Compiler, 4.1 [Build 894]
; commandline armcc [--cpp --split_sections --dwarf2 --no_debug_macros -c --asm
--interleave -oC:\MicroFrameworkPK_v4_2\BuildOutput\THUMB2FP\MDK4.13\le\FLASH\release\MMB_PLUS\obj\Solutions\MMB_PLUS\TinyCLR\tinyclr.obj
--feedback=C:\MicroFrameworkPK_v4_2\tools\make\Feedback\MMB_PLUS_MDK4.13.feedback --cpu=Cortex-M4.fp
-Otime --inline --no_autoinline --diag_suppress=2874,111,161,550,C3011,66,161,230,1293
-IC:\MicroFrameworkPK_v4_2\DeviceCode -IC:\MicroFrameworkPK_v4_2\DeviceCode\pal\rtip -IC:\MicroFrameworkPK_v4_2\DeviceCode\pal\rtip\rtpcore -IC:\MicroFrameworkPK_v4_2\devicecode\pal\rtip\tinyclr -Ih:\keil\arm\RV31\INC -IC:\MicroFrameworkPK_v4_2\Solutions\MMB_PLUS\TinyCLR -IC:\MicroFrameworkPK_v4_2\DeviceCode\include -IC:\MicroFrameworkPK_v4_2\DeviceCode\Cores\arm -IC:\MicroFrameworkPK_v4_2\Support\Include -
...
...

The --cpu=Cortex-M4.fp flag does produce hardware FP code.

Mincho · May 23, 2013, 7:23am

The display is now running successfully with this parallel port implementation:
http://informatix.miloush.net/microframework/Source/UAM/InformatiX/SPOT/Hardware/ParallelOutputPort.cs

The problem is that switching the pins from managed code is too slow.
I saw there is a ParallelPort class in GHI.Premium.Hardware, but I couldn’t find one in GHI.OSHW.Hardware.
Is there a faster way to do this using libraries and managed code?

dobova · May 23, 2013, 7:34am

@ Mincho - I’ve written a native driver for the display, and it works fine now. It uses the interop features.

godefroi · May 23, 2013, 12:14pm

It would be fairly simple to create an interop-native version of the ParallelPort class.
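For reference, here is a rough sketch of what the native core of such a port might do, assuming the 8 data lines sit on consecutive pins of a single GPIO port. On the STM32F4, the GPIOx_BSRR register sets pins through its low 16 bits and resets them through its high 16 bits, so a whole byte can be driven with one 32-bit store instead of eight separate pin writes. The function names are hypothetical, and the interop wrapper that would call them from managed code is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Build one BSRR value that drives 8 consecutive pins, starting at
 * first_pin, to the bits of `value`. Bits 0-15 of BSRR set pins,
 * bits 16-31 reset them; a 0 bit leaves the corresponding pin alone. */
static uint32_t bsrr_word_for_byte(uint8_t value, unsigned first_pin)
{
    assert(first_pin <= 8); /* all 8 pins must fit within pins 0..15 */
    uint32_t mask  = 0xFFu << first_pin;
    uint32_t set   = ((uint32_t)value << first_pin) & mask;
    uint32_t reset = ~set & mask;
    return set | (reset << 16);
}

/* Hypothetical native helper: on the target, `bsrr` would point at the
 * port's BSRR register (0x40020C18 for GPIOD on the STM32F4). */
static void parallel_write_byte(volatile uint32_t *bsrr, uint8_t value,
                                unsigned first_pin)
{
    *bsrr = bsrr_word_for_byte(value, first_pin); /* one store, all 8 pins */
}
```

Because all 8 lines change in the same bus write, the data pins never pass through intermediate values the way they do when each pin is toggled separately from managed code.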

Mincho · May 27, 2013, 3:08am

Any advice on choosing a compiler?
I saw there are a few compilers that can handle this.
What’s the best free choice?

jay · May 27, 2013, 11:50am

You’ll want to grab GCC ARM Embedded 4.6.

godefroi · May 27, 2013, 8:57pm

Has anyone tried with the current 4.7 version?

jay · May 28, 2013, 10:19am

I’ve heard 4.7 doesn’t work.

Mincho · May 30, 2013, 3:05am

One performance question:

static OutputPort pinD5 = new OutputPort(Pin.PD5, true);
static Register regDSetLow = new Register(GPIOD.BSRRL);
static Register regDSetHigh = new Register(GPIOD.BSRRH);
static uint pinD5set1 = 1 << 5;

void blabla()
{
    var start = DateTime.Now;
    for (int i = 0; i != 10000; i++)
    {
        pinD5.Write(false); pinD5.Write(true);  //Option 1
        //regDSetLow.SetBits(pinD5set1); regDSetHigh.SetBits(pinD5set1); //Option 2
    }
    var diff = DateTime.Now - start;
    // TimeSpan ticks are 100 ns, so divide by 10,000 to get whole milliseconds
    Debug.Print("Time: " + (diff.Ticks / 10000) + " ms");
}

Option 1 and Option 2 complete in almost the same time.
I thought a register write was faster than a pin write?

Also

for (int i = 0; i !=  10000 ; i++)

is faster than

for (int i = 0; i <  10000 ; i++)

but I think I can explain this one to myself.

ianlee74 · May 30, 2013, 9:12am

My assumption here would be that Write() also writes to a register, so the code is basically identical.

Equality is easier for hardware to test than < or >. There are simply more clock cycles involved.

jay · May 30, 2013, 9:50am

Because the register write looks up the register address in a managed struct, it is quite slow. Meanwhile, OutputPort.Write is very fast: it immediately calls a native function that looks up the address at the HAL level (written in C++) and sets the pin state. If you’re coming from Arduino, you’re probably used to huge performance differences between register writes and the GPIO API. That’s not the case here. (Of course, if you wrote directly to the register in C, it would execute faster than the GPIO API.)
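To make the comparison concrete, the C equivalent of Option 1 is just two stores to the BSRR register per iteration, with no managed method lookups in between. This is a sketch that assumes the register pointer is passed in; the helper name is hypothetical, and on the STM32F4, GPIOD’s BSRR register sits at 0x40020C18.

```c
#include <assert.h>
#include <stdint.h>

/* Drive `pin` low, then high, `count` times by writing the BSRR register
 * directly (bit pin+16 resets the pin, bit pin sets it). This mirrors
 * Option 1 in the benchmark: Write(false) followed by Write(true). */
static void pulse_pin(volatile uint32_t *bsrr, unsigned pin, unsigned count)
{
    assert(pin < 16); /* BSRR covers pins 0..15 of one port */
    for (unsigned i = 0; i != count; i++) {
        *bsrr = 1u << (pin + 16); /* reset: pin low  */
        *bsrr = 1u << pin;        /* set:   pin high */
    }
}
```

On the board, the call would look something like pulse_pin((volatile uint32_t *)0x40020C18u, 5, 10000), but only from native code (an interop method or similar), not from the managed loop itself.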

You’re missing the point, though. The real slowdown comes from the managed execution environment looking up all those method calls and chugging through them… If you’re trying to speed up your program with direct register writes and != for loops, you’re going to end up with a nasty-looking program that’s not measurably faster. Use interops! Or RLP (GHI’s Runtime Loadable Procedures)! That’s what they’re there for.