FEZ Cerberus 4.2 GCC Build - Performance

As I mentioned in another post, building on the excellent work of NicolasG, I was able to get a GCC build of the GHI Cerberus firmware going. Tonight I thought I would run some quick comparisons, and here are the results.

I have used the exact same memory layout that GHI uses for their build. The only difference is that my build uses the native float support of the STM32 MCU, which makes a huge difference in performance. I think that is also why the GCC build is unexpectedly smaller, since the floating point emulation code is not required, but I have not really investigated this.

For the benchmark I used Duke Nukem’s ‘Gadgeteer Performance Test - Calculating Digits of PI’ test. Here are the results running the custom build vs. the stock build. As expected, the execution time compares favorably with the Mountaineer board times in Duke’s tests.

Custom Build

Debug.GC(false) - 61140
Debug.GC(true) - 61164
20 - 00:00:01.8682210
200 - 00:03:30.9038860

Stock Build

Debug.GC(false) - 61116
Debug.GC(true) - 61140
20 - 00:00:03.6439740
200 - 00:06:54.6741030
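
To put a number on the difference, here is a minimal sketch that turns the timings above into a speedup ratio. The helper names timespan_seconds and speedup are made up for this illustration and are not part of the benchmark; the only assumption is that the times are printed as hh:mm:ss.fffffff, as they appear above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "hh:mm:ss.fffffff" string, as printed in the results above,
   to seconds. Returns a negative value if the text cannot be parsed. */
double timespan_seconds(const char *text)
{
    int hours = 0, minutes = 0;
    double seconds = 0.0;

    if (sscanf(text, "%d:%d:%lf", &hours, &minutes, &seconds) != 3)
        return -1.0;
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* How many times faster the custom build is than the stock build
   (stock time divided by custom time). */
double speedup(const char *stock, const char *custom)
{
    double stock_s = timespan_seconds(stock);
    double custom_s = timespan_seconds(custom);

    /* Both inputs must parse to a positive duration. */
    assert(stock_s > 0.0 && custom_s > 0.0);
    return stock_s / custom_s;
}
```

Calling speedup("00:00:03.6439740", "00:00:01.8682210") gives roughly 1.95 for the 20 run, and speedup("00:06:54.6741030", "00:03:30.9038860") gives roughly 1.97 for the 200 run, so the custom build is close to twice as fast on this test.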


Excellent! Are you planning on creating a wiki page with instructions on doing the build? Would be nice to have it.

@ Architect - What I really hope for is to have it committed to the CodePlex repository, so that going forward we have a GCC build for the Cerberus straight from the repository. But before requesting that I want to get approval from NicolasG, since the work is really 99% his. I just merged the key changes back and set the scatter file up to match the RVDS scatter file.

If that all happens, then the build is pretty much the same as for the Hydra, but I will provide Cerberus-specific steps just so that it is 100% clear.

Codeplex is even better!

@ taylorza - just read description on his codeshare. He mentioned that his work is based on yours (among others) :smiley:

I did see that, but I think he was being modest :slight_smile: He has done far more than what I had done. He did all the GNU asm code conversion from the RVDS asm code etc.

@ taylorza - excellent results :slight_smile:

yeah, converting asm code is not fun.

@ taylorza - have you tested any string manipulation or just the Pi test?
Curious to see what general performance is like now.

@ Justin - I only did the PI test, can run a few string tests, do you have anything specific in mind?

@ taylorza - not really, just curious to see if it's had any effect, either positive or negative, on generic string concats etc.

But in saying that, most of the stuff that interests me is more maths and floatie stuff, so very happy with your results…

@ Justin - Here are the results of a quick and dirty test concatenating some strings using the string data type and the StringBuilder. The StringBuilder test includes the time it takes to convert the result to a string.


Custom Build
------------
String: Concat 500 short strings	:00:00:00.2117900
String: Concat 100 long strings		:00:00:00.0417850
StringBuilder: Concat 500 short strings	:00:00:00.2498980
StringBuilder: Concat 100 long strings	:00:00:00.0538070

Stock Build
-----------
String: Concat 500 short strings	:00:00:00.3347730
String: Concat 100 long strings		:00:00:00.0641170
StringBuilder: Concat 500 short strings	:00:00:00.4270290
StringBuilder: Concat 100 long strings	:00:00:00.0910680

@ taylorza - Those are very encouraging results! Top stuff indeed.

Best I get one of those boards for you to play with…

Can’t wait…

So much for the “you really have to have MDK” rumour, I guess.

When does GHI start building with GCC?

The really hard ASM work is from CW2 for the Netduino Go: https://bitbucket.org/CW2/netduinogofirmware
I found it when I tried to translate code like this

STR      r2,[r0,#__cpp(offsetof(SmartPtr_IRQ, m_state))]

to

 asm("STR r2, [r0,%[state]]"::[state]"l"(offsetof(SmartPtr_IRQ, m_state)));

Note that the asm() parameter is not a quoted string!
The ASM code I did is really straightforward compared to this.
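
For anyone wondering what that operand actually is: offsetof is just a compile-time constant, and the STR stores a 32-bit value at base address plus that constant. A rough sketch in plain C follows. FakeSmartPtrIrq is a hypothetical stand-in, since the real SmartPtr_IRQ layout lives in the firmware sources and may well differ, and store_state is a made-up name for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SmartPtr_IRQ; the real layout may differ. */
struct FakeSmartPtrIrq {
    uint32_t m_reserved; /* assumption: some field placed before m_state */
    uint32_t m_state;
};

/* Compile-time constant, the same kind of value that the
   [state] operand receives in the GCC inline asm above. */
enum { STATE_OFFSET = offsetof(struct FakeSmartPtrIrq, m_state) };

/* Plain C equivalent of "STR r2, [r0, #STATE_OFFSET]":
   store a 32-bit value at base address + offset. */
void store_state(void *base, uint32_t value)
{
    assert(base != NULL);
    memcpy((unsigned char *)base + STATE_OFFSET, &value, sizeof value);
}
```

With this made-up layout STATE_OFFSET is 4, so store_state(&irq, 0x1234) only touches irq.m_state. The RVDS __cpp() form and the GCC operand form are two ways of handing that same constant to the assembler.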

The real problem is the GCC version.
Looking at FirstEntry.s, you can see this :

@ The first word has several functions:
@ - It is the entry point of the application
@ - it contains a signature word used to identify application blocks
@ - out of reset it contains the initial stack pointer value
@ - it is the first entry of the initial exception handler table
@ The actual word used is 0x2000E00C

    b         Start         @ 0xE00C
    .hword    0x2000        @ Booter signature is 0x2000E00C
    .word     Start         @ Reset
...

After compilation, the assembly result is available in tinyclr.axfdump (c:\MicroFrameworkPK_v4_2\BuildOutput\THUMB2\GCC4.6\le\FLASH\release\FEZCerberus\bin)
Here is the correct Booter signature:

Disassembly of section ER_FLASH:

08010000 <EntryPoint>:
 8010000:	e00c      	b.n	801001c <Start>
 8010002:	2000      	.short	0x2000
 8010004:	...

and now the result for GCC 4.7

Disassembly of section ER_FLASH:

08010000 <EntryPoint>:
 8010000:	f000 b80d 	b.w	801001e <Start>
 8010004:	001f2000 	.word	0x001f2000
 8010008:	...

As you can see, the thumb directive is not honored in GCC 4.7: the branch is assembled as a 32-bit b.w instead of a 16-bit b.n, so the code is larger and, even though the compilation completes, the Booter signature is broken.
I didn’t find out why in the GCC documentation.
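
To make it clear why the wider branch breaks things, here is a small sketch of how the first word is put together from the two halfwords shown in the dumps. The function names are made up for this illustration. It assumes little-endian memory (the build output path contains "le") and the standard 16-bit Thumb unconditional branch encoding (11100 followed by an 11-bit offset).

```c
#include <assert.h>
#include <stdint.h>

#define BOOTER_SIGNATURE 0x2000E00Cu

/* Little-endian: the halfword at the lower address is the low half
   of the 32-bit word. */
uint32_t first_word(uint16_t halfword_at_0, uint16_t halfword_at_2)
{
    return ((uint32_t)halfword_at_2 << 16) | halfword_at_0;
}

/* Nonzero if the first two halfwords form the Booter signature. */
int has_booter_signature(uint16_t halfword_at_0, uint16_t halfword_at_2)
{
    return first_word(halfword_at_0, halfword_at_2) == BOOTER_SIGNATURE;
}

/* Target of a 16-bit Thumb unconditional branch (b.n).
   The offset is imm11 * 2, sign-extended, relative to the
   instruction address + 4. */
uint32_t thumb_b_n_target(uint32_t address, uint16_t insn)
{
    int32_t offset;

    /* Only the 11100:imm11 encoding is handled here. */
    assert((insn & 0xF800u) == 0xE000u);

    offset = (int32_t)(insn & 0x7FFu) << 1;
    if (offset & 0x800)
        offset -= 0x1000; /* sign-extend the 12-bit offset */
    return address + 4u + (uint32_t)offset;
}
```

With the GCC 4.6 halfwords e00c and 2000 the first word is 0x2000E00C, and thumb_b_n_target(0x08010000, 0xE00C) gives 0x0801001C, which matches the Start address in the dump. With the GCC 4.7 halfwords f000 and b80d the first word becomes 0xB80DF000, and the 0x2000 halfword has been pushed to offset 4, so the signature check can no longer match.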

@ NicolasG - GMod and I had a lot of “fun” doing the exact same thing last year:

http://www.tinyclr.com/forum/topic?id=4691&page=12#msg49132

@ Architect - I just picked up the pieces and stuck with it until it worked!
And I also removed the

asm("BX       lr");

as it is already implemented by the function return.

I think MDK can do it right. The GHI guys should really find the right compiler options.

Yeah, I have figured that one out eventually.