Inline assembler

WouterH · April 8, 2011, 2:51am

I want to use inline assembler. Which instruction set should I use? The ARM or the Thumb instruction set?

I already figured out that inline assembler goes like this:


asm("<instruction>");
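For anything beyond a bare instruction, GCC's extended asm syntax lets you pass C variables in and out as operands. A minimal sketch, assuming GCC targeting ARM (not Thumb) mode; the helper name add_asm is made up for illustration, and the #else branch only exists so the snippet also builds on a non-ARM host:

```c
/* Adds two integers; hypothetical helper to show GCC operand syntax. */
int add_asm(int a, int b)
{
    int result;
#if defined(__arm__)
    /* %0 is the output register, %1 and %2 are the two inputs. */
    __asm__ volatile ("add %0, %1, %2"
                      : "=r" (result)
                      : "r" (a), "r" (b));
#else
    result = a + b;   /* portable fallback so the sketch builds off-target */
#endif
    return result;
}
```

The "=r" and "r" constraints ask the compiler to pick registers itself, so you don't have to hard-code register names.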

Gus_Issa · April 8, 2011, 7:20am

I think everything we do in RLP is ARM rather than Thumb.

WouterH · April 12, 2011, 2:02pm

I’ve written a small piece of assembler that updates the screen as fast as possible.

I’m rotating a bitmap on the 320 x 240 screen.

Now I have a question about the instruction execution speed of the LPC24xx.

You can compare the assembly code with this pseudo C code:


y = 240;
while (y-- > 0)
{
  x = 320;
  while (x-- > 0)
  {
    // 24 assembly instructions here
  }
}

So drawing one frame takes 240 * ((320 * (24 + 3)) + 3) = 2074320 instructions. But I ‘only’ get a throughput of 4 to 5 frames per second, which would mean the CPU executes at ‘only’ about 10 MIPS. Is this true? What is the cause of this bottleneck?
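The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function names are made up, and the 3 instructions of loop overhead are the figure from the post, not something measured.

```c
#include <stdint.h>

/* Instructions per frame for the nested loop: every inner iteration costs
   body + loop_overhead, and every outer iteration adds loop_overhead again. */
uint32_t instructions_per_frame(uint32_t width, uint32_t height,
                                uint32_t body, uint32_t loop_overhead)
{
    return height * (width * (body + loop_overhead) + loop_overhead);
}

/* Effective throughput in instructions per second at a given frame rate. */
uint32_t instructions_per_second(uint32_t per_frame, uint32_t fps)
{
    return per_frame * fps;
}
```

With width 320, height 240, body 24 and overhead 3 this gives 2074320 instructions per frame, so 4 to 5 frames per second works out to roughly 8.3 to 10.4 million instructions per second.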

Gus_Issa · April 12, 2011, 2:14pm

You are doing a lot of operations on SDRAM, and SDRAM is not as fast as the processor. Now you can see where a processor cache can really help.

WouterH · April 14, 2011, 12:16pm

This is the result of my inline assembler test in RLP on the FEZ Cobra

Gus_Issa · April 14, 2011, 12:24pm

WOW…this is fast considering no hardware acceleration and ARM7 on QVGA screen. Very nice work.

User_12 · April 14, 2011, 1:20pm

very nice

WouterH · April 14, 2011, 2:16pm

It even runs slightly faster now, since I was able to remove another 3 asm instructions by combining them. ARM instruction set is really neat when it comes to instruction combining.

First managed code sends the 64x64 GHI image to RLP.

Next, managed code calculates an optimised (full circle contains 256 degrees) sin / cos lookup table.

Last, managed code calls the Animate function with a sin / cos for the current angle and increments the angle.
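To illustrate the 256-step circle: with 256 "degrees" an 8-bit angle wraps around for free, and cos is just sin shifted by a quarter turn (64 steps). The sketch below is C rather than the managed code used in the demo, and the names, the 8.8 fixed-point scale and the table layout are all assumptions. It builds the table by repeatedly rotating a unit vector by one step, so it needs no maths library; calling sin() from math.h would do just as well.

```c
#include <stdint.h>

#define ANGLE_STEPS 256   /* full circle, so an 8-bit angle wraps for free */
#define FIXED_ONE   256   /* assumed 8.8 fixed-point scale */

static int16_t sin_table[ANGLE_STEPS];

/* Fill sin_table by rotating the unit vector (1, 0) one step at a time. */
void build_sin_table(void)
{
    /* sin and cos of a single step, 2*pi/256 radians */
    const double step_sin = 0.024541228522912288;
    const double step_cos = 0.99969881869620425;
    double c = 1.0, s = 0.0;

    for (int i = 0; i < ANGLE_STEPS; i++) {
        double scaled = s * FIXED_ONE;
        /* round to nearest integer */
        sin_table[i] = (int16_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
        double next_c = c * step_cos - s * step_sin;
        s = s * step_cos + c * step_sin;
        c = next_c;
    }
}

int16_t fixed_sin(uint8_t angle) { return sin_table[angle]; }

/* cos(a) = sin(a + quarter turn); the uint8_t cast does the wrap-around. */
int16_t fixed_cos(uint8_t angle) { return sin_table[(uint8_t)(angle + 64)]; }
```

Incrementing a uint8_t angle each frame then walks the full circle and wraps without any range check.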

This is what the Animate function looks like (this method draws a single frame ASAP)

Gus_Issa · April 14, 2011, 2:35pm

Nice!

One thing I wanted to try was to move the old MPEG2 video playback RLP on EMX and see how fast can EMX do video. Are you up for this challenge? I am sure I can convince GHI to get you some new toys

WouterH · April 14, 2011, 3:16pm

Hmm, I don’t think the Cobra can handle MPEG decoding. As you said, memory access is really a bottleneck. That’s why I used 8 CPU registers to cache everything and reduced the inner loop to 14 asm instructions. No C optimiser will ever compact it that much.

Plus doing mpeg decoding in asm will be a pain in the *** 8)

Gus_Issa · April 14, 2011, 3:41pm

I meant take the same code and run on EMX not rewrite it in assembly

EMX can do MPEG, but how fast can it do it? At 320x240 ChipworkX can do over 60fps, so assuming EMX is 6 times slower you get 10fps, which is okay. Or make the video smaller and get 15fps, which is enough.

WouterH · April 15, 2011, 4:20am

If I have some spare time I’ll look into it

Just captured another video of the rotating bitmap. Now with a 128x128 bitmap and some random movement.

Skewworks · April 17, 2011, 4:12pm

This belongs on code.tinyclr.com so we can all enjoy the awesomeness.

cough hint cough 8)

Gus_Issa · April 17, 2011, 4:20pm

pretty amazing for sure

Architect · April 18, 2011, 9:39am

Looks very smooth!

Thank you

WouterH · April 18, 2011, 2:07pm

Here you go:
http://code.tinyclr.com/project/295/rlp-bitmap-rotation-demo/

bg_blea · June 16, 2011, 10:02pm

Cool stuff.
Mind sharing what IDE you use for native and where did you get the info on syntax?
I used PICs for a while and I’m used to having everything in one place: doc sheet + instruction set + free IDE with good tutorials. (I’m spoiled, I know.) With ARM it all seems scattered, with tons of 3rd party IDEs.
Maybe it’s just me :-[

WouterH · June 17, 2011, 1:57am

Well I just use “Programmers Notepad” for editing and a console window for compiling. Nothing fancy.

NOTE: all links below are for the FEZ Cobra, I don’t know if they match other boards.

I use the “Sample Code Bundle” from here:
http://ics.nxp.com/support/documents/microcontrollers/?scope=LPC2468

On the same page you’ll find the “LPC24xx User Manual (UM10237)”

Then you have the instruction set:

More information on inline assembler in GCC:
http://www.ethernut.de/en/documents/arm-inline-asm.html

http://www.devrs.com/gba/files/asmc.txt

You can always let the compiler generate an assembly list file from your C code. There you will learn a lot about how the compiler does things in assembly.

During development I found that memory access is the bottleneck. The ARM has some internal registers free to use, so if you see that the C code generates too many memory accesses and you want to speed things up, you can cache values in those registers for faster access.
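The same idea can be tried in plain C before dropping to assembler: copy everything the inner loop needs into local variables so the compiler can keep them in registers, leaving one texture read and one frame buffer write per pixel. The sketch below is a generic rotated texture fill, not the code from the demo; the function name, the 8.8 fixed-point format and the 16-bit pixel type are assumptions.

```c
#include <stdint.h>

#define TEX_SIZE 64              /* assumed power-of-two texture size */
#define TEX_MASK (TEX_SIZE - 1)

/* Hypothetical frame routine: fills dst (width x height pixels) with a
   rotated, tiled copy of tex. sin_a and cos_a are 8.8 fixed-point values. */
void draw_rotated(uint16_t *dst, int width, int height,
                  const uint16_t *tex, int32_t sin_a, int32_t cos_a)
{
    /* Texture coordinates of the start of the current row, 8.8 fixed point.
       Unsigned so that wrap-around below zero is well defined. */
    uint32_t row_u = 0, row_v = 0;

    for (int y = 0; y < height; y++) {
        uint32_t u = row_u, v = row_v;   /* locals the compiler can keep in registers */
        for (int x = 0; x < width; x++) {
            /* One texture read and one frame buffer write per pixel. */
            *dst++ = tex[((v >> 8) & TEX_MASK) * TEX_SIZE + ((u >> 8) & TEX_MASK)];
            u += (uint32_t)cos_a;        /* step one screen pixel to the right */
            v += (uint32_t)sin_a;
        }
        row_u -= (uint32_t)sin_a;        /* step one screen pixel down */
        row_v += (uint32_t)cos_a;
    }
}
```

Looking at the assembly listing of a loop like this shows quickly whether the compiler really kept u, v and the two step values in registers or spilled them to memory.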