ILI9325 TFT LCD Display library

Architect · October 21, 2010, 2:20am

This is my initial version:

(link removed)
(link removed)

Solution file has targets for FEZ and Netduino so it works on both (tested).
Unfortunately it is much slower than equivalent code on Arduino. I will be testing more and trying other interfaces that ILI9325 provides.

On FEZ-Panda I am using OutputPort and low level Register class to access ports directly, so it is a little bit better than Netduino (Netduino doesn’t have Register class).

If you have any ideas on improving the performance (speed) of the code, please share.

Thanks,
Valentin

MarkH · October 21, 2010, 4:38am

Your FEZ1/FEZ2 arrays are very large, which will slow things down. Consider storing them in some sort of binary format which you can then open and run through as a stream.

A few quick things I’ve noticed:

Consider changing:

for (int i = 0; i < 8; i++)

to

for (int i = 0; i < 8; i = i + 1)

In FillRect:

for (i = 0; i < (ey - sy + 1); i++) 
                for (j = 0; j < (ex - sx + 1); j++)

could be optimised to


int a = (ey - sy + 1);
int b = (ex - sx + 1);

for (i = 0; i < a; i++) 
                for (j = 0; j < b; j++)

This saves doing a subtraction and an addition on every iteration.

In DrawImage:

for (int j = 0; j < y; j++) 
                for (int i = 0; i < x; i++) 
                { 
                    WriteRegisterData((byte)(pic[k] >> 8), (byte)(pic[k])); 
                    k++; 
                }

can optimise to:

for (int j = 0; j < y; j = j + 1) 
                for (int i = 0; i < x; i = i + 1) 
                { 
                    ushort b = pic[k];
                    WriteRegisterData((byte)(b >> 8), (byte)b); 
                    k = k + 1; 
                }

If you can, convert your code to use 32 bit values rather than 16 and 8 bit. Converting 8/16 bits up to 32 bit for running operations on them takes time.

For simple code cleanliness:

//OutputPort method 
        private void TransferBytes(byte VH, byte VL) 
        { 
            byte data; 
            data = VH;

should be

//OutputPort method 
        private void TransferBytes(byte VH, byte VL) 
        { 
            byte data = VH;

Also consider changing variable names to be more verbose, so their usage is easier to understand. VH/VL mean nothing to me looking at this… 2 letter variables are bad. Code readability with C# is king, there is no need for short variables that are not ultra temporary (like i/j/k in loops for instance - those are very limited scope).

Architect · October 21, 2010, 5:22am

Hello Mark,

Thank you for your feedback. I see your points, although i=i+1 vs i++ is questionable. In my defense - this is the initial version to make LCD work with Panda and the code is going to change.

I made the “for loop” optimizations that you suggested and ran my test.

Before:
Fill full screen 240X320 in: 00:01:45.2346245
Draw 50x67 image in: 00:00:04.6795823
Draw 50x50 image in: 00:00:03.4967876

After:
Fill full screen 240X320 in: 00:01:44.5693713
Draw 50x67 image in: 00:00:04.6729302
Draw 50x50 image in: 00:00:03.4920995

The main issue is the speed of GPIO; I believe this is the main bottleneck. As you probably saw, I have two different methods to transfer data to the display. One uses 8 OutputPort instances, one for each data pin; the other uses port Register manipulations. The difference in speed is significant: 5 sec vs 3 sec. I am going to switch the RS, CS, WR and REST manipulations to the low level Register as well to check the impact. I bet it will improve the speed.

My next idea is to write native helper “dll” and use PInvoke.

Bec_a_Fuel · October 21, 2010, 6:02am

I also don’t understand i = i + 1 vs i++, but you can still improve the loop speed by doing this :


for (int i = 0; i < 8;) 
{ 
  // Your code
  i += 1; 
}

MarkH · October 21, 2010, 6:16am

Surprisingly, in my testing at least, i = i + 1 is faster than i++ and MUCH faster than ++i (which isn’t so surprising).

Assignment operators are slower than doing it yourself… so:
i += 1 is slower than i = i + 1;
i /= 2 is slower than i = i / 2;

I must admit I expected more of a speedup than you got. Saving a second on the full screen wipe is quite good, but I’m quite disappointed with the drawing of the images. Perhaps you could read the arrays out into a memory stream (and time that), then seek the memory stream back to 0, then read your arrays out of the memory stream for writing to the LCD. Savitch and I have both found that a considerable performance increase can be gained from not having to use large arrays.
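A minimal sketch of the stream idea above, written for desktop .NET so it runs without hardware. It is not code from the library and it is untested on the device: WriteRegisterData here is a hypothetical stand-in that only counts bytes, DrawFromStream and the 64-byte chunk size are made up for illustration, and the stream is assumed to hold pixels as high byte then low byte.

```csharp
using System;
using System.IO;

public static class StreamedImage
{
    // Hypothetical stand-in for the thread's WriteRegisterData(high, low);
    // it only counts bytes so the sketch runs without an LCD attached.
    public static int BytesSent;

    private static void WriteRegisterData(byte high, byte low)
    {
        BytesSent = BytesSent + 2;
    }

    // Reads 16-bit pixels (high byte first, an assumption) from a stream in
    // small chunks instead of indexing one large ushort[] array.
    // Assumes every Read returns an even number of bytes, which holds for a
    // MemoryStream over even-length data.
    public static int DrawFromStream(Stream pixels)
    {
        byte[] chunk = new byte[64];
        int pixelsDrawn = 0;
        int read;
        while ((read = pixels.Read(chunk, 0, chunk.Length)) > 0)
        {
            for (int i = 0; i + 1 < read; i = i + 2)
            {
                WriteRegisterData(chunk[i], chunk[i + 1]);
                pixelsDrawn = pixelsDrawn + 1;
            }
        }
        return pixelsDrawn;
    }

    public static void Main()
    {
        byte[] raw = { 0x12, 0x34, 0xAB, 0xCD };   // two pixels
        using (MemoryStream stream = new MemoryStream(raw))
        {
            Console.WriteLine(DrawFromStream(stream));   // prints 2
        }
    }
}
```

Whether this is actually faster than the array version on the Micro Framework would have to be measured on the board.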

Brett · October 21, 2010, 6:40am

I believe Mark helped out here:
(link removed)

a finding was:

So even chunking up what you have might help, or as Mark says look at streams

MarkH · October 21, 2010, 6:46am

I haven’t actually tested streams, I’m just guessing they will be faster from my knowledge of full .NET.

I’ve been working with Savitch to develop a PCB and to get the code sub 10ms. I’ve also been writing my own versions of a few of the native libraries - and they are coming up faster in highly optimised managed code. I’ve spent the past week and a bit optimising .netmf code and working out what works and what really doesn’t.

It would be interesting to see just how much time is taken up by the various methods, to see where your slowdowns are.

Bec_a_Fuel · October 21, 2010, 7:39am

Another way of improving loop speed is to have fewer loops… :-[

So, with this in mind, what about duplicating the “working code” inside the loop? You would then spend half as much time checking the loop’s exit condition:

Consider the following code:


Start = DateTime.Now;
for (int i = 0; i < 1000; i += 1) { j = 1; }
End = DateTime.Now;
Debug.Print((End - Start).ToString());

Start = DateTime.Now;
for (int i = 0; i < 1000; i += 2) { j = 1; j = 1; }
End = DateTime.Now;
Debug.Print((End - Start).ToString());

Here, the “working code” is only an assignment to j. The only difference between the two snippets is the duplication of this assignment and, obviously, the increment by 2 for the loop variable.

Here are the resulting timings on my PC :

00:00:00.0365197
00:00:00.0207906

Not bad, is it? More than 40% improvement.

The main drawback is that it is not beautiful code… But if performance is the goal, I think it’s worth trying.

Bec_a_Fuel · October 21, 2010, 9:20am

I got a small, reproducible improvement by using a do-while loop:

int k = 0;
Start = DateTime.Now;
do { j = 1; j = 1; } while ((k += 2) < 1000);
End = DateTime.Now;
Debug.Print((End - Start).ToString());

This result is the third one; the other two are those seen before:

00:00:00.0365191
00:00:00.0207901
00:00:00.0204170

MarkH · October 21, 2010, 9:29am

Bec, as a note for checking reproducible results: you need to run that loop code several hundred or thousand times, until the time is into the dozens or hundreds of milliseconds. That way you know that it’s the code which has sped up and not just some other process getting in there. It also shows you how much time is spent on garbage collection for that loop.
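The advice above can be sketched as a small timing harness. This is only an illustration for desktop .NET: it uses System.Diagnostics.Stopwatch, and on the Micro Framework the DateTime.Now approach already used in this thread would presumably take its place. The names LoopTiming, Sink and AverageMilliseconds are made up for the example.

```csharp
using System;
using System.Diagnostics;

public static class LoopTiming
{
    // Written by the loops so the loop bodies have a visible effect.
    public static int Sink;

    public static void PlainLoop()
    {
        for (int i = 0; i < 1000; i += 1) { Sink = 1; }
    }

    public static void UnrolledLoop()
    {
        for (int i = 0; i < 1000; i += 2) { Sink = 1; Sink = 1; }
    }

    // Runs the body many times and returns the mean time per run, so that
    // a one-off interruption has less influence on the result.
    public static double AverageMilliseconds(Action body, int repetitions)
    {
        body();   // one untimed warm-up run
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
        {
            body();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / repetitions;
    }

    public static void Main()
    {
        Console.WriteLine("plain:    " + AverageMilliseconds(PlainLoop, 1000) + " ms");
        Console.WriteLine("unrolled: " + AverageMilliseconds(UnrolledLoop, 1000) + " ms");
    }
}
```

The absolute numbers will differ completely between a PC and the board; only the ratio between the two variants on the same machine is meaningful.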

Bec_a_Fuel · October 21, 2010, 9:38am

Yes, you’re right. The term “reproducible” is not the right one here.

What I meant is that over roughly 30-40 runs I always see a similar difference in timings. This difference is never exactly the same (how could it be with .NetMF, btw) but it is nearly stable across sessions. This shows that there is an improvement.
I admit that the last one is very small. I just mentioned it because I tried it, but I don’t think it’s worth using. Though it may still depend on what is done inside the loop…

What is more interesting, though, is the fact that duplicating the code inside the loop does indeed speed things up. No need for hundreds of test loops to see that.

Also, it should be tested with real code, not only a simple variable assignment. But I’m pretty sure that there will be a big improvement in speed, even with GC or real code, simply because of fewer loop-bound checks.
But I can’t check that myself as I don’t have the device. So, if Architect could do this, I would like to know how it compares with his current code.

Architect · October 21, 2010, 10:23am

Hi,

I have “unrolled” the inner loop and changed the index increment as suggested:

New code:


public void FillRect(int sx, int sy, int ex, int ey, int col)
{
    LCD_CS.Write(false);
    int nPixels = (ex - sx + 1) * (ey - sy + 1);
    SetAddress(sx, sy, ex, ey);

    for (int i = 0; i < nPixels; )
    {
        WriteRegisterData((byte)((col >> 8)), (byte)(col));
        i = i + 1;
    }

    LCD_CS.Write(true);
}

public void DrawImage(int x1, int y1, int x2, int y2, ushort[] image)
{
    LCD_CS.Write(false);

    SetAddress(x1, y1, x2, y2);
    int nPixels = (y2 - y1 + 1) * (x2 - x1 + 1);
    ushort b;

    for (int i = 0; i < nPixels;)
    {
        b = image[i];
        WriteRegisterData((byte)(b >> 8), (byte)b);
        i = i + 1;
    }

    LCD_CS.Write(true);
}

Here are the results:


Old code:

Fill full screen 240X320 in: 00:01:44.5693713
Draw 50x67 image in: 00:00:04.6729302
Draw 50x50 image in: 00:00:03.4920995


New code:

Fill full screen 240X320 in: 00:01:44.7068635
Draw 50x67 image in: 00:00:04.6324443
Draw 50x50 image in: 00:00:03.4617747

Fill full screen 240X320 in: 00:01:44.6982873
Draw 50x67 image in: 00:00:04.6324530
Draw 50x50 image in: 00:00:03.4617273

Mark,
I think it is consistent with FillRect. A 50x50 image takes 3.46s, and 30 of those (~00:01:43.6) will make a full screen, which will be around the FillRect time.
I liked your “Parallel to SPI GLCD conversion” article on the wiki. The ILI9325 chip supports SPI, but it is not accessible on this LCD.
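The consistency argument above can be checked by dividing each measured time by its pixel count. The sketch below does that for the “New code” timings posted above; PixelCost and MillisecondsPerPixel are names made up for this example. Worked by hand, all three come out at roughly 1.4 ms per pixel, and a full 240x320 screen is 76800 pixels, or 30.72 images of 50x50.

```csharp
using System;

public static class PixelCost
{
    // Average time spent per pixel for a width x height area.
    public static double MillisecondsPerPixel(int width, int height, TimeSpan elapsed)
    {
        return elapsed.TotalMilliseconds / (width * height);
    }

    public static void Main()
    {
        // Timings copied from the second "New code" run in the post above.
        Console.WriteLine(MillisecondsPerPixel(240, 320, TimeSpan.FromSeconds(104.6982873)));
        Console.WriteLine(MillisecondsPerPixel(50, 67, TimeSpan.FromSeconds(4.6324530)));
        Console.WriteLine(MillisecondsPerPixel(50, 50, TimeSpan.FromSeconds(3.4617273)));
    }
}
```

A nearly constant per-pixel cost across fill and image drawing suggests the time goes into the per-pixel transfer itself rather than into the loop around it.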

Bec_a_Fuel · October 21, 2010, 10:27am

Could you please try this code:


public void DrawImage(int x1, int y1, int x2, int y2, ushort[] image)
{
    LCD_CS.Write(false);

    SetAddress(x1, y1, x2, y2);
    int nPixels = (y2 - y1 + 1) * (x2 - x1 + 1);
    ushort b;

    for (int i = 0; i < nPixels; i += 2)
    {
        b = image[i];
        WriteRegisterData((byte)(b >> 8), (byte)b);
        b = image[i + 1];
        WriteRegisterData((byte)(b >> 8), (byte)b);
    }

    LCD_CS.Write(true);
}

Edit:

@ MarkH : I’ve modified my program so that each “test loop” is executed 1000 times. I store the results in arrays (two, in fact, one for each loop type) and then calculate the difference in timings at the end.
This gives:

loop method 1 : 1000 iterations -> 37.0851837 sec
loop method 2 : 1000 iterations -> 21.1456816 sec

This gives an average improvement of 43.24%.

MarkH · October 21, 2010, 10:32am

The article is actually Savitch’s, not mine.

Bec, these small improvements of a couple of milliseconds all add up when the code is running hundreds and thousands of times. It’s scary how much time can actually be saved!

The problem with running the code twice in a loop is that there is no check to see if the next item exists, so you could end up going out of bounds!

Bec_a_Fuel · October 21, 2010, 10:40am

Mark, I know and understand that this may not be good practice for general use. But when it’s possible and timings are important, it may be worth a try, I think.

Let’s wait for the new timings with this method. This will give a good clue about its (un)usefulness in this particular case.

Architect · October 21, 2010, 10:46am

As Mark said, doing 2 pixels at a time without an extra check in the loop restricts images to an even number of pixels.
I am adjusting the first image, changing it to 50x62 instead of 50x67. This will affect the time for the “did you know monkey” image ;D.
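One way to keep the two-at-a-time loop without the even-pixel restriction is to unroll only over the pairs and handle a possible last pixel separately. This is a sketch under assumptions, not the library's code: WriteRegisterData is replaced by a hypothetical stand-in that records bytes into a list, and DrawPixels is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class UnrolledDraw
{
    // Hypothetical stand-in for the thread's WriteRegisterData(high, low);
    // it records the bytes so the sketch runs without an LCD attached.
    public static readonly List<byte> Sent = new List<byte>();

    private static void WriteRegisterData(byte high, byte low)
    {
        Sent.Add(high);
        Sent.Add(low);
    }

    public static void DrawPixels(ushort[] image, int nPixels)
    {
        int pairedEnd = nPixels & ~1;   // largest even number <= nPixels
        int i = 0;
        ushort b;

        // Unrolled part: two pixels per loop-condition check.
        while (i < pairedEnd)
        {
            b = image[i];
            WriteRegisterData((byte)(b >> 8), (byte)b);
            b = image[i + 1];
            WriteRegisterData((byte)(b >> 8), (byte)b);
            i = i + 2;
        }

        // Tail: at most one pixel is left when nPixels is odd.
        if (i < nPixels)
        {
            b = image[i];
            WriteRegisterData((byte)(b >> 8), (byte)b);
        }
    }

    public static void Main()
    {
        DrawPixels(new ushort[] { 0x1234, 0xABCD, 0x00FF }, 3);
        Console.WriteLine(Sent.Count);   // prints 6 (3 pixels, 2 bytes each)
    }
}
```

The extra check runs once per image rather than once per pixel, so it should not cost back what the unrolling gains.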

Architect · October 21, 2010, 10:47am

Here it is:


Fill full screen 240X320 in: 00:01:44.7067618
Draw 50x62 image in: 00:00:04.2456781
Draw 50x50 image in: 00:00:03.4274317

Bec_a_Fuel · October 21, 2010, 10:58am

May I ask for one more test (the last one, I promise)? :-[

What about doing the same thing in the TransferBytes method?

private void TransferBytes(byte VH, byte VL)
        {
            byte data;
            data = VH;
 
            for (int i = 0; i < 8; i += 2)
            {
                pins[i].Write((data & 0x01) != 0);
                data = (byte)(data >> 1);
                pins[i+1].Write((data & 0x01) != 0);
                data = (byte)(data >> 1);
            }
 
            LCD_WR.Write(false);
            LCD_WR.Write(true);
 
            data = VL;
            for (int i = 0; i < 8; i += 2)
            {
                pins[i].Write((data & 0x01) != 0);
                data = (byte)(data >> 1);
                pins[i+1].Write((data & 0x01) != 0);
                data = (byte)(data >> 1);
            }
 
            LCD_WR.Write(false);
            LCD_WR.Write(true);
        }

Architect · October 21, 2010, 11:06am

Oh,

I am using the faster one, TransferBytesDirect, in these tests. It doesn’t have loops.
;D

Bec_a_Fuel · October 21, 2010, 11:09am

Then I give up.

Thanks anyway.