[solved] Panda runs around 3x faster than Cobra?

I have asked some other manufacturers to run my test app to get a better picture of actual performance of NET MF devices. I did not ask for permission to post results so I will not.

But, some interesting insights:

  • the problem of memory slowing down the CPU is not unique to the EMX. I have results from a 200 MHz ARM9 that is actually slower than the EMX

  • I got numbers for one board with the BSP compiled using GCC and using RVDS. The difference is pretty big: around 2x.

  • I have received a printout of the test program run by an EMX (the supplier did it himself because he had one). Guess what: his EMX is twice as fast as mine. From the output of the test app I can see he has a somewhat outdated Firmware. It is version 4.1.3.0.

I’d like to ask GHI to look into this. Might it be a firmware problem after all?
Or was there any change to the hardware? (I have no knowledge of the EMX revision used by said person.)

Kind regards
Mark

I am glad you see that :slight_smile: GHI goes to great measures to optimize everything.

This is simply impossible. I can see differences between revisions, but not double the speed!

Either way, your test may not be valid for many reasons, mainly that all code is interpreted and managed by the run-time. I hope you see that we have really spent a lot of time on this and I tried to explain as much as I can but at this point we need to stop going in the same circle. Like I said before, EMX is the most feature-rich and most optimized NETMF device in the world, just take my word for it :slight_smile:

Downgrade to Firmware 4.1.3.0 and see the effect this has on EMX performance!

Just to give an idea:
Panda 4.1.6.0
time FillArrayWithIndexValue bytes1 [4.824 ms]
time UTF8Encoding.UTF8.GetChars [0.397 ms]
Time loop: if (i % 10 == 0) u4++ [3.242 ms]
sum byte array ‘while --’ [3.421 ms]
time IntToASCII 4000: 4000 [0.338 ms]

Cobra 4.1.3.0
time FillArrayWithIndexValue bytes1 [6.307 ms]
time UTF8Encoding.UTF8.GetChars [0.624 ms]
Time loop: if (i % 10 == 0) u4++ [3.975 ms]
sum byte array ‘while --’ [4.516 ms]
time IntToASCII 4000: 4000 [0.471 ms] (btw, this is the one that is so important for me)

Cobra 4.1.6.0
time FillArrayWithIndexValue bytes1 [13.128 ms]
time UTF8Encoding.UTF8.GetChars [0.960 ms]
Time loop: if (i % 10 == 0) u4++ [9.012 ms]
sum byte array ‘while --’ [9.743 ms]
time IntToASCII 4000: 4000 [1.381 ms]

Panda is still faster, but with 4.1.3.0 on the Cobra the speed difference is now something I’d call the “overhead” of external RAM.
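For anyone who wants to check the arithmetic, here is a small sketch that turns the three result blocks above into ratios. The numbers are copied from the post; the variable names are made up for illustration, and reading the Cobra 4.1.3.0 vs Panda ratio as "external RAM overhead" is the poster's interpretation, not something the numbers prove. On these five tests the firmware change alone works out to roughly 1.5x to 2.9x, while the remaining Cobra vs Panda gap is roughly 1.2x to 1.6x.

```python
# Timings in ms, copied from the three result blocks above (same test order).
TESTS = [
    "FillArrayWithIndexValue",
    "UTF8Encoding.UTF8.GetChars",
    "loop: if (i % 10 == 0) u4++",
    "sum byte array 'while --'",
    "IntToASCII 4000",
]
PANDA_4160 = [4.824, 0.397, 3.242, 3.421, 0.338]
COBRA_4130 = [6.307, 0.624, 3.975, 4.516, 0.471]
COBRA_4160 = [13.128, 0.960, 9.012, 9.743, 1.381]


def ratios(numer, denom):
    """Element-wise numer/denom, i.e. how many times slower numer is."""
    return [n / d for n, d in zip(numer, denom)]


# Same board, different firmware: isolates the firmware effect.
firmware_slowdown = ratios(COBRA_4160, COBRA_4130)
# Older Cobra firmware vs Panda: what the post calls external RAM overhead.
ram_overhead = ratios(COBRA_4130, PANDA_4160)

for name, fw, ram in zip(TESTS, firmware_slowdown, ram_overhead):
    print("%-28s firmware x%.2f   external RAM x%.2f" % (name, fw, ram))
```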

Sorry to say, Gus, but the only one who wasted time here (yours, mine, the community’s) is you.

Look at the whole discussion and please reflect for a moment: what have you done to help?
Nothing at all. You tried to downplay the problem, you asked me what I’m trying to accomplish here, you told me I’m wasting your time and the community’s, and lastly you told me how great GHI is.

We would all have wasted less time if you had just taken the problem seriously from the beginning.
Hopefully there will be a fix to this shortly.

mark

I think you are being a bit hard on GHI support. Gus is trying to help. In general, with any system performance problem (PC, server, etc.), you don’t first assume there is a problem with the device or firmware; you look at what the user was trying to do with it, particularly if this is the first report.

However, in this case, the firmware seems to be the issue.

I downgraded the firmware on the device I tested previously so we could compare apples to apples. You can see the result is much closer to the Panda II time of 337.640 ms. With 4.1.3.0 on the Cobra, the total time is 480.042 ms vs 694.379 ms for the Cobra on 4.1.6.0. It seems there is definitely a problem with the firmware.

PerformanceTester - PrintSomeInfo
System Version: 4.1.3.0
Cpu.SlowClock: 18000000
Cpu.SystemClock: 18000000
Debugger Attached: True

PerformanceTester - MiscTests
time nothing [0.009 ms]
time Utility.ComputeCRC 2701537051 [0.145 ms]
time IntPlaces 0: 1 [0.175 ms]
time IntPlaces 4000: 4 [0.266 ms]
time IntPlaces -4000: 5 [0.342 ms]
time IntPlaces int.MaxValue: 10 [0.494 ms]
time IntPlaces int.MinValue: 11 [0.489 ms]

PerformanceTester - ArrayTests
time fill bytes1 [7.995 ms]
time FillArrayWithIndexValue bytes1 [7.609 ms]
time Array.Clear [0.097 ms]
time Array.Copy [0.168 ms]
time bytes1.CopyTo [0.221 ms]
time Array.IndexOf 98 is 98 [0.454 ms]
time Utility.CombineArrays 200 [0.128 ms]
time Utility.ExtractValueFromArray 50462976 [0.089 ms]
time Utility.ExtractRangeFromArray 10 [0.066 ms]
time bytes2[99] == 99 True [0.042 ms]
time bytes2[bytes2.Length-1] == 99 True [0.020 ms]

PerformanceTester - StringTests 100chars
time init string of 100 chars [0.029 ms]
time UTF8Encoding.UTF8.GetBytes [0.412 ms]
time UTF8Encoding.UTF8.GetChars [0.626 ms]
time new string(chars) [0.190 ms]

PerformanceTester - StringTestsShort 20chars
time init string of 20 chars [0.029 ms]
time UTF8Encoding.UTF8.GetBytes [0.392 ms]
time UTF8Encoding.UTF8.GetChars [0.435 ms]
time new string(chars) [0.174 ms]

PerformanceTester - SomeLoopOps
Time loop: i++, u++ [5.265 ms]
Time loop: i += 1, u += 1 [4.917 ms]
Time loop: f = 2.0f/3.0f [4.699 ms]
Time loop: d = 2.0/3.0 [4.960 ms]
Time loop: if (i % 10 == 0) u4++ [8.118 ms]

PerformanceTester - ClearByteArray
create new byte array [0.038 ms] 00
Array.Clear [0.094 ms]
clear using ‘for ++’ [6.021 ms]
clear using ‘for --’ [6.851 ms]
clear using ‘while --’ [7.215 ms]

PerformanceTester - SumByteArray
sum byte array ‘for each’ [9.274 ms]
sum byte array ‘for ++’ [7.077 ms]
sum byte array ‘while --’ [7.434 ms]

PerformanceTester - IntToByteArrayTests
time IntToASCII 0: 0 [0.297 ms]
time IntToASCII 4000: 4000 [0.760 ms]
time IntToASCII -4000: -4000 [0.779 ms]
time IntToASCII int.MaxValue: 2147483647 [1.885 ms]
time IntToASCII int.MinValue: -2147483647 [1.615 ms]
time int.ToString + GetBytes 0: 0 [1.301 ms]
time int.ToString + GetBytes 4000: 4000 [1.534 ms]
time int.ToString + GetBytes -4000: -4000 [2.245 ms]
time int.ToString + GetBytes int.MaxValue: 2147483647 [1.935 ms]
time int.ToString + GetBytes int.MinValue+1: -2147483647 [1.923 ms]

tests ran in: [480.042 ms]
The thread ‘’ (0x1) has exited with code 0 (0x0).
Done.

Dear maoli, I am sorry this is how you see it. I am trying to do my best. Please feel free to call GHI and express your concerns.

Dear Zoomer, thanks for understanding. I will continue to help as I always do!

About the issue: you are seeing a performance difference between two firmware versions on EMX. Do you see the same on USBizi? I need to collect as much info as I can so this can be passed on to our experts.

That’s an interesting point. Below are the results of the downgraded firmware for Panda II. There isn’t much difference with 4.1.6.0 at 337.640 ms and 4.1.3.0 at 340.577 ms.

So something seems to be affecting Cobra in the new firmware resulting in a pretty significant performance drop.
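To put numbers on that, here is a quick sketch of the relative change when going from 4.1.3.0 to 4.1.6.0, using only the "tests ran in" totals quoted in this thread. The helper name is made up for illustration. With these totals the Panda II changes by less than 1%, while the Cobra gets about 45% slower.

```python
def percent_change(old_ms, new_ms):
    """Relative change from old_ms to new_ms, in percent (positive = slower)."""
    return (new_ms - old_ms) / old_ms * 100.0


# "tests ran in" totals quoted in the thread, in ms.
PANDA_II = {"4.1.3.0": 340.577, "4.1.6.0": 337.640}
COBRA = {"4.1.3.0": 480.042, "4.1.6.0": 694.379}

panda_change = percent_change(PANDA_II["4.1.3.0"], PANDA_II["4.1.6.0"])
cobra_change = percent_change(COBRA["4.1.3.0"], COBRA["4.1.6.0"])

print("Panda II 4.1.3.0 -> 4.1.6.0: %+.2f%%" % panda_change)
print("Cobra    4.1.3.0 -> 4.1.6.0: %+.2f%%" % cobra_change)
```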

PerformanceTester - PrintSomeInfo
System Version: 4.1.3.0
Cpu.SlowClock: 18000000
Cpu.SystemClock: 18000000
Debugger Attached: True

PerformanceTester - MiscTests
time nothing [0.005 ms]
time Utility.ComputeCRC 2701537051 [0.092 ms]
time IntPlaces 0: 1 [0.116 ms]
time IntPlaces 4000: 4 [0.189 ms]
time IntPlaces -4000: 5 [0.235 ms]
time IntPlaces int.MaxValue: 10 [0.310 ms]
time IntPlaces int.MinValue: 11 [0.353 ms]

PerformanceTester - ArrayTests
time fill bytes1 [5.828 ms]
time FillArrayWithIndexValue bytes1 [5.906 ms]
time Array.Clear [0.054 ms]
time Array.Copy [0.091 ms]
time bytes1.CopyTo [0.126 ms]
time Array.IndexOf 98 is 98 [0.346 ms]
time Utility.CombineArrays 200 [0.121 ms]
time Utility.ExtractValueFromArray 50462976 [0.050 ms]
time Utility.ExtractRangeFromArray 10 [0.063 ms]
time bytes2[99] == 99 True [0.024 ms]
time bytes2[bytes2.Length-1] == 99 True [0.037 ms]

PerformanceTester - StringTests 100chars
time init string of 100 chars [0.017 ms]
time UTF8Encoding.UTF8.GetBytes [0.232 ms]
time UTF8Encoding.UTF8.GetChars [0.452 ms]
time new string(chars) [0.116 ms]

PerformanceTester - StringTestsShort 20chars
time init string of 20 chars [0.017 ms]
time UTF8Encoding.UTF8.GetBytes [0.190 ms]
time UTF8Encoding.UTF8.GetChars [0.237 ms]
time new string(chars) [0.117 ms]

PerformanceTester - SomeLoopOps
Time loop: i++, u++ [3.961 ms]
Time loop: i += 1, u += 1 [4.077 ms]
Time loop: f = 2.0f/3.0f [3.454 ms]
Time loop: d = 2.0/3.0 [3.674 ms]
Time loop: if (i % 10 == 0) u4++ [6.137 ms]

PerformanceTester - ClearByteArray
create new byte array [0.025 ms] 00
Array.Clear [0.054 ms]
clear using ‘for ++’ [4.574 ms]
clear using ‘for --’ [5.191 ms]
clear using ‘while --’ [5.058 ms]

PerformanceTester - SumByteArray
sum byte array ‘for each’ [6.829 ms]
sum byte array ‘for ++’ [5.168 ms]
sum byte array ‘while --’ [5.609 ms]

PerformanceTester - IntToByteArrayTests
time IntToASCII 0: 0 [0.209 ms]
time IntToASCII 4000: 4000 [0.514 ms]
time IntToASCII -4000: -4000 [0.623 ms]
time IntToASCII int.MaxValue: 2147483647 [1.159 ms]
time IntToASCII int.MinValue: -2147483647 [1.167 ms]
time int.ToString + GetBytes 0: 0 [0.955 ms]
time int.ToString + GetBytes 4000: 4000 [0.964 ms]
time int.ToString + GetBytes -4000: -4000 [1.353 ms]
time int.ToString + GetBytes int.MaxValue: 2147483647 [0.929 ms]
time int.ToString + GetBytes int.MinValue+1: -2147483647 [1.163 ms]

tests ran in: [340.577 ms]

This is good news, actually! There is one thing we have planned to do in the firmware to improve performance, but it is a major change, so we decided to do it in the move to NETMF 4.2. I wonder if this is related somehow. Anyway, I will pass this on to the experts at GHI to investigate further.

I still want to make it very clear for future readers that comparing speeds between EMX and USBizi is completely pointless. It is like comparing apples to oranges. EMX will be faster when using a lot of RAM, USBizi will be faster when not. Both have the same processor speed and family.

I asked Microsoft a long time ago to provide us with test software that we can use to measure performance. The test has to be made by someone who knows the internals of the CLR in deep detail. With our test here, we are making a lot of assumptions. Are we testing CLR execution (managed) or native execution, for example?

The last line (“tests ran in”) is quite misleading. It includes all the time spent in Debug.Print. It shouldn’t really be in there. You could have a fast CPU but use serial as the debug interface. Each test would be faster, but the last line would show a higher runtime :-[

I have also not thought about any way to weight these results - so just summing the time of each individual test is really pointless.
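One way around the output problem is to buffer the timings and print only after all measurements are done, so the total covers test time and nothing else. The sketch below shows the pattern in desktop Python, not NETMF code; `run_benchmarks`, `report`, and the two dummy tests are hypothetical names for illustration. On a device the same idea would mean storing the results and calling Debug.Print at the very end. It does not solve the weighting problem; the total is still a plain sum.

```python
import time


def run_benchmarks(tests, repeat=1000):
    """Time each (name, callable) pair; nothing is printed while measuring."""
    results = []
    for name, fn in tests:
        start = time.perf_counter()
        for _ in range(repeat):
            fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append((name, elapsed_ms))
    return results


def report(results):
    """Format buffered results; the total excludes all output time."""
    lines = ["time %s [%.3f ms]" % (name, ms) for name, ms in results]
    total_ms = sum(ms for _, ms in results)
    lines.append("tests ran in (output excluded): [%.3f ms]" % total_ms)
    return lines


# Dummy tests standing in for the real ones.
demo_tests = [
    ("nothing", lambda: None),
    ("sum 0..99", lambda: sum(range(100))),
]
demo_results = run_benchmarks(demo_tests, repeat=100)
for line in report(demo_results):
    print(line)
```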

I agree with Gus that this should be done by someone with much more expertise than me. These “benchmarks” were really just a quick way to get to know NET MF better.

I also tried the Panda with older Firmware - my numbers are consistent with Zoomer’s.
It really looks like something post 4.1.3 affected only the EMX module.

I see there was no update to the Cobra firmware yet.

So we have a bug that causes every Cobra customer using one of the more recent firmwares to experience a very significant performance drop, yet this could not be solved in almost 2 months?

We actually made some changes and it is faster now, but the new firmware is not public yet. Maybe in 2 more weeks.

@ maoli,

I just noticed this thread. Read it from start to finish. I have to say, "Dude, what’s your problem?"
If I were Gus and you were talking to me that way, I would have told you to blow off.
There is a right way to handle things and a wrong way. Clearly you chose the wrong way.
Here is a thought: try being polite. It may get you farther in life.

Anyone with any education in embedded systems can tell you that when two boards have the same processor running at the same speed and one uses external memory, that one will be the slower one. Even with that in mind, you made the decision to buy the Cobra. So you are pissed because?

The reason I feel compelled to speak up here and knock you down a few pegs is that you’re picking on Gus, who has helped countless of us here, including me. I don’t think you have any concept of just how many e-mails and forum messages he goes through per day. I for one could not do that. On top of that, he is VERY responsive when a new post comes along.

So OK, you ran some tests and discovered that the Cobra is slower. I can understand being disappointed. But to come off the way you did is just plain wrong!

I think it is great that you discovered that the firmware version may have something to do with it all, but you could have handled this far differently than you did.

I for one would like to take the time out here and THANK GUS for all he has done for all of us, and to also say to GUS, there are always going to be weeds in the flower bed.

To be fair, we had to move some objects, which caused lower performance. We knew about it, but the fix was not simple. An SDK was released with this item still on the to-do list. This only affected a few objects and not everything. We finally found a way around this and have it fixed.

Note, performance will be better than before but running with external ram will always be slower.

GHI strives to make everything as perfect as possible, at no cost to anyone. Not to forget new features, also at no cost. This is what keeps us in a fast-growing business :slight_smile:

Thanks for the update Gus. I will be ready to test it when it comes out.

LostInFezLand:
I do recognize Gus’ work. But that does not mean we cannot disagree on something - as we did.

Gus is professional enough to accept critical / complaining users in here.
I hope you are very happy about your smart and polite post and for letting me know you would have told me to blow off. See you.

Come on guys, keep the spirit high here. There’s war enough elsewhere, don’t start one here.
We are all here to do exciting and fun stuff and learn from each other, so let’s keep it that way.

Thanks Eric