Is the SPI1 on UEXT?

2000 samples/sec using managed code is very fast. Remember that you are running managed, not native code. How many samples are you trying to get? You may want to use a little micro that collects the data; then FEZ can read it in large chunks and save it to the SD card.

What do you think you should be getting from a Panda?

Mike: I’m not sure exactly how fast I was expecting, but with 72MHz versus the 16MHz of the ATMEGA168, my impression was that I shouldn’t have a problem acquiring data from a hardware SPI bus at least as fast as I could with the Arduino. The Arduino has no problem doing 2500Hz and transferring that data to a serial port. The Panda isn’t doing anything else aside from flashing the LED, yet it’s still falling short of that. It’s a bummer.

Gus: I’d like to be able to log at least twenty 16-bit channels at 200Hz - this includes 16 ADC channels, a few digital channels, more data from an ATMEGA168 that’s doing frequency counting (I’ve already ruled out the Panda doing that while other things are going on), and data from the CAN bus. So, of all of that data, the 16 ADC channels alone mean the SPI transfer would need to run at least 3200 samples/sec (16 channels × 200Hz), and the Panda would also have to acquire the other data from those other sources, too.

I’ve spent a fair amount of time on a custom board ([url]http://phydiux.com/images/logger3_prelim1.jpg[/url]) for my project with a USBizi 100 chip, so I ordered the Panda to spend some time testing things out before pulling the trigger on getting a few boards produced. At this point, I don’t know whether to continue with the .NET Micro/USBizi chip, run it as a bare LPC2387 with C code, or redo the whole board with another chip altogether - probably an Atmel ARM processor, since I’m most familiar with them.

This is my first project using an ARM processor - my impression was that they were supposed to be more than adequate, overkill even, for what I needed. Maybe they are, but I don’t see that through the .NET framework, if the Panda is an accurate indicator of speed.

How much faster would it be if it were developed in C? Can you give me a rough guesstimate? Twice as fast?

This is new to embedded developers, so I understand the confusion. USBizi runs a managed system; there are a million things running in the background to make sure the system is stable and everything is handled smoothly. If you compare your 16MHz ATMEGA running native code to a 72MHz LPC running native code, then yes, the LPC will be many, many times faster. But USBizi runs NETMF, a managed system, so comparing it to a micro running native code is not fair. The ATMEGA could never run NETMF, for example.

If you do this from scratch using native code (C/assembly), then you will get more speed, but you will lose everything that comes with NETMF. Now, is NETMF the answer for every product? No. Depending on what you do, NETMF may be a good fit or it may not.

Also remember that, although VS2010 does compile your code, it doesn’t produce raw CPU code.

The compiled code runs inside a virtual machine on the CPU. This is why it is a lot slower.

Also, that is the difference between managed and unmanaged/native code. Managed code runs inside a virtual machine; unmanaged/native code runs raw CPU code.

Am I correct in this Gus?

On the speed issue: I needed to produce a short 40KHz signal and sample an analog value at 40,000 samples/sec. This was for an ultrasonic radar-type sensor. I tried to use a Panda, but ended up with a PIC micro doing the grunt work and the Panda just getting the data over the I2C bus for further processing…

For more speed with NETMF there is the GHI ChipworkX module: 200MHz and lots of memory.

A bit more costly, but if you are just making a few boards, there are the development-time savings to consider.

Correct, Errol.

By the way, the FAQ is very important for everyone to read. This is covered there under “How does FEZ work? Is it fast?” FAQ – GHI Electronics

Regardless of managed code or not, it’s kind of disheartening that the processor can’t even manage 25% of its potential because it’s running managed code. This totally abolishes any hope of having something reasonably fast running .NET Micro Framework on it - for now, anyway.

I’ve read the FAQ, and it’s completely non-committal about the speed one should expect to see from the FEZ. All the forum postings stating things like “Yeah, but is it fast?” do nothing to help a newcomer determine the speed at which the microcontroller will work; the only thing one has to go on is the CPU rating and the features of the board itself.

“Now, is it fast? Yes, it is fast depending on if you write your code right.”

I posted a code example; nobody seems to have any suggestions for making it faster. I’m left to assume that this is the speed that it is, which is pretty much what I’m being told when Gus tells me that 2000 samples/second is fast.

“Someone can comment and say, I can toggle a pin faster on a PIC/AVR/Arduino than I can on FEZ! This is correct, but what kind of project only toggles a pin? Make a decently sized project and then compare speed.”

With microcontrollers, isn’t toggling pins the basis for pretty much any IO that happens on the processor?

I know GHI’s stance is that it’s “fast”; I think you guys should tone that down and instead say the speed is comparable to chip X, Y, or Z - that gives buyers a reasonable expectation of what they should see.

I don’t care how you try to justify it; this chip is not capable of anywhere near what its frequency rating would suggest, and that’s the only thing I had to go on, since the FAQ does not directly answer anything about real-world comparisons of speed. It doesn’t matter whether you’re running managed code or not; what matters is the end result to the consumer of the product.

Just saw you posted a code sample - great, that’s a big help. After a quick glance at the MCP3208 data sheet and the NETMF SPI docs, it looks as though you don’t have the SPI configured correctly. The first paragraph of section 6.1 in the data sheet says:

[quote]With most micro-controller SPI ports, it is required to
send groups of eight bits. It is also required that the
micro-controller SPI port be configured to clock out data
on the falling edge of clock and latch data in on the
rising edge.[/quote]

You have your SPI configured to clock on the rising edge. I would guess if you get the SPI configuration tweaked in you’ll find you can also increase the clock rate quite a bit. I don’t have a MCP3208 here to test with or I would try it out too.

As to ‘how fast is it’: with any micro-controller, or even a PC, that is a loaded question. It all depends on what you are doing, what it takes on that platform to accomplish it, and how optimized your code is. For example, you can use a PWM output on the FEZ and it requires very little code or processor time; if you have to bit-bang that on another processor, you’ll use up most of its resources to do so. You should find that SPI works the same way. The only additional overhead between using HW SPI and bit-banging is that you must clock out full bytes on HW SPI.

(Funny note, when spell checking the above paragraph the spellchecker came back with a suggestion of ‘head-banging’ for ‘bitbanging’, how true it is.)
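To make the PWM point concrete, here’s a rough sketch of what I mean. Fair warning: this is from memory of the GHI NETMF API, so double-check the PWM class, the Set signature, and the pin name against the GHI docs before relying on any of it - treat it as a sketch, not a verified program.

```csharp
using System.Threading;
using GHIElectronics.NETMF.FEZ;
using GHIElectronics.NETMF.Hardware;

public class PwmSketch
{
    public static void Main()
    {
        // Hardware PWM: once configured, the peripheral generates the
        // waveform on its own; the CPU spends no cycles maintaining it.
        PWM pwm = new PWM((PWM.Pin)FEZ_Pin.PWM.Di5); // pin choice is just an example
        pwm.Set(38000, 50); // ~38KHz at 50% duty cycle

        // Managed code is now free to do other work while the pin toggles.
        Thread.Sleep(Timeout.Infinite);
    }
}
```

Bit-banging that same 38KHz square wave in a managed loop would eat the whole processor; that’s the asymmetry I’m getting at.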

Skizzo, you are not understanding the system, and this is why you are not understanding the FAQ.

There is NO fixed number where you can say that using NETMF is xxx% slower/faster than native. I can read about 250KBytes/sec from an SD card, which is a whole lot faster than an ATMEGA, but then the ATMEGA can toggle a pin a lot faster than FEZ.

Yes, 2000 samples/sec is fast when running managed code, and yes, going native will make this many folds faster. No one is hiding this info from anyone. It is even in the FAQ.

That is simply impossible! I just gave you two simple examples where one is faster and the other is the other way around.

Then managed code is not the answer for your needs. FEZ is not out to replace every single chip and every single offering. It is one good option, and for many applications it is just fantastic, according to hundreds of users :slight_smile:

Jeff: I appreciate the constructive response…

I’ve tried both configurations; it doesn’t seem to matter either way - I get the same results from the ADC, and neither setting affects the speed of the Panda. I’ve tried both configurations because the .NET Micro documentation is kind of vague on which setting means what.

In SPI, the master determines the transfer speed, not the slave, so edge should have no bearing on the speed of the transfer itself (as long as the slave can keep up, of course.)
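For reference, here are the two configurations I’ve been flipping between. If I’m reading the NETMF docs right, the sixth argument of SPI.Configuration is the clock-edge flag; the pin and the clock rate here are just values from my test setup (I tried rates from 200 up to 10,000 KHz), not anything special:

```csharp
using Microsoft.SPOT.Hardware;
using GHIElectronics.NETMF.FEZ;

public class SpiEdgeSettings
{
    static Cpu.Pin sselPin = (Cpu.Pin)FEZ_Pin.Digital.Di10;

    // Clock_Edge = true: sample on the rising clock edge (my original setting).
    static SPI.Configuration risingEdge = new SPI.Configuration(
        sselPin, false, 0, 0, false, true, 2000, SPI.SPI_module.SPI1);

    // Clock_Edge = false: the other variant, per Jeff's reading of
    // section 6.1 of the MCP3208 data sheet.
    static SPI.Configuration fallingEdge = new SPI.Configuration(
        sselPin, false, 0, 0, false, false, 2000, SPI.SPI_module.SPI1);
}
```

Swapping one configuration for the other changes the readings I’d expect if the edge were wrong, but it makes no measurable difference to the sample rate - which is my point about the master owning the clock.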

Whether the clock rate is set to 200 or 10,000, the resulting sample count is pretty much the same - within a few milliseconds. I’ve tried random rates from 200KHz to 10,000KHz (10MHz) and none of them affects the speed much, if at all - which leads me to believe the SPI bus isn’t the issue; it’s the bit-shifting and masking that’s taking most of the Panda’s remaining processing power. Also, the performance increase from commenting out the LED blinking is another indicator that this isn’t specifically an SPI issue.

The thing about this is that you can try it out as it is - you don’t need the ADC hardware for this example. Ground MISO on the Panda and your resulting sample will always be 0; pull MISO high and your sample will always be 4095. You’re going to get 10,000 samples in 4300-ish milliseconds.

I’m not convinced the problem I’m having is with the SPI bus specifically; I think it’s with the processing required to convert the data from SPI sample into a usable value. Try it for yourself and see how long it takes your Panda to do this.
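If anyone wants to sanity-check just the conversion step without a Panda, the shift-and-mask is plain C# and runs fine on the desktop. Here it is pulled out as a minimal standalone sketch (Combine is just a name I made up for this snippet; the 0xFF/0x00 inputs correspond to the MISO-high and MISO-grounded cases above):

```csharp
using System;

public class Mcp3208Math
{
    // Combine the two SPI response bytes into one 12-bit reading -
    // the same shift/mask the Panda loop performs for every sample.
    public static int Combine(byte high, byte low)
    {
        int value = (high << 6) + (low >> 2);
        return value & 0xfff; // we only want the rightmost 12 bits
    }

    public static void Main()
    {
        Console.WriteLine(Combine(0xFF, 0xFF)); // MISO held high -> 4095
        Console.WriteLine(Combine(0x00, 0x00)); // MISO grounded  -> 0
    }
}
```

It’s this per-sample arithmetic, repeated thousands of times a second, that I suspect is where the time goes on the Panda - not the SPI transfer itself.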

Gus: I understand the FAQ, and I still think it’s extremely vague about the capabilities of the Panda. I also understand that GHI is in a position to proclaim its benefits while downplaying its faults - it only benefits you to do so. You are the vendor, and the whole point of doing this is to make money and support yourselves. And that’s totally fair - you guys have spent the time and energy to bring this product to market - I’m not faulting you for any of that.

What I’m saying is that you guys can and should be more upfront about how well it actually works given differing scenarios. Hiding behind “but, it’s not fair to compare it to X because it’s managed code!” is just not an acceptable answer. You can still give a general idea of differing aspects of the performance I should expect to see from a panda in very common situations.

SD/MMC speed? Great - 250KBytes/sec.
SPI speed? Up to 18MHz.
Math? Well, it’s a little slow with bit shifting and division. This happens to be what I need, so that sucks for me.

What else should we add to the list?

I didn’t ask for an exact number or percentage on how much faster/slower the Panda is - I asked for a general idea. Fast is totally relative, as you yourself have admitted in other threads. 2000 samples/second may be fast for managed code, but how is one to know, with the complete lack of other low-cost hobbyist development boards running .NET Micro lying around?

And it’s great that hundreds of people find that it’s fantastic - I was hoping to do the same, to the point where I designed a PCB around it before being able to use it. I don’t think it’s a bad board by any means but it’s certainly less capable than I thought it would be.

If you have any suggestions on what should go on FAQ, please pass them to me and I will get them to the guys here for review.

I didn’t think of that, the readings don’t matter.

First, a few minor points. I did some simple profiling and found that if I removed the LED blinking and the data post-processing, I could get read rates of just over 4K/s. One interesting point is that a for loop was significantly faster than the while loop.

While I’m not an SPI expert, my limited understanding is that on each clock cycle it writes out a bit and reads in a bit. Looking at the way it is implemented in NETMF, it looks to me as though, if your write array is longer, you’ll be able to read more in on each pass through the loop.

To try this idea out, I increased the size of the write and read arrays, wrote 100 bytes out on each pass through the loop, and read in as many bytes each time (this is still 10K reads). The read rate went up to almost 39K! The code below is only accurate for the number of reads; I did not try to create usable code.


[code]
using System;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using GHIElectronics.NETMF.FEZ;
using GHIElectronics.NETMF.Hardware;
using GHIElectronics.NETMF.IO;

namespace MCP3208_Reading
{
    public class Program
    {
        static bool ledState = false;
        static OutputPort led = new OutputPort((Cpu.Pin)FEZ_Pin.Digital.LED, ledState);
        static Cpu.Pin SSELPin = (Cpu.Pin)FEZ_Pin.Digital.Di10;
        static SPI spi = new SPI(new SPI.Configuration(SSELPin, false, 0, 0, false, true, 35000, SPI.SPI_module.SPI1));
        static byte[] writeData = new byte[100];
        static byte[] readData = new byte[1000];
        static short ADCVal = 0;

        public static void Main()
        {
            // Init the ADC by doing one measurement and discarding it (this sample is invalid).
            spi.Write(new byte[1]);

            long startTick = DateTime.Now.Ticks;
            writeData[0] = 0x1F; // What are we writing? 0x18 selects A0, 0x1F selects A7

            for (int i = 0; i < 100; i++)
            {
                spi.WriteRead(writeData, readData, i); // Skip the first byte
            }

            // Do post processing here
            ADCVal = (short)((readData[0] << 6) + (readData[1] >> 2));
            ADCVal &= 0xfff; // We only want the rightmost 12 bits

            long elapsedTime = (DateTime.Now.Ticks - startTick) / TimeSpan.TicksPerMillisecond;

            Debug.Print("Time Taken: " + elapsedTime.ToString() + "ms");
            Debug.Print("Sample rate: " + (10000.0 / (elapsedTime / 1000.0)).ToString());
        }
    }
}
[/code]



Jeff:

Uh, but the post-processing portion has to happen for each write/read; otherwise the data are useless sitting in the read byte array as they are. I hadn’t thought about speed issues with the while loop, but that’s certainly a great thing to point out.

Additionally, the write/read fills the byte array from byte[0] through byte[n] for every WriteRead() you do. So, if you declare a 2-byte read array, WriteRead() will fill those two bytes with the 16 bits read from the SPI slave.

This isn’t quite correct - the Panda writes an array of bytes and reads an array of bytes. So, if you define a 2-byte array for the write, it will write 2 bytes; if you define a 10-byte array, it will write 10 bytes. The same thing occurs on the read end. So, when you define your 100-byte array on the read side in your example, it’s reading 100 bytes from the ADC. Unfortunately, this will slow the transfer way down in real life :slight_smile:

You have to make sure that the ADCVal computation is inside the for/while loop, because that’s the end result for each sample taken from the ADC. So your code has to be like:


[code]
for (int i = 0; i < 100; i++)
{
    spi.WriteRead(writeData, readData, i); // Skip the first byte

    // Do post processing here
    ADCVal = (short)((readData[0] << 6) + (readData[1] >> 2));
    ADCVal &= 0xfff; // We only want the rightmost 12 bits
}
[/code]
 

Along with that, you’d have to resize your read byte array back down to two bytes; otherwise you’d be reading 100 samples from the ADC and only using the first one for your ADCVal - the computed value from the ADC after it’s been bit-shifted and masked. I need more than 2000 of those per second.

Why would reading/writing 100 bytes at a time slow things down? It seemed to speed things up to me.

If you understand why the code you posted would be broken, you’ll understand why reading 100 bytes when you only use two of them would be slower than reading in only 2 bytes. :slight_smile:

And actually, with your original code, you read 1000 bytes 100 times and then use just two bytes from the last sample of all that reading to find ADCVal. So, however long that took to run, it only produced one usable value.

As I stated, the code I posted was not intended to do anything other than test the read rate. Reading in two bytes, performing some operation on them, and repeating 50 times yields the same result as reading 100 bytes and performing an operation on byte pairs 50 times. The difference is that the read rate is much faster in the second case.

Here is the general idea:

[code]
for index = 1 to 100
{
    Read N bytes
    Process N bytes
}
[/code]

What you’re saying your code does and what your code actually does are two different things - it does no processing of the data in the byte pairs. It processes two whole bytes, out of all of the bytes you sample.

Please read my entire response before having a knee-jerk reaction. I don’t know how to make myself any clearer about what the example I posted did and did not do. Since the raw read speed went up 10x in my sample code, then as long as processing 100 bytes takes less time than that saved, it will yield a speed increase.

EDIT: Since obviously I was being clear as mud, I took 5 minutes and copied your data-processing lines into a second data-processing loop (code below). The result is 17,793 samples/second, or about 7.5 times faster than what you started with. Keep playing with the number of bytes written/read at one time; there will be some optimum range that yields the best performance.

Think of the situation like this. You have to carry marbles from one room to another and then sort them into bins by color. The way your original code worked you were only carrying two marbles at a time and spent all of your time running from room to room. The way my code sample works is to carry 100 marbles at a time and then sort them all at once. There will be some number of marbles that you can carry at once which will provide the most number of marbles sorted per minute.

[code]
using System;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using GHIElectronics.NETMF.FEZ;
using GHIElectronics.NETMF.Hardware;
using GHIElectronics.NETMF.IO;

namespace MCP3208_Reading
{
    public class Program
    {
        static bool ledState = false;
        static OutputPort led = new OutputPort((Cpu.Pin)FEZ_Pin.Digital.LED, ledState);
        static Cpu.Pin SSELPin = (Cpu.Pin)FEZ_Pin.Digital.Di10;
        static SPI spi = new SPI(new SPI.Configuration(SSELPin, false, 0, 0, false, true, 35000, SPI.SPI_module.SPI1));
        static byte[] writeData = new byte[100];
        static byte[] readData = new byte[100];
        static short ADCVal = 0;

        public static void Main()
        {
            // Init the ADC by doing one measurement and discarding it (this sample is invalid).
            spi.Write(new byte[1]);

            long startTick = DateTime.Now.Ticks;
            writeData[0] = 0x1F; // What are we writing? 0x18 selects A0, 0x1F selects A7

            for (int i = 0; i < 100; i++)
            {
                spi.WriteRead(writeData, readData, i); // Skip the first byte

                for (int j = 0; j < 100; j += 2)
                {
                    // Do post processing here
                    ADCVal = (short)((readData[j] << 6) + (readData[j + 1] >> 2));
                    ADCVal &= 0xfff; // We only want the rightmost 12 bits
                }
            }

            long elapsedTime = (DateTime.Now.Ticks - startTick) / TimeSpan.TicksPerMillisecond;

            Debug.Print("Time Taken: " + elapsedTime.ToString() + "ms");
            Debug.Print("Sample rate: " + (10000.0 / (elapsedTime / 1000.0)).ToString());
        }
    }
}
[/code]

Your samples are irrelevant, because even if they’re faster, they will never produce accurate results: you have to interface with the other device in the way that device expects to be interfaced. They’re incompatible with the device’s protocol, so it doesn’t matter how fast they are.

I’ve read the entirety of all your posts. Maybe you should read the entirety of mine?

Here are some marbles for you:

The room you’re getting your marbles from has a dispenser that only produces two marbles at a time until you physically leave the room and come back to it. This is how SPI works with this device. You can’t take 100 marbles because there aren’t 100 marbles to take - there are only two of them. You run into the room and you take what you think are 100 marbles, just to find out later that you only have two marbles. The other 98 of them are not marbles, and are not sortable into your bins. In order to get only two more marbles, you’ll have to go back into the room.

That’s what your code does.

You said yourself you’re no SPI expert. And you’re kind of ruining code samples for other people who will wonder, down the road, how to use SPI to interface with an MCP3208. Whether it’s one, two, or five years from now, I hope nobody runs across your samples and assumes that they work.