Managed code and speed on the EMX

Townsend · December 16, 2010, 10:49am

I have been doing some timings on a firmware model that would provide multiple protocol support in a vehicle application. The reference is an existing piece of hardware.

The model uses multiple threads. Messages are moved about via queues, and threads spend most of their lives blocked on events. One end is USB and the other is CAN. The simplest exchange with the PC is retrieving the version string. The reference hardware processes 2000 of these exchanges in 4.06 seconds; the EMX model takes 21.26 seconds to do the same. I have timed various sections looking for particular bottlenecks and find the time spread fairly evenly over the entire process. However, two areas jump out: moving information into and out of structures (using the BitConverter class I found on the forum) and the USB interaction. I hate to forgo the structures, but they account for about 10-20% of the time, depending on the message traffic. Any suggestions would be welcome.
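Dividing the quoted totals by the 2000 exchanges puts the gap in per-exchange terms (plain arithmetic on the figures above, no new measurements):

```latex
t_{\text{ref}} = \frac{4.06\ \text{s}}{2000} \approx 2.03\ \text{ms}, \qquad
t_{\text{EMX}} = \frac{21.26\ \text{s}}{2000} \approx 10.63\ \text{ms}, \qquad
\frac{t_{\text{EMX}}}{t_{\text{ref}}} \approx 5.2

\text{structure handling: } 0.10 \times 21.26\ \text{s} \approx 2.1\ \text{s} \ \text{ to } \ 0.20 \times 21.26\ \text{s} \approx 4.3\ \text{s}
```

Even removing the whole 4.3 s upper estimate for structure handling would leave the EMX run at roughly 17 s, so the structures alone cannot account for the gap.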

As for the USB, this is harder to pin down. Measuring from each TX to the PC until the following RX, I see 7 seconds of the 21-second total. Since everything is identical at the other end of the USB wire, and the reference hardware completes the entire process in 4 seconds, it would seem that the USB interface on the EMX is introducing some delays. I do have a heartbeat thread running off a 500 ms sleep. The CAN task is not even started for this test. Has anyone else experienced throughput issues with USB?
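One way to reproduce a TX-to-RX measurement like this is to accumulate elapsed ticks across all 2000 exchanges and then divide; 7 s over 2000 exchanges is about 3.5 ms per round trip, roughly a third of the 21.26 s run. A minimal sketch, assuming a hypothetical SectionTimer class built on DateTime.Now.Ticks. Clock resolution differs between platforms, so a single sample is coarse and the total over many iterations is the more reliable number:

```csharp
using System;

// Hypothetical helper (not from the original post): accumulates elapsed
// time for one section of code across many Start/Stop pairs.
public class SectionTimer
{
    private long m_start;   // tick count captured by Start()
    private long m_total;   // accumulated ticks over all completed pairs
    private int m_count;    // number of completed Start/Stop pairs

    public void Start()
    {
        m_start = DateTime.Now.Ticks;
    }

    public void Stop()
    {
        m_total += DateTime.Now.Ticks - m_start;
        m_count++;
    }

    // One DateTime tick is 100 ns, i.e. 10,000 ticks per millisecond.
    public double TotalMilliseconds
    {
        get { return m_total / 10000.0; }
    }

    public double AverageMilliseconds
    {
        get { return m_count == 0 ? 0 : TotalMilliseconds / m_count; }
    }

    public int Count
    {
        get { return m_count; }
    }
}
```

Call Start() just before the USB write and Stop() when the matching reply has been read; TotalMilliseconds then gives the figure for that section of the run.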

Gus_Issa · December 16, 2010, 11:05am

Using a managed system comes at a cost. Debugging is easier and development is done in no time, but then there is the overhead of the managed runtime.

I haven't used that class, so I do not know much about it, but I know that whenever you do bit-handling tasks that execute a lot of instructions in inner loops, you have to optimize as much as possible, to the point of writing the code inline rather than using fancy classes.
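To illustrate the "inline coding" suggestion: instead of calling a general-purpose converter for every field (on desktop .NET, BitConverter.GetBytes returns a new array per call; how the forum class behaves is not known here), each field can be written straight into the packet buffer with shifts. A minimal sketch; the MsgHeader struct, its fields, the 7-byte layout and the little-endian byte order are all hypothetical:

```csharp
using System;

// Hypothetical message header; field names and layout are illustrative only.
public struct MsgHeader
{
    public ushort Id;
    public uint Timestamp;
    public byte Length;

    public const int Size = 7; // 2 + 4 + 1 bytes, little-endian, no padding

    // Writes each field directly into the caller's buffer with shifts,
    // so no temporary arrays are allocated per field.
    public void WriteTo(byte[] buf, int off)
    {
        buf[off] = (byte)Id;
        buf[off + 1] = (byte)(Id >> 8);
        buf[off + 2] = (byte)Timestamp;
        buf[off + 3] = (byte)(Timestamp >> 8);
        buf[off + 4] = (byte)(Timestamp >> 16);
        buf[off + 5] = (byte)(Timestamp >> 24);
        buf[off + 6] = Length;
    }

    // Reassembles the fields from the same little-endian layout.
    public static MsgHeader ReadFrom(byte[] buf, int off)
    {
        MsgHeader h;
        h.Id = (ushort)(buf[off] | (buf[off + 1] << 8));
        h.Timestamp = (uint)(buf[off + 2] | (buf[off + 3] << 8)
                           | (buf[off + 4] << 16) | (buf[off + 5] << 24));
        h.Length = buf[off + 6];
        return h;
    }
}
```

The trade-off is the one raised later in the thread: the layout is now spelled out by hand for each message type, which is less reusable than a generic converter but avoids per-field calls and allocations.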

Another option is to move processing-intensive work down to RLP (Runtime Loadable Procedures, i.e. native code) methods.

There is also ChipworkX, which is about 6 times faster than what you have now.
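As a rough bound only: if the entire 21.26 s run were CPU-bound and scaled linearly with a processor about 6 times faster (an assumption; the USB time in particular may not scale this way), the same test would take about

```latex
\frac{21.26\ \text{s}}{6} \approx 3.5\ \text{s}
```

which is in the same range as the 4.06 s of the reference hardware.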

User_0 · December 16, 2010, 3:21pm

I'm not actually doing any loops, and we are not talking about a lot of instructions here; there is simply a lot of traffic. One concern is losing the advantages of C# in order to overcome its disadvantages. In the meantime, why would the USB interface be so different? I have traces that show the turnaround on the PC is within the same millisecond. The existing hardware uses a Cypress West Bridge controller. Does it make sense that the EMX board would be much slower at managing USB?

User_12 · December 16, 2010, 6:44pm

How big are the data chunks your application is sending over USB?

Jeff_Birt · December 16, 2010, 8:57pm

Can you post some sample code (a small piece of code) that shows the problem that you are having? It would help to see how you are doing things.

Townsend · December 17, 2010, 2:56pm

Jeff,

Here is an example. This code runs twice per message, as each message is moved from the USB task to the Command task and then back. It is part of the message queuing scheme: each thread waits on a queue event. The queue declares these items:


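	ManualResetEvent m_eEvent = new ManualResetEvent(false);  // signalled when a message is posted
	AutoResetEvent m_eLocker = new AutoResetEvent(true);      // used as a lock around MsgList
	Queue MsgList = new Queue();                              // pending MXMsg objects
	int m_iMaxQ = 0;                                          // high-water mark of the queue depth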

The post routine calls:


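	public void AddMsg(MXMsg Msg)
	{
		m_eLocker.WaitOne();          // acquire the "lock"

		MsgList.Enqueue(Msg);

		if (MsgList.Count > m_iMaxQ)  // track the deepest the queue has been
			m_iMaxQ = MsgList.Count;

		m_eEvent.Set();               // wake the waiting thread
		m_eLocker.Set();              // release the "lock"
	}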

and the waiting thread calls:

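	public void WaitQEvent()
	{
		// Block only if nothing is queued (Count is read without taking m_eLocker).
		if (MsgList.Count == 0)
		{
			m_eEvent.WaitOne(); // wait forever here
			m_eEvent.Reset();   // manual-reset event: clear it for the next wait
		}
	}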

To send a message from the USB task to the Command task, I call the following method of the Command task. Returning the message uses identical code:

	public void PostMsg(MXMsg Msg)
	{
		m_qCmd.AddMsg(Msg);
	}
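For comparison, the same post/wait pair can be written with the C# lock statement for mutual exclusion and a single AutoResetEvent for signalling, so the consumer's wait and dequeue happen in one call. This is a swapped-in technique, not the poster's code; the MsgQueue class, Post and Take are hypothetical names, it assumes exactly one consumer thread per queue (as in the design above), and whether it is measurably faster on the EMX is untested:

```csharp
using System.Collections;
using System.Threading;

// Alternative queue sketch: lock (Monitor) guards the list, and an
// AutoResetEvent is used only to wake the single consumer thread.
public class MsgQueue
{
    private readonly Queue m_list = new Queue();   // non-generic Queue, as in the original
    private readonly AutoResetEvent m_signal = new AutoResetEvent(false);
    private int m_maxDepth;                         // high-water mark of the queue depth

    public int MaxDepth
    {
        get { lock (m_list) { return m_maxDepth; } }
    }

    // Producer side: enqueue under the lock, then wake the consumer.
    public void Post(object msg)
    {
        lock (m_list)
        {
            m_list.Enqueue(msg);
            if (m_list.Count > m_maxDepth)
                m_maxDepth = m_list.Count;
        }
        m_signal.Set();
    }

    // Consumer side: returns the next message, blocking while the queue is empty.
    // A Set() that happens before the consumer reaches WaitOne() stays latched,
    // so the wake-up is not lost; a leftover signal just causes one extra loop.
    public object Take()
    {
        while (true)
        {
            lock (m_list)
            {
                if (m_list.Count > 0)
                    return m_list.Dequeue();
            }
            m_signal.WaitOne();
        }
    }
}
```

The Command task's loop would then reduce to something like `MXMsg msg = (MXMsg)m_qCmd.Take();` followed by processing the message.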

As you can see, there is not a lot of code here. In my test of 2000 messages (4000 total posts), the routing consumed 7.280 seconds of the 23-second total. I could make all the declarations public, eliminate the post methods and manipulate the queues directly, but that rather defeats the purpose of using classes. I think the answer to all this is a much faster processor.
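Per post, the routing figure above works out to:

```latex
\frac{7.280\ \text{s}}{4000\ \text{posts}} \approx 1.82\ \text{ms per post}, \qquad
\frac{7.280\ \text{s}}{23\ \text{s}} \approx 32\%
```

so each message, which is posted twice, spends roughly 3.6 ms in the queueing path alone.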

If you see something else let me know. I appreciate your willingness to help.

Townsend · December 17, 2010, 3:03pm

Joe, sorry. I was not alerted to your question; I think the forum lost my identity on that previous post.

We are sending 128-byte packets: one packet goes down and one comes back per exchange. In my basic test this was a total of 4000 packets over USB.
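Combining these packet figures with the 7 s of USB time measured in the 2000-exchange test from the first post (approximate, since that 7 s also includes the PC's turnaround):

```latex
4000 \times 128\ \text{B} = 512\,000\ \text{B}, \qquad
\frac{512\,000\ \text{B}}{7\ \text{s}} \approx 73\ \text{kB/s}
```

That is far below even USB full speed's 12 Mbit/s signalling rate, which suggests the cost is per-transfer latency rather than raw bandwidth.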

Thanks