EMX and Ethernet

I have been doing some testing with USB and TCP on the EMX board and comparing both to an existing USB link. While I have found USB on the EMX a bit slower than that link, I have found TCP/IP to be very slow - and this is with the Nagle algorithm disabled. Has anybody else benchmarked TCP? How much of the TCP stack is implemented in managed code?

Thanks

Can you define “slow”? “slow” means different things to different applications.

Downloading a file through HTTP on EMX, you can get 150KBytes/sec. That is all from C#.

Take a look at your code. You might be sending only a few bytes (less than 100) in every TCP transaction. A TCP packet can carry roughly 1400 bytes of payload (I am not sure of the exact figure). To be on the safe side, try to send your data in chunks of 1024 bytes or more to get the best performance.

The reason is that TCP communication sends a lot of overhead data with every packet (source and destination IP addresses, MAC addresses, CRC, etc.), so imagine sending one byte in every packet… that will be very slow :smiley:

I am sending 130-byte packets. USB averages 1.17 ms for a write to the PC; TCP averages 9.7 ms. Those averages are taken over 10,000 or so events, and both use the same timing code, so the measurement overhead is the same. I was surprised that sockets were this slow. I happen to be using a stream at the moment because of something I read in the socket documentation about receive. I am going to perform the sends on the socket alone and see what happens. But I can tell you this: a CAN module doing a memory dump gets about 1,000 messages stacked up at the TX thread waiting to be sent to the PC. On USB it maxes out at 4.
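A minimal sketch of the kind of per-write timing being described (the names are placeholders, not the actual test code; it assumes an already-connected NetworkStream and uses DateTime.Now.Ticks, which is available on NETMF):

```csharp
using System;
using System.Net.Sockets;

// Sketch: time only the Write call for a small packet and average the
// result over many events, as described above.
public static class WriteTimer
{
    public static double AverageWriteMs(NetworkStream stream, byte[] packet, int count)
    {
        long totalTicks = 0;
        for (int i = 0; i < count; i++)
        {
            long start = DateTime.Now.Ticks;
            stream.Write(packet, 0, packet.Length);    // the only operation being timed
            totalTicks += DateTime.Now.Ticks - start;
        }
        return (double)totalTicks / count / TimeSpan.TicksPerMillisecond;
    }
}
```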

I don’t have the luxury of determining my packet sizes. I realize there are inefficiencies, but TCP must be used to get us to wireless. I disable aggregation (by setting the NoDelay option) so that each send should go out immediately. I suspect that the USB routines are simply much closer to the hardware than the TCP stack, and that there is some wandering through managed code before a send reaches something real.
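For reference, disabling Nagle looks roughly like this (a sketch; the host and port are placeholders, and on NETMF the option is set through SetSocketOption rather than a NoDelay property):

```csharp
using System.Net;
using System.Net.Sockets;

// Sketch: open a TCP connection with the Nagle algorithm disabled
// (the NoDelay option), so each Send goes out immediately rather than
// being coalesced into larger segments.
public static class NoDelaySocket
{
    public static Socket Connect(string host, int port)
    {
        Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
        socket.Connect(new IPEndPoint(IPAddress.Parse(host), port));
        return socket;
    }
}
```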

You don’t need to determine the packet size.
You only need to stack ten 130-byte packets in a 1300-byte buffer, for example, and send that through the TCP socket. Or use a 1024-byte FIFO that flushes its data through the TCP socket as soon as it gets filled.
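A minimal sketch of that approach (the buffer size, names, and flush policy are just examples, not taken from the application):

```csharp
using System;
using System.Net.Sockets;

// Sketch: accumulate small messages in a buffer and push the whole buffer
// through the socket once it is full, instead of one Send per message.
public class TcpBatcher
{
    private readonly Socket _socket;
    private readonly byte[] _buffer = new byte[1300];   // room for ten 130-byte messages
    private int _used;

    public TcpBatcher(Socket socket) { _socket = socket; }

    public void Queue(byte[] message)
    {
        if (_used + message.Length > _buffer.Length)
            Flush();                                     // make room before copying
        Array.Copy(message, 0, _buffer, _used, message.Length);
        _used += message.Length;
        if (_used == _buffer.Length)
            Flush();                                     // buffer exactly full
    }

    public void Flush()
    {
        if (_used == 0) return;
        _socket.Send(_buffer, 0, _used, SocketFlags.None);
        _used = 0;
    }
}
```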

In theory, it should take about 1-2ms to send 130 bytes.

Granted, I could add meta-packets, although I cannot wait for one to fill; as soon as you introduce waiting, the system gets much too slow. However, in the test I just did, my TX queue reached 3,500 messages while attempting to send 8,500. I could drain the TX queue into a meta-packet until it was full and ship that off.

However, that does not really answer the question of why this is so slow. The TCP overhead is pretty incidental on a 10 Mbit connection - 130 bytes of payload plus roughly 58 bytes of Ethernet/IP/TCP framing is well under 0.2 ms on the wire. Building the headers in managed code could be trouble, though - which of course would also apply to building the meta-packet in managed code. If the theoretical time is 1-2 ms per send, and all I am timing is the write call, why do I see 9-10?

You don’t need to wait. Your application can have multiple threads that make use of the idle time to perform other tasks.
Did you try sending dummy data on the TCP socket? The slow speed you are seeing might be related to other issues in the application code.

[quote]However, that does not really answer the question of why this is so slow. The TCP overhead is pretty incidental on a 10 Mbit connection - 130 bytes of payload plus roughly 58 bytes of Ethernet/IP/TCP framing is well under 0.2 ms on the wire. Building the headers in managed code could be trouble, though - which of course would also apply to building the meta-packet in managed code. If the theoretical time is 1-2 ms per send, and all I am timing is the write call, why do I see 9-10?
[/quote]

You don’t need to go that deep into assumptions, but to answer your question: most of the TCP/IP stack is done in native code. It is not only the per-packet overhead; there are also the negotiation and acknowledgement packets that go back and forth with TCP communication. I suggest you take a look at the code and try to optimize it.
To build up the meta-packet, NETMF offers several methods for handling arrays - take a look at the Array class. If the application is copying byte by byte from one buffer to another, then of course it will be super slow.
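For example, the difference between the two copies might look like this (a sketch; `Fast` uses the Array class as suggested, `Slow` is the byte-by-byte pattern to avoid):

```csharp
using System;

// Sketch contrasting the two ways of filling a meta-packet buffer.
// Array.Copy does the copy in one call (handled in native code on NETMF),
// while the manual loop pays managed-code overhead on every byte.
public static class CopyStyles
{
    public static void Fast(byte[] message, byte[] metaPacket, int offset)
    {
        Array.Copy(message, 0, metaPacket, offset, message.Length);
    }

    public static void Slow(byte[] message, byte[] metaPacket, int offset)
    {
        for (int i = 0; i < message.Length; i++)
            metaPacket[offset + i] = message[i];   // one interpreted step per byte - avoid
    }
}
```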

Everything is done with threads - there is quite a collection of them, actually. I am timing only the Write method of the NetworkStream class. My only point about waiting is that I cannot always fill a packet. In many cases there is a one-to-one relationship between incoming and outgoing packets; in those cases the meta-packet scheme breaks down completely, and we simply end up adding to the total time. For example, a programming event may involve 8,000 to 10,000 messages, each requiring an answer. Adding 8 ms to each answer increases the total time by over a minute, which becomes an issue. I know something about TCP, so I’ll sniff the wire and see where the time seems to go.

Thanks.

I think the poster means they can’t aggregate data, since that in itself adds latency before the data goes on the wire - they need the data at the other end quickly and can’t wait to fill a buffer of messages before starting the send.

(edit: I see that a response was posted between me reading the thread and finishing my reply)

I am thinking you may need to sniff the wire also. I suspect something else is going on.

I get what you mean. Those were just my suggestions, and of course you understand your application’s needs better than I do. The throughput figures I quoted were measured with large TCP packets.

I am still working on this, but it gets curiouser and curiouser. Wireshark shows packets being produced every 8.2-8.6 ms, regular as clockwork, and acks are returned in microseconds. Clearly the slowdown is not the PC’s doing. However, when I accumulate time on the device, I find my loop producing a message every 13 ms. The accumulator is pretty basic: grab DateTime.Now.Ticks, grab it again later, and take the difference. Not sure where it goes wrong, but something odd is going on, because Wireshark is quite clear that messages are posting every 8 ms. I’ve included a sample of Wireshark’s output culled from a typical run. Still not sure how the 8 ms breaks down, but it is longer than USB, and the C# is the same for both.


No.     Time        Source                Destination           Protocol Info
      1 0.000000    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=0 Win=65520 Len=0
      2 0.008659    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=0 Ack=0 Win=10900 Len=130
      3 0.016862    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=130 Ack=0 Win=10900 Len=130
      4 0.016915    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=260 Win=65260 Len=0
      5 0.024991    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=260 Ack=0 Win=10900 Len=130
      6 0.033200    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=390 Ack=0 Win=10900 Len=130
      7 0.033262    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=520 Win=65000 Len=0
      8 0.041939    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=520 Ack=0 Win=10900 Len=130
      9 0.049933    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=650 Ack=0 Win=10900 Len=130
     10 0.049998    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=780 Win=64740 Len=0
     11 0.058283    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=780 Ack=0 Win=10900 Len=130
     12 0.066500    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=910 Ack=0 Win=10900 Len=130
     13 0.066563    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=1040 Win=64480 Len=0
     14 0.075110    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1040 Ack=0 Win=10900 Len=130
     15 0.082957    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1170 Ack=0 Win=10900 Len=130
     16 0.083019    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=1300 Win=65520 Len=0
     17 0.091590    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1300 Ack=0 Win=10900 Len=130
     18 0.099705    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1430 Ack=0 Win=10900 Len=130
     19 0.099777    192.168.15.10         192.168.15.20         TCP      52707 > 11000 [ACK] Seq=0 Ack=1560 Win=65260 Len=0
     20 0.108014    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1560 Ack=0 Win=10900 Len=130
     21 0.116569    192.168.15.20         192.168.15.10         TCP      11000 > 52707 [PSH, ACK] Seq=1690 Ack=0 Win=10900 Len=130

There is an explanation for the timing discrepancy: the times vary widely, and I happened to catch a section of the trace with nice, regular 8 ms intervals. I began segregating the send times around a threshold. With the threshold at 12 ms, I found 893 events below it totalling 6,485 ms (7.26 ms avg) and 2,870 events above it totalling 46,516 ms (16.21 ms avg). My next test will isolate the transport and remove the CAN operation; we should get some much harder numbers.

Finally got back to this. I am timing the transmit of a 128-byte packet to the PC for USB, wired TCP, and WiFi. Each average is over 1,000 events. The numbers, in milliseconds, are:

USB        lo 1.00   hi 1.42   avg 1.198
Wired TCP  lo 2.48   hi 3.83   avg 2.933
WiFi TCP   lo 2.48   hi 3.14   avg 2.943

TCP’s best is only slightly better than twice USB’s worst. If I perform this test with other threads running (the CAN threads in particular), the difference is even starker. Even given a 72 MHz processor, this seems slower than I would have expected.

I wonder what the timing would be when sending the packets as UDP instead of TCP.
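If anyone wants to try that comparison, a minimal sketch of the UDP side (the host and port are placeholders; UDP is connectionless, so there is no handshake or ack traffic):

```csharp
using System.Net;
using System.Net.Sockets;

// Sketch: send the same 128-byte payload as a single UDP datagram so the
// per-send timing can be compared against the TCP numbers above.
public static class UdpSend
{
    public static void SendOnce(byte[] packet, string host, int port)
    {
        Socket udp = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        try
        {
            udp.SendTo(packet, new IPEndPoint(IPAddress.Parse(host), port));
        }
        finally
        {
            udp.Close();
        }
    }
}
```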

I expect that you can go more than 5 times faster if the packet you send over the network is 128×10 bytes.
Was the test done with simple applications that handle each interface separately? In other words, what does your testing application do? Can I have the code?

I’ve just double-checked; the numbers you are getting make sense.
According to our tests, USB speed is up to 400 KByte/sec.
TCP socket send speed is about 150 KByte/sec.

So yes, USB is faster.

USB? :slight_smile:

In a strange way, that would seem to make some sense.
The message goes through at least three layers - TCP, then IP, then the Ethernet driver, then the hardware - with many buffers along the way. A write to USB goes almost straight to the driver. A TCP/IP stack on a chip would seem to be what is needed.