G120E Ethernet - long delays, buffer underrun error

Darren_SFI · November 30, 2017, 6:48pm

@John_Brochue - Hmm. I don’t know why you don’t see quite the same magnitude of delays. Two different people here have reproduced it (with FEZ Spider IIs and G120Es, with both our real application and the simple test program above), and of course so has our customer who first complained. But even a few seconds is bad, which you have seen, right?

One of the things stopping me from figuring it out is that I don’t have access to what is going on at the lower levels. Is there any debugging that you can do on your end?

Or is there anything you can share with me that would help me to go further with my own debugging? I have the Porting Kit. Is there anything there that would help? Or would you share anything specific to the G120E?

Thanks!

John_Brochue · December 1, 2017, 1:44pm

A lot of this is very environment dependent. If you can narrow down in your environment what is causing this then we can try to reproduce that here. Ideally get it down to a single router just connected to the board and a PC that doesn’t have a lot of other network stuff running on it. Then see if the slow down can be isolated to specific cables, lengths, routers, or PCs.

Darren_SFI · December 7, 2017, 6:47pm

@John_Brochue - I have done that. My usual setup for testing the delay (from which I plotted the data above) is my computer and a G120E/Ethernet J11 connected to a Linksys router with nothing else attached to it. I have also connected the G120E directly to my PC’s Ethernet port, i.e. with no switch, and to a building network with a Cisco 40-port managed switch.

Another software developer here, our technicians in our laboratory, and our customer at their facility have done their own tests in similarly diverse conditions.

I have narrowed it down (as far as I am able with the information available to me in my managed application + Wireshark) to the buffer underrun error, as indicated in the Tx_StatusInfo register. When this occurs, the frame has to be retransmitted, causing the delay in the PC receiving a response to its request. It may take multiple attempts, with each one waiting longer than the previous attempt (2x ?), which is why the delay time varies and can grow rather large.
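
To illustrate why doubling waits would make the delay grow so quickly, here is a minimal sketch. It assumes each failed attempt waits exactly twice as long as the previous one, which is only my guess above and is not confirmed. The 250 ms starting timeout and the function name `cumulative_delay_ms` are hypothetical, chosen for illustration only.

```python
def cumulative_delay_ms(initial_timeout_ms, failed_attempts, factor=2.0):
    """Total time spent waiting if each failed attempt waits `factor`
    times longer than the one before it (assumed behaviour, not measured)."""
    total = 0.0
    timeout = initial_timeout_ms
    for _ in range(failed_attempts):
        total += timeout      # wait out the current timeout
        timeout *= factor     # the next attempt waits longer
    return total


if __name__ == "__main__":
    # 250 ms is a hypothetical starting timeout, not a value from the driver.
    for attempts in range(1, 6):
        print(attempts, cumulative_delay_ms(250, attempts))
```

Under these assumptions a handful of consecutive failures is already enough to reach a delay of several seconds, which would fit the spread of response times I am seeing.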

John_Brochue · December 11, 2017, 3:35pm

Can you send the Wireshark capture to support at ghi… and reference my post here so it gets routed to me?

To try and narrow down the issue a bit more, can you try this with the ENC28 Ethernet? This should tell us whether the issue is likely in the TCP stack or the Ethernet driver.

To confirm, you’re not actually losing data, it just occasionally takes a long time to arrive?

Darren_SFI · December 11, 2017, 4:30pm

@John_Brochue - Thanks, I will do as you suggested. You are correct - we do not lose data, but it occasionally takes a long time to arrive.

I do not have an ENC28 module. Is there any way I can still buy one (or a few of them)?

I have a couple of WiFi RS21 modules, which are also SPI. I will try one of those as well and let you know.

Thanks again for your help.

John_Brochue · December 11, 2017, 4:56pm

The RS21 should be a good test too, since all three interfaces use the same TCP stack.

valon_hoti_gmail_com · December 11, 2017, 6:34pm

Just a question:

When you connected the board directly to the PC, did you use a crossover cable or just a regular Ethernet cable?

I ask because automatic crossover detection (auto MDI-X) can cause delays when a crossover cable is not used.

Darren_SFI · December 11, 2017, 10:04pm

@John_Brochue - The WiFi RS21 module was MUCH FASTER. I did not have the delay problems. I did 10,000 trials again. The longest response time was 147 ms. The average was 19 ms.

@valon_hoti_gmail_com - I used a regular patch cable for the direct-to-PC test. The adapter on my PC is auto MDI-X.

Darren_SFI · December 11, 2017, 10:49pm

@John_Brochue - I emailed the Wireshark capture to support. Let me know if it doesn’t find its way to you.

When you look at it, the iterations with the longest delays are:

1.) Request 8954 - 11,871 ms
2.) Request 1 - 8,552 ms
3.) Request 7668 - 7,966 ms
4.) Request 6105 - 7,887 ms
5.) Request 6143 - 7,877 ms
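
For anyone who wants to rank their own measurements the same way, here is a rough sketch. It assumes the response times have already been collected into a list in request order; the function name `summarize` and the sample values are made up for illustration and are not taken from my capture.

```python
from statistics import mean


def summarize(latencies_ms, top_n=5):
    """Return the count, mean, maximum, and the `top_n` slowest requests.

    Requests are numbered from 1 in the order they were sent.
    """
    ranked = sorted(
        enumerate(latencies_ms, start=1),  # (request number, latency)
        key=lambda pair: pair[1],
        reverse=True,                      # slowest first
    )
    return {
        "count": len(latencies_ms),
        "mean_ms": mean(latencies_ms),
        "max_ms": max(latencies_ms),
        "slowest": ranked[:top_n],
    }


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    sample = [20, 18, 900, 19, 21, 450]
    print(summarize(sample, top_n=2))
```

Sorting by latency while keeping the request number makes it easy to look up the matching frames in the Wireshark capture.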

valon_hoti_gmail_com · December 11, 2017, 11:20pm

I asked because I have a problem similar to yours.

When I connect to a router/switch there is no problem.
When I connect directly to a PC (with auto MDI-X, as you describe), it sometimes either did not work at all or had big delays. I tested this with Win7 and Win10 on a Dell Latitude E6220 i7 laptop. Later I tried a crossover cable with the exact same adapter,

and it worked as well as the connection through the router/switch: requests got prompt responses.

Tests were done with:

Arduino - ENC28J60, Arduino UIPEthernet library
Nucleo STM32F411RET6 - ENC28J60, Arduino UIPEthernet library, mbed test and library, and the .NET Micro Framework mIP library

John_Brochue · December 12, 2017, 1:54pm

I’ve received the trace.

The WiFi module working as expected is good news: it narrows the problem down to the Ethernet driver instead of the lwIP TCP/IP stack. So that’s a smaller area for us to investigate. One last thing for you to test on your end: if you have a G400D or EMX, can you rerun the test with the physical Ethernet on those boards?

Darren_SFI · December 12, 2017, 4:59pm

@John_Brochue - The beta version of our product from early 2015 used the original FEZ Spider, so I was able to test using the EthernetBuiltIn on EMX. Like the WiFi module, it is FASTER (average 20 ms, stdev 4.5 ms, maximum 458 ms). I updated the Spider I to the 2016 R1 SDK (4.3.8.1), so it was the same firmware version as my G120E tests.

I don’t have a G400D.

John_Brochue · December 12, 2017, 5:49pm

Thanks for the information. Since we were able to reproduce it, just on a longer time scale, we will take a look into it. Given the infrequent nature of the delays though, it’ll likely prove difficult to track down. We’ll let you know when and if we have any update.

Darren_SFI · December 13, 2017, 5:11pm

@John_Brochue - I sent you an email last night with a little more information from further testing I did. This time I printed out the contents of the data buffers for the three transmit descriptors before & after sending the response, and during the long delays. The bytes in these buffers can be matched with the bytes of the frames captured in Wireshark.

The LPC1788 can trigger an interrupt when the buffer underrun occurs. I think the IntEnable register is already set to do this. Maybe you can run repeated queries like I have been doing, and debug the device driver when the interrupt is triggered?

There are three reasons the LPC1788 would report an underrun (listed below). I don’t think it is #1, because it looks like the frames are being sent all as a single fragment, and there is no “NoDescriptor” error being shown. You may be able to see whether it is a fatal or non-fatal error, which would tell us if it is #2 or #3. Does the driver have to soft reset the Ethernet hardware by setting the TxReset bit in the command register? It seems most likely to me that it is #2, because the frame is simply re-sent.

Three causes of an underrun are:

(1) The next fragment in a multi-fragment transmission is not available. This is a nonfatal
error. A NoDescriptor status will be returned on the previous fragment and the TxError
bit in IntStatus will be set.

(2) The transmission fragment data is not available when the Ethernet block has already
started sending the frame. This is a nonfatal error. An Underrun status will be returned
on transfer and the TxError bit in IntStatus will be set.

(3) The flow of transmission statuses stalls and a new status has to be written while a
previous status still waits to be transferred across the memory interface. This is a fatal
error which can only be resolved by a soft reset of the hardware.

The first and second situations are nonfatal and the device driver has to re-send the frame
or have upper software layers re-send the frame. In the third case the hardware is in an
undefined state and needs to be soft reset by setting the TxReset bit in the Command
register.
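
As a rough sketch of how the driver could tell the three cases apart, the decision logic might look like the following. The bit positions for Underrun, NoDescriptor, and Error in the transmit status word are my assumption from LPC17xx-family header files and should be checked against the LPC1788 user manual. The function name and the `fatal` flag are hypothetical; how the driver actually detects the fatal case is not something I can see from managed code.

```python
# Assumed bit masks for the transmit status word; verify against the manual.
TINFO_UNDERRUN = 0x20000000  # bit 29 (assumed)
TINFO_NO_DESCR = 0x40000000  # bit 30 (assumed)
TINFO_ERROR = 0x80000000     # bit 31 (assumed)


def classify_underrun(tx_status_info, fatal):
    """Map a transmit status word to one of the three underrun causes.

    Returns (case number, recovery action); case 0 means no underrun.
    `fatal` is a hypothetical flag for "the hardware reported a fatal error".
    """
    if fatal:
        # Case 3: status flow stalled; the hardware state is undefined.
        return (3, "soft reset via TxReset")
    if tx_status_info & TINFO_NO_DESCR:
        # Case 1: next fragment of a multi-fragment frame was not available.
        return (1, "re-send frame")
    if tx_status_info & TINFO_UNDERRUN:
        # Case 2: fragment data was not available after sending had started.
        return (2, "re-send frame")
    return (0, "none")


if __name__ == "__main__":
    print(classify_underrun(TINFO_ERROR | TINFO_UNDERRUN, fatal=False))
```

With Underrun set, NoDescriptor clear, and no fatal error, this lands on case 2, which is what I suspect is happening.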

John_Brochue · December 14, 2017, 2:05pm

Thanks for the additional information, I did receive the email. So far I’d lean on it being #2, but we’ll get a better idea when we take a deeper look into it.

przemo · February 13, 2018, 9:02pm

Dear Darren
Could you please advise whether you solved this issue somehow? I observe the same problem; see my post in “Remove socket's supply with c# code” (#13 by przemo).

PBgrammer · February 15, 2018, 10:58pm

@John_Brochue Have you made any progress on this issue? I am following up for @Darren_SFI.

John_Brochue · February 16, 2018, 1:37pm

No update yet unfortunately.

PBgrammer · March 22, 2018, 4:56pm

OK, it has been 4 months of radio silence since this issue was raised and confirmed. Does GHI have any plans to address this problem with the G120E EthernetBuiltIn function, or should we be implementing a different solution?

Gus_Issa · March 23, 2018, 2:55am

Unfortunately, we have to balance our resources between TinyCLR and NETMF. It is difficult to put resources into something that is being replaced.