I’ve been having trouble with moderately sized file downloads on my Hydra + ENC28 platform. I’m running the 2013 R2 firmware, but this issue has been present since I started with 4.1.
I generally don’t have trouble with XML and JSON content using the System.Net.HttpWebRequest class; my response streams are only a few KB in those use cases. The issue shows up intermittently with response streams of 20 KB and larger.
Example symptom:
Bytes Read Now: 4096 Total: 4096
Bytes Read Now: 1970 Total: 6066
#### Exception System.Net.Sockets.SocketException - CLR_E_FAIL (1) ####
#### Message:
#### Microsoft.SPOT.Net.SocketNative::recv [IP: 0000] ####
#### System.Net.Sockets.Socket::Receive [IP: 0018] ####
#### System.Net.Sockets.NetworkStream::Read [IP: 0062] ####
#### System.Net.InputNetworkStreamWrapper::ReadInternal [IP: 00d7] ####
#### System.Net.InputNetworkStreamWrapper::Read [IP: 000d] ####
#### HttpClientSample.MyHttpClient::PrintHttpData [IP: 00a3] ####
#### HttpClientSample.MyHttpClient::Main [IP: 0039] ####
#### SocketException ErrorCode = 10060
#### SocketException ErrorCode = 10060
A first chance exception of type 'System.Net.Sockets.SocketException' occurred in Microsoft.SPOT.Net.dll
#### SocketException ErrorCode = 10060
An unhandled exception of type 'System.Net.Sockets.SocketException' occurred in Microsoft.SPOT.Net.dll
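The repeated "ErrorCode = 10060" lines use Winsock error numbering (the same values as the .NET SocketError enum), and 10060 is WSAETIMEDOUT: the recv timed out, rather than the connection being reset or refused. The small host-side lookup below is only an illustration for reading such traces; describe_socket_error is a hypothetical helper, and its table holds a handful of standard Winsock codes, not the full list.

```python
"""Decode Winsock-style error numbers such as those in SocketException.ErrorCode."""

# A few standard Winsock error codes (illustrative, not exhaustive).
WINSOCK_ERRORS = {
    10035: ("WSAEWOULDBLOCK", "operation would block"),
    10053: ("WSAECONNABORTED", "connection aborted by the local host"),
    10054: ("WSAECONNRESET", "connection reset by the peer"),
    10060: ("WSAETIMEDOUT", "operation timed out"),
    10061: ("WSAECONNREFUSED", "connection refused by the peer"),
}


def describe_socket_error(code: int) -> str:
    """Return a readable name and meaning for a socket error code."""
    name, meaning = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} = {name} ({meaning})"


if __name__ == "__main__":
    # prints: 10060 = WSAETIMEDOUT (operation timed out)
    print(describe_socket_error(10060))
```

Read this way, the trace says the device gave up waiting for data, which is consistent with the stall described later in the thread rather than with the server closing the connection.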
I have reproduced this symptom using the Microsoft NETMF example found in \My Documents\Microsoft .NET Micro Framework 4.3\Samples\HttpClient.
Is this a generic issue with the NETMF libraries, unique to the Hydra + ENC28 combination, or perhaps related to my environment and boards? How can I tell? Can any fool build the NETMF libraries from scratch to debug further?
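One way to separate a device-side problem from a server or network problem is to fetch the same URL from a PC on the same network, using 4096-byte reads (matching the first read in the log above) and a 5-second per-read timeout (the sample sets respStream.ReadTimeout = 5000). The sketch below is a hypothetical host-side helper, not part of the NETMF SDK, and a PC's TCP stack behaves differently from the board's, so a clean PC download only suggests that the server and path are not the culprit.

```python
"""Download a URL from a PC with a read loop shaped like the device sample's,
to check whether the server and network deliver the full stream."""
import socket
import sys
import urllib.request


def download(url: str, read_timeout_s: float = 5.0, chunk: int = 4096) -> int:
    """Read the body in fixed-size chunks, print progress in the same format
    as the device log, and return the total number of bytes received."""
    total = 0
    # urlopen's timeout applies to the connect and to each blocking socket
    # operation, roughly analogous to respStream.ReadTimeout on the device.
    with urllib.request.urlopen(url, timeout=read_timeout_s) as resp:
        while True:
            data = resp.read(chunk)
            if not data:  # empty bytes means end of stream
                break
            total += len(data)
            print(f"Bytes Read Now: {len(data)} Total: {total}")
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        download(sys.argv[1])
    except socket.timeout:
        print("Read timed out after the per-read timeout")
```

Run it as, for example, `python download_check.py <url>` (the script name is arbitrary) against the same URL the board uses, then compare where each side stops.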
To reproduce in your environment:
Open the Microsoft HttpClient example in Visual Studio 2008 or VS 2010. On my computer it was in C:\Users\foo\Documents\Microsoft .NET Micro Framework 4.3\Samples\HttpClient
Edit the HttpClient project properties and set the Target Framework to “.NET Micro Framework 4.2”. Associate the project with your connected GHI Gadgeteer kit.
Edit HttpClient.cs and add a line that calls PrintHttpData() with the following URL:
http://netmf.codeplex.com/downloads/get/500745
Any server will do, but a download URL like this one illustrates the issue.
In my testing, local (low-latency) servers perform much better, while far-away (high-latency) servers increase the probability of an error. Packet tracing suggests (my speculation) that the ENC28 is overrun with TCP ACKs, which it then ignores.
In addition to this CLR_E_FAIL failure mode, we generally observe sub-par TCP performance with non-local HTTP servers, which is probably related.
Thanks, Andre, for taking a look. I should have clarified that my issue is that the TCP socket always times out when 20 KB or larger streams are encountered, irrespective of the timeout value.
As you know, the timeout is adjustable. Turn it up to 50 seconds (or 500) and you’ll get the same result. For example, in HttpClient.cs find DownloadHttpData() and the line respStream.ReadTimeout = 5000; change it to respStream.ReadTimeout = 50000 for a 50-second timeout (the value is in milliseconds).
I will pull in the core source as you suggested. I expect that is where the TCP state machine is; is that right? My hunch is that it doesn’t receive the TCP ACK frame from the ENC28 chip, but I may need to hook up a logic analyzer to prove or disprove that.
JayJay, thanks for taking a look. What assembly reference do I need to add for that to work? System.IO.Stream does not contain a definition for ‘SetSocketOption’.
In this example, the System.Net.HttpWebRequest class is used as a client of a web server. We aren’t listening for or accepting any connections, and the example doesn’t create or bind its own Socket; that is presumably handled inside HttpWebRequest.
Do you have the SDK installed? If so, use Windows Explorer to browse to the .NET Micro Framework samples directory and double-click HttpClient.sln.
Incidentally, the response timeout setting does appear to work in my case.
Just curious, but can one use async socket I/O on .NET MF? If so, it might be interesting to write a test program using async reads to see whether that works. My reasoning is that by definition there’s no such thing as a timeout in async, so it might run through a different code path, possibly bypassing the code causing your issue.
I’ve pulled down the Porting Kit (PK) source and debugged as far as the native socket implementation.
It seems my NETMF environment goes into a persistent error condition after encountering a timeout on an HttpWebRequest.GetResponse().GetResponseStream().Read() call.
Subsequent calls to HttpWebRequest.GetResponse() (for different URLs) seem to block in Socket.Connect() or Microsoft.SPOT.Net.SocketNative.poll().
What should my next debugging step be? Has anyone run across this stack trace (see screenshot) before? It’s pretty easy to reproduce with the HttpClient sample application from the 4.2 NETMF SDK.
The key to reproduction is talking to servers on the Internet, not on your LAN. When TCP segments are dropped and retransmission is needed, the NETMF TCP stack seems prone to locking up. I’ve got packet sniffer output where TCP segments are retransmitted but lost, leading to the Stream Read() timeout. The error recovery doesn’t seem robust.
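To put numbers on the retransmissions in a sniffer trace, the data segments of one transfer can be scanned for bytes that were already sent. This is a minimal sketch, assuming the capture has been exported as one direction of one connection, in capture order, as (relative sequence number, payload length) pairs; that export format and find_retransmissions are hypothetical. Because it flags any data segment that starts below the highest byte already seen, it cannot tell a true retransmission from simple reordering.

```python
"""Flag retransmitted TCP segments in a one-direction list of
(relative_seq, payload_len) pairs exported from a packet capture."""
from typing import Iterable, List, Optional, Tuple


def find_retransmissions(segments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the indices of data segments that resend already-seen bytes.

    Simplifications: relative sequence numbers (no 32-bit wraparound),
    a single direction of a single connection, capture order preserved.
    """
    highest_end: Optional[int] = None  # one past the highest byte seen so far
    retransmitted: List[int] = []
    for i, (seq, length) in enumerate(segments):
        if length == 0:
            continue  # pure ACKs carry no data, so they cannot be resends
        if highest_end is not None and seq < highest_end:
            retransmitted.append(i)
        end = seq + length
        if highest_end is None or end > highest_end:
            highest_end = end
    return retransmitted
```

For serious analysis, Wireshark's built-in tcp.analysis.retransmission display filter does this more thoroughly; the sketch only shows the idea of counting resends per transfer.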
If anyone has the native (C++) build environment up and running, I’d be very happy to provide you a reproducible test case.
Is anybody out there with a Hydra, Cerberus or Cerbuino using the ENC28?
If so, would you check whether this lwIP TCP lockup issue reproduces on your bench? I am planning to pay GHI and/or Microsoft to fix this, but I need to confirm that the issue is reproducible first.
Yes, I do see weird error messages when I’m debugging with MFDeploy, something to do with a timeout, until it locks up.
Try debugging with MFDeploy instead of VS; it will give you more info.