We are in the final stage of testing a system with an SCM-20260E-B using a wired Ethernet connection.
The system uses the MQTT client to post data every 200 to 500 ms.
After 5 or 6 hours the MQTT client shows some errors sending the updates to the MQTT broker.
I usually retry a maximum of 3 times before closing the connection and connecting again.
Then, after this happens a couple of times, the board can’t connect to the server anymore: trying to open the MQTT connection fails right away with an exception from GetHostEntry.
I also tried to ping the board and it doesn’t respond. It looks like the network interface is down; however, I’m reporting the link status and IP using a debug WriteLine and everything looks up and OK, yet there is still no response.
Everything is back to normal only after a reboot.
The board is using DHCP to get the IP.
Does anyone have a similar problem?
Are there any best practices for handling these problems without losing the network?
Are there any other flags that I should be checking to detect a network problem on the board?
I will test again with a static IP, but it seems like this should work normally.
I need to make this as reliable as possible, and these random disconnections are very difficult to debug.
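For anyone following along, the retry policy described above (up to 3 send attempts, then close the connection and connect again) can be sketched roughly as below. This is only an illustration in Python so the logic can be run anywhere, not the actual TinyCLR code from the board: `FlakyClient`, `publish_with_retry` and the use of `OSError` are all hypothetical names chosen for the sketch.

```python
class FlakyClient:
    """Hypothetical stand-in for an MQTT client; not a real library API."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.reconnects = 0
        self.sent = []

    def publish(self, payload):
        # Simulate a send error until the configured failures are used up.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise OSError("simulated send error")
        self.sent.append(payload)

    def reconnect(self):
        # Stands in for "close the connection and connect again".
        self.reconnects += 1


def publish_with_retry(client, payload, max_attempts=3):
    """Try to publish up to max_attempts times, then recycle the connection.

    Returns True if the payload was sent, False if every attempt failed
    and the connection was closed and reopened instead.
    """
    for _ in range(max_attempts):
        try:
            client.publish(payload)
            return True
        except OSError:
            continue  # transient failure: try again
    client.reconnect()  # all attempts failed: start from a fresh connection
    return False
```

Returning a flag makes it easy to count how often the connection had to be recycled, which may help correlate the reconnects with the point where the network stops responding.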
@LucaP I’m using the onboard one. I can’t ping it from any other computer, but it is still running. I have TinyCLR connected and writing messages using the debugger.
So, the device is running the latest firmware and the code seems to run normally (for example, a thread can keep blinking an LED), but the network stops working to the point that even a ping fails.
Is it possible that you can run your test software on our dev board? Any test software that you can share with us?
If we use loops without sleep, which increases the CPU load, it happens more often. Using queues that fill up quickly and consume free memory faster also makes the network issue appear.
Using a static IP makes no difference.
I have the extended memory heap enabled (so I can use big queues).
I will try to isolate some example code. It just takes a lot of time since I’m using a lot of features (DAC, ADC, SPI, CAN, etc.), and every iteration takes around a working day for it to lock.
I need to verify that it doesn’t disappear (or take longer to show up) when another feature is removed.
Thinking about memory exhaustion, I now force a GC cleanup when free memory goes below 3 MB (10%), and it doesn’t lock.
However, the MQTT client still stops sending data for random gaps of ~20 mins, even though its status is still connected.
Ping still responds and the other TCP connections seem to be still alive.
Checking the MQTT broker connections during the periods where the data is missing shows no active connections.
I’m decreasing the KeepAlive time to 15 secs (instead of the default 60) to check if there is a relation.
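The low-memory guard mentioned above (force a collection when free memory drops below 10%) boils down to a threshold check. Here is a minimal sketch in Python rather than TinyCLR C#. The function names are made up for illustration, and the 30 MB total is only inferred from "3 MB (10%)". On the board, the equivalent would read the free managed memory and call `GC.Collect()`.

```python
import gc

MB = 1024 * 1024
LOW_MEMORY_FRACTION = 0.10  # 10% of the heap, as described in the post


def is_low_memory(free_bytes, total_bytes, fraction=LOW_MEMORY_FRACTION):
    """Return True when free memory is below the given fraction of the total."""
    return free_bytes < total_bytes * fraction


def collect_if_low(free_bytes, total_bytes, collect=gc.collect):
    """Force a garbage collection only when memory is low.

    Returns True if a collection was triggered. The collect callable is
    injectable so the check can be exercised without a real low-memory state.
    """
    if is_low_memory(free_bytes, total_bytes):
        collect()
        return True
    return False


# Assumed numbers: a 30 MB heap with 2 MB free is below the 10% threshold.
triggered = collect_if_low(2 * MB, 30 * MB)
```

Running the check once per loop iteration, before queuing more data to send, keeps the collection away from the middle of a TCP write, which is where the lockup seemed to show up.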
The connection has been running for 3 days without locking up.
It has something to do with the GC and low memory while sending TCP data, because it has not locked up again since I started forcing a GC when free memory is below 10%.
Still, the MQTT client stops communicating with the broker from time to time for about 20 mins, while the other TCP sockets are still good.
I’m unsure if it is the broker (using RabbitMQ) or the client.