G400 freeze for more than 15 min

Always pass your array to RLP. Do not store any pointers. I have a feeling this will fix the problem.

I'm running the next tests. I added an extra thread that calls RLP in a loop, and my first crash came after 12 h. Now I have 2 boards running the same code but with a 10 s watchdog, so if a board crashes and I see that it is offline for only a few seconds, I know that the problem is related to RLP and that a watchdog is a fast way to reduce the damage. Hopefully I will be able to send you test parameters that cause this problem in a short period, so we can find the problem and a solution.

I really appreciate your help!

Thanks

Did you try not storing any pointers in RLP? Pass your array/pointer in every call instead.

I will, but I have to be sure that I have code that crashes after 12 h, not after 14 days.
I am preparing the following tests:

  1. Store pointers
  2. Passing 2 parameters
  3. Passing 4 parameters
  4. Call RLP and return 0 without my serialization
  5. Don't call RLP and move serialization to C#
    and more...

I just want to run as many tests as I can to collect all kinds of data.

The next 6 boards are on the way, so next week I can run 8 parallel tests...

Test results:

  1. When I pass array pointers to RLP, the code crashes even faster
  2. When I use unsafe code and the fixed keyword for both arrays I pass to RLP, the code crashes anyway
  3. When I use my stopwatch class and log RLP.Serialize operations longer than 2 ms, I can see that sometimes one takes over 300 ms (GC?)

Possibilities:

  1. The GC shuffles my arrays while I am inside RLP. Sometimes the RLP serializer works fine; sometimes the GC does something different, or acts at a different stage of serialization, and the whole system crashes. But when I call GC(true) manually after every serialization, the code crashes too.
  2. Is it possible that every time I invoke an RLP method and pass some arguments, these arguments are added to some internal buffer, and this buffer overruns?

I think nobody else has this problem because nobody else calls RLP and serializes a message every few milliseconds. For example, if you call RLP every 5 s, it could take years before a crash.

My options:

  1. Use my C# serializer (but RLP is much, much faster and I need to be super fast)
  2. Disable the GC before calling RLP and enable it afterwards (not possible, right?)

Gus, what do you think? What do you want me to try?

Thanks

When you call RLP, nothing else runs in the system, so the fact that it gets worse when you pass the array in every call doesn't make sense at all.

Did you rule out everything else, like networking? Are we sure we are focused on the right thing? What about an app that just serializes random data? Keep the program small and only use what is absolutely necessary.

@ xtomas22 - I think it is not possible to disable the GC, but if you call the GC manually just before entering RLP, then the chance that the GC will run during RLP is very, very small.
But I am not even sure that the GC can start during RLP, because Gus pointed out that nothing else runs during RLP...

OK, I checked my code, and the method my stopwatch measures runs some C# code before calling RLP, so that is why the GC can be running. I'm sorry, my mistake.
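To keep that kind of measurement honest, the timer should wrap only the call under test. A minimal sketch, assuming desktop .NET's `System.Diagnostics.Stopwatch` and `System.Action` are available (on NETMF the poster's own stopwatch class would take their place); `CallTimer`, `MeasureMs`, and `IsSlow` are hypothetical names, not code from this thread:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; names are illustrative and not taken from the thread.
public static class CallTimer
{
    // Times only the delegate passed in. Buffer preparation and any other
    // managed code should run before this method is called, so that it is
    // not counted in the result.
    public static long MeasureMs(Action call)
    {
        Stopwatch sw = Stopwatch.StartNew();
        call();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // True when a measured duration exceeds the logging threshold
    // (the thread used 2 ms).
    public static bool IsSlow(long elapsedMs, long thresholdMs)
    {
        return elapsedMs > thresholdMs;
    }
}
```

With this shape, anything that allocates (and so can trigger the GC) stays outside the timed region, and only durations for which `IsSlow` returns true need to be logged.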

I made some changes to my tests, and for the moment it looks like the code that does not send messages over Ethernet has been stable for the last 12 h.

My threads:

  1. Thread1: calling RLP.Serialize in the loop (test - heavy load)
  2. Thread2: calling SPI.WriteRead in the loop (I/O as shift buffer)

My tests:

  1. Thread1 + Thread2 + ETH sending message every few ms using UDP [UNSTABLE - crash every 1-6 hours randomly]
  2. Thread1 + Thread2 + no ETH [STABLE 12h]
  3. Thread1 + no ETH + blinking output [STABLE 12h]
  4. Thread2 + ETH as logger, but only when test 3) stops blinking [STABLE 12h]

It looks like RLP alone is stable for the moment. When I use ETH + RLP together in 2 different threads in a loop, it is only a matter of time until a crash.
That's actually the reason I was confused: when I added the Thread1 test with RLP, my code started to crash much faster.

I don't know how to find the problem, but I suppose you can use this information to write your own test code that causes the crashes.

When I find something useful, I will post it here.

Thanks to all

Am I right in saying you are calling the same RLP code from 2 different threads?

Is your RLP code thread safe?

I use the C# lock(obj) syntax inside my C# library, so it's thread safe. Plus, I have shared arrays for serialization and deserialization, so they have to be locked. Plus, when you call RLP, no other C# code runs until the RLP operation is done.

Now I can confirm my last post: all versions of the code without ETH are stable.
The same code with ETH crashed 7x in the last 12 h.

@ xtomas22 - good. Now we need to narrow it down to a reasonably small example so there is nothing in it but Ethernet.

Ethernet alone is stable, so it has to be the combination of ETH + RLP.

@ xtomas22 - I am sorry there is nothing else I can think of.

Hi, Gus,

I have new info about my problem. Actually, all my boards crash after some time. Rewriting the serialization from RLP to C# only delayed the problem, so the crashes occur after a longer time.
When I disconnect the ETH cable so I can't send any messages, everything works fine. It really looks like some kind of ETH driver problem that causes a system crash, and only the watchdog helps.

So let me ask whether you are planning to release a new SDK 2017 or TinyCLR OS with G400 support, and when.

I volunteer for testing.

If we don't find a solution, I will have to leave GHI and the G400 and find another solution very quickly, because our customers are getting nervous. I can keep the situation under control for maybe 2 months, and I would have only those 2 months to redesign the boards, which is nearly impossible.

I was desperate, so I'm trying to write my own ETH driver, but it is not going very well. It's based on Atmel examples.

So let me put some ideas on the table:

  1. What if I rewrite my communication from UDP to TCP, would it help?
  2. What if I disconnect and reconnect the built-in Ethernet interface once a day?
  3. I have only 2 sockets in the code, one for receiving and one for sending. I do not dispose of these sockets; I reuse them, but that shouldn't be a problem, right?
  4. I have my own EndPoint so I can reuse the SocketAddress object, and I initialize Address and Port before every Socket.SendTo call. I suppose that is OK (see image).

I can prepare a solution with minimal code for you, so you can run some tests.

Thanks for any ideas

@ xtomas22 -

If you can provide us with a simple project that lets us reproduce the issue in our office, we will try our best to find out.

The reason for a simple project is that if a project is complex or the issue is hard to reproduce (once every few days, weeks...), it can be difficult to find the problem.
The second reason is that while cutting a big project down to a smaller one, you may find out where the issue is yourself before we do.

Plus, while I love the forum and the support Gus and Gary and the whole team give, there's nothing like actually speaking to them. As a hobbyist a support forum works, but when you have customers you really need something more formal.

SOLVED:

It was 2x lock(...) on critical code. One thread takes the first lock and then tries to take the second lock. At the same time, the other thread takes the second lock and tries to take the first lock, so all my C# code just freezes until the watchdog resets the board. Such a stupid bug. It occurred so rarely that I couldn't find it for 6 months.
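For anyone who lands here with the same symptom, the usual way to rule out this kind of deadlock is to make every code path take the two locks in the same order. A minimal sketch of that idea, written for desktop .NET and not taken from the poster's project; `LockOrderSketch`, `SendLock`, `BufferLock`, `SendPath`, and `SerializePath` are hypothetical names:

```csharp
using System;
using System.Threading;

// Hypothetical illustration of consistent lock ordering; not the code from the thread.
public static class LockOrderSketch
{
    private static readonly object SendLock = new object();
    private static readonly object BufferLock = new object();
    private static int completed;

    // Rule: any path that needs both locks takes SendLock first, then BufferLock.
    private static void SendPath()
    {
        lock (SendLock)
        {
            lock (BufferLock)
            {
                completed++; // stands in for the real work on shared state
            }
        }
    }

    // The deadlocking version would take BufferLock first and SendLock second.
    // Using the same order as SendPath removes the circular wait.
    private static void SerializePath()
    {
        lock (SendLock)
        {
            lock (BufferLock)
            {
                completed++;
            }
        }
    }

    // Runs both paths concurrently and returns how many operations finished.
    public static int Run(int iterations)
    {
        completed = 0;
        Thread a = new Thread(() => { for (int i = 0; i < iterations; i++) SendPath(); });
        Thread b = new Thread(() => { for (int i = 0; i < iterations; i++) SerializePath(); });
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return completed;
    }
}
```

If the two critical sections always need both locks anyway, merging them into a single lock object is an even simpler option.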

I would also like to apologize to the GHI team for suspecting that they had a fatal error on the G400. It was my fault, through inattention. :-[

I appreciate all help from all of you, thank you!

@ xtomas22 -

Thanks for posting what the problem was AND how it was solved.

Always great to know what solved the issue :clap: