UTF8 / Comms Issue - Data Mismatch

Hi Guys,

I'm faced with a rather odd problem, and I'm on the verge of setting fire to my Panda board. I've been up all night trying to fix it.

Two Fez Pandas linked by a fully functional radio link.

static byte[] message = UTF8Encoding.UTF8.GetBytes("9999999999");

Sent continuously, this arrives on the other side as bytes with value 202 (a '9' should be 57). 87s arrive as 230s, etc.

Whilst:

static byte[] message = new byte[5]{57,57,57,57,57};

Arrives as 57 on the other side, as expected.

Why oh why does assigning the byte values via UTF8 encoding somehow change them during transmission, while manual assignment works a charm?

The radio link has been previously used with a PICAXE chip, and worked a treat - absolutely no problems.

Any ideas? This seems rather strange… Is there anything special about the inner workings of UTF8Encoding that could somehow distort the data in transmission?


Edit: Also tried different radio modules; the exact same problem persists. Data assigned manually is sent through undistorted, while data assigned through UTF8 is shifted along by around 200 each time. Furthermore, I tried a different board, a FEZ Panda 1 instead of a FEZ Panda 2. Same problem persists. Firmware is up to date.


More Info: When debugging and checking the values of the message bytes after assigning with UTF8.GetBytes, the values do indeed appear correct, e.g. 57 in the aforementioned case.


Currently bypassing the issue by writing my own encoding class, which also manages bit balancing etc., but I'd love to know why the UTF8Encoding class acts up.

Sounds like a mismatch of the serial parameters between the transmitter and receiver. Are both/all sides 8 bits, no parity, one stop bit?

The above statement says to me it is not a UTF8 issue.

It's currently being tested in a loopback: TX and RX on the same port of the same FEZ Panda.

And if I manually assign all the bytes, the data comes across perfectly.

I wish I had a digital scope to see what actually comes out of the serial pin when UTF8 is used.

A crude bit of code that I’m currently using for testing:


using System;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using System.Threading;
using System.Text;

using System.IO.Ports;

using GHIElectronics.NETMF.FEZ;

namespace MFConsoleApplication1
{
    public class Program
    {
        static SerialPort radio = new SerialPort("COM1", 500);
        static TristatePort transmitEnable = new TristatePort((Cpu.Pin)FEZ_Pin.Digital.Di4, false, false, Port.ResistorMode.Disabled);
        static byte[] preq = new byte[5]{57,57,57,57,57};
        // static byte[] preq = UTF8Encoding.UTF8.GetBytes("9999999999999999999999");
        
        

        public static void Main()
        {
            radio.DataReceived += new SerialDataReceivedEventHandler(DataReceivedHandler);

            Debug.EnableGCMessages(false);

            radio.Open();
            Debug.Print("Open");

            while (true)
            {
                DrivePin(transmitEnable); // assert the radio's transmit-enable line

                Debug.Print("Transmitting");
                for (int i = 0; i < 100; i++)
                {
                    radio.Write(preq, 0, preq.Length);
                }
                FloatPin(transmitEnable);
                Thread.Sleep(5000);
            }
        }

        private static void DataReceivedHandler(object sender, SerialDataReceivedEventArgs e)
        {
            byte[] rx_data = new byte[1];

            radio.Read(rx_data, 0, 1); // reads a single byte per event

            if (rx_data[0] == 57)
            {
                Debug.Print("57 Received");
            }
        }

        static void DrivePin(TristatePort port)
        {
            // Switch the tristate pin to output (driven) mode.
            if (port.Active == false)
                port.Active = true;
        }


        static void FloatPin(TristatePort port)
        {
            // Switch the tristate pin back to input (floating) mode.
            if (port.Active == true)
                port.Active = false;
        }


    }
}


The posted version of the code works a charm. If I comment out the 57,57,57,57,57 line and uncomment the UTF8.GetBytes("99999999") line, it stops working, and upon debug claims it is receiving 202 on the other side.

You are assuming that each time you get a DataReceived event there is only one byte available. This is an incorrect assumption. There is a property of the serial port that will tell you how many bytes are available to read.
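For instance, a handler along these lines (a minimal sketch against the posted code, using the SerialPort.BytesToRead property) drains everything that is available instead of assuming a single byte:

        private static void DataReceivedHandler(object sender, SerialDataReceivedEventArgs e)
        {
            // Drain everything currently buffered instead of assuming one byte.
            int available = radio.BytesToRead;
            byte[] rx_data = new byte[available];
            int read = radio.Read(rx_data, 0, available);

            for (int i = 0; i < read; i++)
            {
                if (rx_data[i] == 57)
                {
                    Debug.Print("57 Received");
                }
            }
        }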

Also, you are not setting the serial port parameters. Do you know what the defaults are?
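For example (a sketch; the five-argument System.IO.Ports constructor states them explicitly rather than relying on defaults):

        // Explicit parameters: 500 baud, no parity, 8 data bits, one stop bit.
        static SerialPort radio = new SerialPort("COM1", 500, Parity.None, 8, StopBits.One);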

Debug.Print the values of preq[i] and see if what is going out is what you expect.
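Something as simple as this sketch would do:

        for (int i = 0; i < preq.Length; i++)
        {
            Debug.Print("preq[" + i.ToString() + "] = " + preq[i].ToString()); // expect 57 for every '9'
        }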

Can you eliminate the actual radio and loop back the pins directly to see if it comes back clean?

Mike - Is going through it byte by byte a bad approach? What I'm doing in my actual live version of the project is going byte by byte until it reaches the start byte, then checking the next 3 bytes to see whether they match the qualifier; the next byte has the packet length in it, so it reads that many, then two more for the CRC. Is there a better/neater way of doing it?

I've resorted to writing my own encoding class, which works perfectly, so I reckon I'll just stick with that :slight_smile: It does bit balancing whilst at it, so it's all good.

It is a good idea to read all the bytes in the internal buffer during an event. You could read a byte to get a length and then read the rest of the internal buffer. If you do not empty the internal buffer during an event callback, you will eventually lose data when the buffer fills up.

Usually the processing of a message like the one you described is done with a state machine design. I build a message assembly class. Each time I get a DataReceived event, I read all the available data and pass it to the assembly class. The assembly class is a state machine which knows the current stage of the message assembly process (waiting for start, reading id, reading length, etc.). When an entire message is present, the assembly class fires a message-ready event. You also have to be careful to consider the condition where the DataReceived event passes the assembly object a block of bytes containing the end of one message and the start of another.
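A minimal sketch of such an assembly class, assuming the packet layout described earlier (start byte, three qualifier bytes, a length byte, the payload, then two CRC bytes); the marker values here are illustrative placeholders, not your actual protocol:

    public delegate void MessageReadyHandler(byte[] payload);

    public class MessageAssembler
    {
        public event MessageReadyHandler MessageReady;

        enum State { WaitStart, Qualifier, Length, Payload, Crc }

        const byte StartByte = 0x7E;                              // assumed start marker
        static readonly byte[] Qualifier = { 0x51, 0x52, 0x53 };  // assumed qualifier bytes

        State state = State.WaitStart;
        int index;                 // position within the current field
        byte[] payload;
        byte[] crc = new byte[2];

        // Feed every block received in a DataReceived event into this method.
        // A block may contain the tail of one message and the head of the
        // next; the state machine handles that naturally.
        public void Append(byte[] data, int count)
        {
            for (int i = 0; i < count; i++)
            {
                byte b = data[i];
                switch (state)
                {
                    case State.WaitStart:
                        if (b == StartByte) { state = State.Qualifier; index = 0; }
                        break;

                    case State.Qualifier:
                        if (b == Qualifier[index])
                        {
                            if (++index == Qualifier.Length) state = State.Length;
                        }
                        else
                        {
                            // Noise: resynchronize, treating this byte as a possible new start.
                            state = (b == StartByte) ? State.Qualifier : State.WaitStart;
                            index = 0;
                        }
                        break;

                    case State.Length:
                        payload = new byte[b];
                        index = 0;
                        state = (b > 0) ? State.Payload : State.Crc;
                        break;

                    case State.Payload:
                        payload[index++] = b;
                        if (index == payload.Length) { state = State.Crc; index = 0; }
                        break;

                    case State.Crc:
                        crc[index++] = b;
                        if (index == crc.Length)
                        {
                            // CRC verification omitted in this sketch.
                            if (MessageReady != null) MessageReady(payload);
                            state = State.WaitStart;
                        }
                        break;
                }
            }
        }
    }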

The above discusses how to process a message without losing data, but in no way explains how you would be getting data corruption. A serial parameter mismatch is a better explanation.

Thanks for that info Mike - the data I'm processing is relatively infrequent.
The byte-by-byte processing only filters out background noise; actual messages do indeed get forwarded straight to a depacketizing class.

The corruption is no longer a problem; I've programmed my own encoding class, which works just fine. It's just a bit weird that it didn't work with UTF8. When I have a bit more time and fewer deadlines I'll look into it.

You should not decode UTF8 byte by byte. For chars with values > 127, you may/will have 2 bytes representing such a value.

If you can, try adding accented chars and look at your byte array before sending it. Encoding "azerty" will give you a 6-byte array, while encoding "azertyéà" will give you a 10-byte array…
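A quick way to see it (a sketch using Debug.Print as in the posted code):

        byte[] plain = UTF8Encoding.UTF8.GetBytes("azerty");       // 6 chars -> 6 bytes
        byte[] accented = UTF8Encoding.UTF8.GetBytes("azertyéà");  // 8 chars -> 10 bytes: é and à take 2 bytes each
        Debug.Print("azerty: " + plain.Length.ToString() + " bytes");       // prints 6
        Debug.Print("azertyéà: " + accented.Length.ToString() + " bytes");  // prints 10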

This is why I put start/end chars in my strings when transferring data across a network or serial line. And I choose chars with low values, like '#' and '$' (ASCII 35 & 36), which I know won't end up being encoded as 2 bytes.

So, when I receive a UTF8 string, I know the string is complete when I have received the end char ('$' in my example). Then, and only then, I can parse the string.
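Sketched as a byte-at-a-time receiver (the buffer size and helper name are illustrative):

        static byte[] buffer = new byte[256];  // illustrative maximum message size
        static int length = 0;
        static bool inMessage = false;

        // Feed each received byte in; '#' (35) opens a message, '$' (36) closes it.
        static void OnByte(byte b)
        {
            if (b == (byte)'#')                    // start char: begin a new message
            {
                length = 0;
                inMessage = true;
            }
            else if (b == (byte)'$' && inMessage)  // end char: the string is complete
            {
                byte[] exact = new byte[length];
                Array.Copy(buffer, exact, length);
                string message = new string(UTF8Encoding.UTF8.GetChars(exact));
                Debug.Print("Received: " + message); // only now is it safe to parse
                inMessage = false;
            }
            else if (inMessage && length < buffer.Length)
            {
                buffer[length++] = b;              // accumulate payload bytes
            }
        }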

Of course, I can only do this because I control both ends: server and client. If one is not under my control and there's no start/end char, then it's obviously more difficult to parse the received string.