Encoding code optimization

Alagrat · January 29, 2011, 10:30am

Hello, I need to convert a byte array to a CP866 string on a FEZ Domino. I use the following code, but it’s very slow…
Can anybody suggest a solution?



using System.Collections;

namespace Enc
{
    class Encoding866
    {
        //symbols array
        private static readonly ArrayList CharMap = new ArrayList(){
                                               'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О',
                                               'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э',
                                               'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м',
                                               'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы',
                                               'ь', 'э', 'ю', 'я', 'Ё', 'ё', 'Є', 'є', 'Ї', 'ї', 'Ў', 'ў','І','№','ÿ',
                                               128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
                                               143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157,
                                               158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172,
                                               173, 174, 175, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235,
                                               236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 73, 252, 255
                                               };
        //array length
        private static readonly int ArrLength = CharMap.Count / 2;
        //temp count var
        private static int _iPos;

        //get symbol code 
        private static int GetCharCode(char charIn)
        {
            _iPos = CharMap.IndexOf(charIn);
            return _iPos > -1 ? (int)CharMap[ArrLength + _iPos] : (byte)charIn;
        }

        //get symbol 
        private static string GetChar(int codeIn)
        {
            _iPos = CharMap.IndexOf(codeIn);
            return _iPos > -1 ? (CharMap[_iPos - ArrLength]).ToString() : ((char)codeIn).ToString();
        }

        //convert input string to bytes
        public static byte[] GetBytes(string strInput)
        {
            var inCharData = strInput.ToCharArray();
            var outBytes = new byte[inCharData.Length];

            for (int i = 0; i < inCharData.Length; i++)
            {
                outBytes[i] = (byte)GetCharCode(inCharData[i]);
            }
            _iPos = 0;
            return outBytes;
        }

        //convert input bytes to string
        public static string GetString(byte[] byteInput)
        {
            var outString = string.Empty;
            for (int i = 0; i < byteInput.Length; i++)
            {
                string strChar = GetChar(byteInput[i]);
                if (strChar == "ÿ") strChar = string.Empty;
                outString += strChar;

            }
            _iPos = 0;
            return outString;
        }
    }
}

Skewworks · January 29, 2011, 12:34pm

Not a lot of opportunity for optimization there…that said

While is faster than for.
String.Concat is faster than +=

Alagrat · January 29, 2011, 1:13pm

Yes… and also changing the function to:

//get symbol 
        private static string GetChar(int codeIn)
        {
            if (codeIn > 127)
            {
                _iPos = CharMap.IndexOf(codeIn);
                if (_iPos > -1) return (CharMap[_iPos - ArrLength]).ToString();
            }
            return ((char) codeIn).ToString();
        }

William · January 29, 2011, 5:10pm

“While is faster than for.”

Don’t while and for both compile down to pretty much the same goto loop?

WouterH · January 29, 2011, 5:44pm

foreach should run faster in this case.


int i = 0;
foreach (char c in inCharData)
   outBytes[i++] = (byte)GetCharCode(c);

Also, why does GetChar return a string instead of char? And GetCharCode an int instead of byte?

WouterH · January 29, 2011, 5:57pm

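Why include those numbers inside the ArrayList? Most of them are sequential (128 to 175, then 224 to 247), so you could compute the number from the index like this:


private static byte GetCharCode(char charIn)
{
   int _iPos = CharMap.IndexOf(charIn);
   if (_iPos < 0) return (byte)charIn;          // not in the map: pass through
   if (_iPos < 48) return (byte)(_iPos + 128);  // 'А'..'п' -> 128..175
   if (_iPos < 72) return (byte)(_iPos + 176);  // 'р'..'ў' -> 224..247
   return _iPos == 72 ? (byte)73 : _iPos == 73 ? (byte)252 : (byte)255; // 'І', '№', 'ÿ' break the pattern
}

And declare _iPos inside the method. Reading a local from the stack is faster than accessing a static field.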
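
For the encoding direction the IndexOf search can be avoided altogether, because the Cyrillic letters in the CharMap sit in contiguous Unicode ranges. The following is only a sketch under that assumption: the class name Cp866Encoder is made up for illustration, the mapping is copied from the CharMap in the first post, and it has not been timed on a FEZ Domino.

```csharp
// Hypothetical helper, not part of the original code.
static class Cp866Encoder
{
    // Maps one Unicode char to its CP866 byte with range arithmetic instead of IndexOf.
    public static byte GetCharCode(char c)
    {
        if (c < 128) return (byte)c;                                          // ASCII passes through
        if (c >= '\u0410' && c <= '\u043F') return (byte)(c - 0x0410 + 128);  // 'А'..'п' -> 128..175
        if (c >= '\u0440' && c <= '\u044F') return (byte)(c - 0x0440 + 224);  // 'р'..'я' -> 224..239
        switch (c)
        {
            case '\u0401': return 240; // 'Ё'
            case '\u0451': return 241; // 'ё'
            case '\u0404': return 242; // 'Є'
            case '\u0454': return 243; // 'є'
            case '\u0407': return 244; // 'Ї'
            case '\u0457': return 245; // 'ї'
            case '\u040E': return 246; // 'Ў'
            case '\u045E': return 247; // 'ў'
            case '\u0406': return 73;  // 'І' -> Latin 'I', as in the CharMap above
            case '\u2116': return 252; // '№'
            default: return (byte)c;   // same fallback as the original: truncating cast
        }
    }
}
```

Each character then costs a few comparisons and no boxing, whereas ArrayList.IndexOf walks the list and compares boxed values on every call.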

William · January 29, 2011, 6:48pm

Did not test, but this should be faster. Use bytes instead of ints in your lookups. Return char instead of all the string casting. Build a char[] and create the string from it with new string(char[]). I think this will save some casting and space, which should both increase speed.

public static char GetChar(byte code)
{
    return 'A'; // char lookup.
}

public static string GetString(byte[] byteInput)
{
    char[] ca = new char[byteInput.Length];
    int index = 0;
    while (index < byteInput.Length)
    {
        ca[index] = GetChar(byteInput[index]);
        index++; // increment after use so element 0 is filled and the last read stays in bounds
    }
    return new string(ca);
}
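
One possible way to fill in the "char lookup" placeholder is a 128-entry char array indexed by code minus 128, so each byte costs a single array read. This is a sketch, not a measured result: the names Cp866Decoder and Table are hypothetical, the mapping follows the CharMap from the first post, codes that CharMap does not list (176 to 223, 248 to 251, 253, 254) are passed through as (char)code, and code 255 is dropped as in the original GetString.

```csharp
// Hypothetical helper, not part of the original code.
static class Cp866Decoder
{
    // Table[code - 128] holds the char for each code in 128..255.
    private static readonly char[] Table = BuildTable();

    private static char[] BuildTable()
    {
        char[] t = new char[128];
        for (int i = 0; i < t.Length; i++) t[i] = (char)(i + 128);    // default: pass through
        for (int i = 0; i < 48; i++) t[i] = (char)(0x0410 + i);       // 128..175 -> 'А'..'п'
        for (int i = 0; i < 16; i++) t[96 + i] = (char)(0x0440 + i);  // 224..239 -> 'р'..'я'
        // 240..247 -> 'Ё', 'ё', 'Є', 'є', 'Ї', 'ї', 'Ў', 'ў'
        char[] extra = { '\u0401', '\u0451', '\u0404', '\u0454', '\u0407', '\u0457', '\u040E', '\u045E' };
        for (int i = 0; i < extra.Length; i++) t[112 + i] = extra[i];
        t[124] = '\u2116';                                            // 252 -> '№'
        return t;
    }

    public static string GetString(byte[] input)
    {
        char[] chars = new char[input.Length];
        int n = 0;
        for (int i = 0; i < input.Length; i++)
        {
            byte b = input[i];
            if (b == 255) continue; // the original code drops code 255 ('ÿ')
            chars[n++] = b < 128 ? (char)b : Table[b - 128];
        }
        return new string(chars, 0, n); // one allocation instead of one per +=
    }
}
```

The table is built once, so the per-byte work is a comparison and an index, and the string is allocated a single time at the end.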

WouterH · January 30, 2011, 12:19pm

Still, using a foreach to walk through byteInput should be faster than looking up each value with byteInput[index].

William · January 30, 2011, 12:41pm

“Still, using a foreach to walk through byteInput should be faster than looking up each value with byteInput[index].”

@WouterH: How so? Typically, foreach is slower than while or for because of the enumerator object overhead (i.e. Current, MoveNext). Maybe I am not understanding which perf factor you’re pointing to.

WouterH · January 30, 2011, 1:03pm

Well, I’ve run a test, and foreach seems to be very slightly slower.

I thought foreach on a byte array was implemented like an incremented pointer as we know from C/C++. In C/C++ the incrementing pointer is faster than looking up a table index since it needs fewer assembler instructions (no need for offset calculation).

It seems now that this is not how foreach is implemented for a byte-array.
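
A comparison like this can be reproduced with a small timing harness. The sketch below is an assumption about how such a test might look, not the test that was actually run: LoopTiming and its methods are made-up names, and it uses System.Diagnostics.Stopwatch from the desktop framework, so on a Micro Framework board a different clock source would be needed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness, not the test from this thread.
static class LoopTiming
{
    // Sums the array with an indexed for loop.
    public static int SumFor(byte[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        return sum;
    }

    // Sums the array with foreach.
    public static int SumForeach(byte[] data)
    {
        int sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }

    // Returns the elapsed Stopwatch ticks for `rounds` calls of the chosen loop.
    public static long Time(byte[] data, int rounds, bool useForeach)
    {
        int sink = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            sink += useForeach ? SumForeach(data) : SumFor(data);
        sw.Stop();
        if (sink == int.MinValue) Console.WriteLine(sink); // keep the result observable
        return sw.ElapsedTicks;
    }
}
```

Calling Time(data, rounds, false) and Time(data, rounds, true) on the same array gives two tick counts to compare. The numbers depend heavily on the runtime, so results from a desktop PC say little about the board.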

I also tested different scenarios on the encoding code posted here, and I must say there is not much room for optimization. Even removing the byte-array allocation doesn’t speed things up…

In general the code is slow, and I think this kind of conversion needs to be supported in native code.

Alagrat · January 31, 2011, 2:14am

Thanks to everyone who answered. I’ll try the suggestions and post the results.

William · January 31, 2011, 2:43am

“I’ve run a test, and foreach seems to be very slightly slower.”

Sounds good. Foreach is a great construct, so people should not shy away from it in general. In most common usage, it will not matter and your efforts would be better spent somewhere else.