Bad performance with HashTable

Hi everyone. I am having some trouble with my application. I am using the current version of Hashtable.cs available on CodePlex, but I was also experiencing the same problem with the one provided in the System.Collections namespace.

I have stored data of several types (float, int, short, ushort…) in a Hashtable called HTV.
Then I need to compute subtractions and additions of items of the HTV and store the result in another item of the HTV.

Each object stored in the HTV is defined with its own type.
For example, a bool item of the HTV is declared as:

    public class val_HtV_bool : val_HTV
    {
        public new bool value;

        public val_HtV_bool()
        {
        }
    }

where val_HTV is defined as:

    public class val_HTV
    {
        public object value;

        public val_HTV()
        {
        }
    }

For subtraction my code is:

    if (HTV[c1] is val_HtV_float)
    {
        ((val_HtV_float)HTV[channel]).value = ((val_HtV_float)HTV[c1]).value - ((val_HtV_float)HTV[c2]).value;
    }
    else if (HTV[c1] is val_HtV_short)
    {
        ((val_HtV_short)HTV[channel]).value = (short)(((val_HtV_short)HTV[c1]).value - ((val_HtV_short)HTV[c2]).value);
    }
    else if (HTV[c1] is val_HtV_ushort)
    {
        ((val_HtV_short)HTV[channel]).value = (short)(((val_HtV_ushort)HTV[c1]).value - ((val_HtV_ushort)HTV[c2]).value);
    }

Unfortunately I observe very bad performance (3.7 ms per subtraction). I have tried all day to improve this without any success (HTV has about 100 buckets). Do you have any advice? I believed that encapsulating the value in a class would avoid boxing/unboxing, but maybe I am wrong.

Any help would be appreciated :slight_smile:

@ leforban
Did you see this posting:

http://www.tinyclr.com/forum/1/6528/

It could be that you are repeatedly setting the same hashtable entry, and the hashtable is growing.

LOL he is the author of that thread. ;D

Oh. :-[

May I ask, why are you keeping different numeric types in the same hashtable? Would it not be easier to convert all of them to floats or doubles and then do math on simple, known value-typed variables?

I admit to not being too familiar with the micro CLR, but the following should be faster than what you have. It avoids boxing and runtime type checks (the ‘is’ operator), and it is a struct rather than a class, meaning that it doesn’t live on the heap but rather inside objects as any value type would; this means the garbage collector isn’t doing a bunch of busy work wrangling all those tiny objects. On the downside, it does use more code and data space than a boxed object would.


    class Program
    {
        static void Main(string[] args)
        {
            val_HTV value1 = 10;
            val_HTV value2 = 20.5;

            value1.Subtract(value2);

            Console.WriteLine((float)value1);
        }
    }

    public struct val_HTV
    {
        private enum Kind
        {
            Int,
            Float
        }
        private Kind kind;
        private float floatValue;
        private int intValue;

        public static explicit operator float(val_HTV value)
        {
            if (value.kind == Kind.Float)
            {
                return value.floatValue;
            }
            else
            {
                return (float)value.intValue;
            }
        }

        public static explicit operator int(val_HTV value)
        {
            if (value.kind == Kind.Int)
            {
                return value.intValue;
            }
            else
            {
                return (int)value.floatValue;
            }
        }

        public static implicit operator val_HTV(int value)
        {
            val_HTV retval = new val_HTV();

            retval.kind = Kind.Int;
            retval.intValue = value;

            return retval;
        }

        public static implicit operator val_HTV(float value)
        {
            val_HTV retval = new val_HTV();

            retval.kind = Kind.Float;
            retval.floatValue = value;

            return retval;
        }

        public static implicit operator val_HTV(double value)
        {
            val_HTV retval = new val_HTV();

            retval.kind = Kind.Float;
            retval.floatValue = (float)value;

            return retval;
        }

        public void Subtract(val_HTV other)
        {
            if (this.kind == Kind.Int && other.kind == Kind.Int)
            {
                this.intValue -= other.intValue;
            }
            else if (this.kind == Kind.Int && other.kind == Kind.Float)
            {
                this.kind = Kind.Float;
                this.floatValue = this.intValue - other.floatValue;
            }
            else if (this.kind == Kind.Float && other.kind == Kind.Int)
            {
                this.floatValue -= other.intValue;
            }
            else // (this.kind == Kind.Float && other.kind == Kind.Float)
            {
                this.floatValue -= other.floatValue;
            }
        }
    }

I implemented your solution this morning but did not observe any speed improvement, and the data struct is huge (I have 14 types of variables to handle, from short to ulong, including string, sbyte…).

I still think that I am accessing items of the hashtable by value rather than by reference (involving copies); this is probably the cause of the bad performance.

How to avoid boxing and unboxing using collections?

Doing just:
HTV[key1] = HTV[key2];

takes 2 ms!!! My idea was just to have a single table to manage all the data of my application. This is why a Hashtable of variables was chosen, but now I think that this design choice may not have been the correct one.

Simple answer? Don’t box. Convert all your values to floats/doubles/longs and just roll with one value type. Every time you rely on the system to have smarts for you, you will pay dearly in performance. This is especially true for NETMF. I’ve done a bit of speed testing with the FEZ Spider and posted a post-mortem on my blog:

The solution, at the end of the day, was RLP. If you need to move and process any “significant” amount of data in a “reasonable” amount of time, you should switch to RLP. Unfortunately, RLP is C/C++ code, so you lose all the niceness of C#.
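To illustrate the "one value type" advice above, here is a minimal sketch (the type and member names are my own invention, not from leforban's code): every channel lives in a flat float[], the original type is remembered in a parallel array for I/O purposes only, and the arithmetic hot path touches nothing but plain floats, so there is no boxing, no ‘is’ test, and no hashtable lookup.

```csharp
// Hypothetical sketch, not the original code. Strings and full-range
// long/ulong values would still need separate handling, since a float
// cannot represent them exactly.
public enum ChannelType : byte { Float, Short, UShort /* ... */ }

public class ChannelTable
{
    private readonly float[] values;        // all math happens on floats
    private readonly ChannelType[] types;   // original type, kept only for I/O

    public ChannelTable(int size)
    {
        values = new float[size];
        types  = new ChannelType[size];
    }

    public void Set(int channel, float value, ChannelType type)
    {
        values[channel] = value;
        types[channel]  = type;
    }

    // Hot path: plain array indexing and float arithmetic only.
    public void Subtract(int dest, int a, int b)
    {
        values[dest] = values[a] - values[b];
    }
}
```

Conversion back to the channel's declared type (using the stored ChannelType) would happen once, at the point where the value leaves the table, instead of on every arithmetic operation.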

Replacing the hashtable with a float[] of size >1000 and managing the types separately shows the same performance… I am really surprised by these poor results.

Interesting. That means that the poor perf is not caused by boxing or the hashtable, otherwise getting rid of one or the other should have made your code faster. I’m also surprised that simply inserting into an array takes 2ms. I mean, this is a non-JIT’ed CLR on a slow and relatively primitive CPU (I don’t think any CPU currently used for netmf does superscalar or out of order execution), but that’s just a bit much :slight_smile: Are you completely sure you’re measuring what you think you’re measuring?

In fact I managed to decrease the runtime of the function from 75 ms to 22 ms. There are 32 elements in list_of_comp_data.Values (and therefore 32 iterations of the loop).

So the following code takes 22 ms for 32 elements in the Hashtable list_of_comp_data:

foreach (Comp_Data c in list_of_comp_data.Values)
{
    switch (c.list_of_param[0])
    {
        case 0:
            {
            } break;
        case 5: // This is an initial value
            {
                if (c.init == true)
                {
                    c.init = false;
                    HTV[c.channel] = HTV[c.list_of_param[1]];
                }
            } break;
        case 4: // Priority management // todo
            {
                HTV[c.channel] = HTV[c.list_of_param[1]];
            } break;
        case 3: // Subtraction
            {
                HTV[c.channel] = HTV[c.list_of_param[1]] - HTV[c.list_of_param[2]];
            } break;
        case 2: // Addition
            {
                HTV[c.channel] = HTV[c.list_of_param[1]] + HTV[c.list_of_param[2]];
            } break;
    }
}
In my opinion, 22 ms is still slow. The last remaining Hashtable is list_of_comp_data; if I remove that one too, the loop will probably run faster.

After changing the Hashtable list_of_comp_data into a

Comp_Data[] list_of_comp_data = new Comp_Data[100];

the code executes in 7.5 ms!!!

Wow. I knew that netmf was not going to be a speed demon, but these numbers are just not very encouraging. 7.5ms to insert 32 elements into a fixed-size array… I think my Commodore 64 running BASIC ran faster than that :stuck_out_tongue:

What board are you using? Hopefully it’s one of the slower ones.

This does not make any sense. Are you sure the units are correct, that you are measuring milliseconds (1/1000 of a second) and not microseconds (1/1,000,000 of a second)?

Saving int32 data into a static array takes roughly 0.01 ms, or 10 microseconds, on the Hydra board I just received. See my results in this thread:
http://www.tinyclr.com/forum/21/6546/#/2/
Hopefully I was not too wrong.

Even if I was correct, it is a bit shocking (already) to see a 200 MHz CPU take 2000 cycles to access one address in RAM, even knowing all the address indirection involved in the .NET runtime. Maybe the bottleneck is somewhere else? RAM is too slow, the memory bus is too slow, or the cache is too small?

I am using the EMX module.

Actually, perf is measured through:


start2 = DateTime.Now;
c_datas.update(ref Playsets.HTV);
end2 = DateTime.Now;
// one DateTime tick is 100 ns, so Ticks / 10 gives microseconds
Debug.Print("Computed data in," + (end2 - start2).Ticks / 10 + ", micro seconds");

The update function is:

        
public void update(ref float[] HTV)
{
    for (int i = 0; i < nb_of_comp_data; i++)
    {
        //c.Update(ref HTV);
        Comp_Data c = list_of_comp_data[i];
        switch (c.list_of_param[0])
        {
            case 0:
                {
                } break;
            case 5: // this is an initial value
                {
                    if (c.init == true)
                    {
                        c.init = false;
                        HTV[c.channel] = HTV[c.list_of_param[1]];
                    }
                } break;
            case 4: // priority management
                {
                    HTV[c.channel] = HTV[c.list_of_param[1]];
                } break;
            case 3: // subtraction
                {
                    HTV[c.channel] = HTV[c.list_of_param[1]] - HTV[c.list_of_param[2]];
                } break;
            case 2: // addition
                {
                    HTV[c.channel] = HTV[c.list_of_param[1]] + HTV[c.list_of_param[2]];
                } break;
        }
    }
}

Right now nb_of_comp_data is 36 and the code runs in 7353 microseconds.

I question MoonDragon’s numbers. In my testing of an empty loop, I got 31.8 microseconds per empty-loop iteration (compared to his 54 µs), using my FEZ Panda II.

On my Cerb40, I get 7.4 microseconds. Interesting that the increase in MHz alone doesn’t explain the speed increase. Something else must have gotten faster in NETMF 4.2.

Also, for those that are curious, the numbers are extremely consistent (doesn’t change from run to run, which makes sense), and it doesn’t matter whether one uses DateTime.Now.Ticks or Microsoft.SPOT.Hardware.Utility.GetMachineTime().Ticks for measurement. The numbers come out the same.

Here’s the code I used to test:


// do some math
var iterations   = 10000;
var one_ms_span  = new TimeSpan(0, 0, 0, 0, 1);
var ticks_per_us = (double)one_ms_span.Ticks / 1000d;

var start = Microsoft.SPOT.Hardware.Utility.GetMachineTime();
for( var x = 0; x < iterations; x++ )
{
	// empty loop
}
var end = Microsoft.SPOT.Hardware.Utility.GetMachineTime();

// more math
var total_ticks    = end.Ticks - start.Ticks;
var ticks_per_iter = (double)total_ticks / (double)iterations; // get single iteration value in ticks
var us_per_iter    = ticks_per_iter / ticks_per_us; // get single iteration value in microseconds

Debug.Print(us_per_iter.ToString());

A 1 MHz ARM7 is slower than a 1 MHz Cortex-M4 :slight_smile:

ARM7 is 0.7
Cortex-M4 is 1.2
Cortex-A8 is 2.0
Cortex-A15 is 2.5

[quote]ARM7 is 0.7
Cortex-M4 is 1.2
Cortex-A8 is 2.0
Cortex-A15 is 2.5[/quote]

What are these numbers?

And ARM9 (as used in Hydra) is 1.1?