Bad performance with HashTable

godefroi · April 4, 2012, 10:45am

DMIPS/MHz I believe.

Gus_Issa · April 4, 2012, 10:52am

They are somewhere on ARM website.

faybs · April 4, 2012, 4:47pm

@ leforban

I wonder if maybe the switch statement is the bottleneck. A naive implementation of that would create the equivalent of a series of if…else stataments, which would slow things down. What happens if you just remove the switch and use the 0 or 4 case all the time? just to measure the timing. If that does speed things up, see if sorting the cases inside the switch statement (so the more common cases appear earlier in the list of cases) helps.

Jay_Jay · April 5, 2012, 9:01am

If you are going to query a large amount of data it is always better and faster to use a Dictionary instead of a switch statement… especially if you are gong through a lot of cases…

Jay

leforban · April 12, 2012, 4:30pm

I don’t know if the number of element can be considered as large. In fact the larger table consist in a thousand of float. However I don’t see your point about the relation between the size of the data set and the switch statement. Iterate on a dictionnary can be simpler and faster if I need to iterate on a kind of data but it seems to me that it is not the case on the value of the data itself.

for exemple:

foreach(float f in arraylist_of_data) is good if I want to select only the float value.

But I want to iterate on the value (let say value = 4.3)

I will be oblige to use:

foreach (dictionnaryentry de in array_list_of_data)
if (de.value==4.3)
…

For now each time that I use an arraylist or a Hashtable I see very low perf whereas using a naive programming scheme result in better performance… It forces me to forgot my poor skills in design patterns

Jay_Jay · April 12, 2012, 7:43pm

well you will get a bit of better performance if you use the Contains()… instead of looping all the way to last item of the collection.


if myDictionay.Contain(4.3) do something..

leforban · May 11, 2012, 9:50am

In fact I iterate on all the element of the Hashtable and foreach element there’s a computation to do. The “Contains()” does not match well in this particular case.

For the “switch case” vs “nested if branch”, in fact I do not observe siginficant improvement in terms of runtime first conditions of “if” statement are smartly written (probability to enter or not in the branch).

The huge time spent is according to me due to boxing/unboxing and I still need to run more experiments to see exactly whats wrong in the data structures.

ianlee74 · May 11, 2012, 10:21am

You will find that a for loop performs better than a foreach loop.

leforban · May 14, 2012, 10:29am

Hello

I’ve tried to compare on a checksum computation of NMEA sentence.
doing

 foreach(char ch in nmeastring)

is about the same than


for(int i=0;i< nmeastring.Length;i++)
{
}

This test case is perhaps too obvious and optimized by the compiler… Do you have any other benchmark to confirm your assumption?

ianlee74 · May 14, 2012, 10:44am

Here’s a fairly good overview of the difference between for and foreach.

http://www.codeproject.com/Articles/6759/FOREACH-Vs-FOR-C

I’ve read cases where “foreach” provides better performance than “for” but in general most tests I’ve seen show that “for” is better because it requires less boxing for value types. Of course, always try both for your situation but this is something to add to your “bag of tricks”. Google “c# for vs foreach”. You’ll find plenty of discussions on the topic.

PACMAN · May 14, 2012, 10:48am

few month ago ianlee74 helped me out optimize pathfinding alogrithm. (cf http://www.tinyclr.com/forum/topic?id=5548&page=1)

Changing the foreach into for loop was already a huge improvment.

leforban · May 14, 2012, 11:14am

Ok I read it, this is really interesting. However, after looking into my code and changing the foreach into for loop, does not improve performance of my application. For now my application retrieve data from IMU, parse and deal with 2 NMEA inputs, check value on 6 buttons, read 5 analog inputs, do some trivial computation and check some conditions (i.e. values above a given treshold…). The entire code performs in 400ms while mainly the same application runs in les than 100ms on a AT90CAN128 with C programming (AVR studio).

There’s still room for improvements… but I don’t know where to start.

ianlee74 · May 14, 2012, 1:17pm

You’re biggest problem is going to be parsing the string data from the IMU. Are you getting a lot of GC in your debug window? Using char[] instead of strings can make a big difference. I can’t find it at the moment, but Wouter posted a good example here 4-5 months ago in another thread regarding parsing serial data.

leforban · May 14, 2012, 3:35pm

Hi Ian does Ian is your correct name?

The fact is that I started more than one month ago to optimize the application runtime and as soon as a new functionality is added I take a while to try different implementation and choose the best one in term of runtime. Actually, my IMU send data as a set of six float (three angles and three rate of turn). As far as I remember (I am at Home now) the IMU send data continuously, I free the buffer first and then wait for … I think 56 bytes (6*(4bytes per float)+2 (*1 byte for each delim character a comma separator)). As soon as the buffer is filled with a proper RS 232 stream, the EMX extract each float and store them into a HashTable of float that represent for us a Dataset whose the key is a ushort defined before starting the application.

For now runtime measurements of this method gives 11ms per iteration

ianlee74 · May 14, 2012, 5:19pm

Yes, Ian is my name.

I’ve been working on a similar type of problem with my quadcopter project on & off for the past year. 11ms doesn’t sound too bad. I’d still keep an eye on the GC output. If it’s often then there might be more you can do.

leforban · May 15, 2012, 4:01am

For now GC does not run (or at least does not print on Debug window) very often, may be every 5mn. But it creates a hole of 500ms in data sampling.

The first GC shows on debug window appears after 705 iterations of the loop code (I give later some info about this loop). The GC returns:

GC: 498msec 336264 bytes used, 12704784 bytes available
Type 0F (STRING              ):  39780 bytes
Type 11 (CLASS               ): 184272 bytes
Type 12 (VALUETYPE           ):  16980 bytes
Type 13 (SZARRAY             ):  61092 bytes
Type 15 (FREEBLOCK           ): 12704784 bytes
Type 16 (CACHEDBLOCK         ):    360 bytes
Type 17 (ASSEMBLY            ):  21636 bytes
Type 18 (WEAKCLASS           ):    144 bytes
Type 19 (REFLECTION          ):     24 bytes
Type 1B (DELEGATE_HEAD       ):    612 bytes
Type 1C (DELEGATELIST_HEAD   ):     96 bytes
Type 1D (OBJECT_TO_EVENT     ):   1032 bytes
Type 1E (BINARY_BLOB_HEAD    ):   2496 bytes
Type 1F (THREAD              ):   1152 bytes
Type 20 (SUBTHREAD           ):    144 bytes
Type 21 (STACK_FRAME         ):   1392 bytes
Type 27 (FINALIZER_HEAD      ):   1248 bytes
Type 31 (IO_PORT             ):   1548 bytes
Type 34 (APPDOMAIN_HEAD      ):     72 bytes
Type 36 (APPDOMAIN_ASSEMBLY  ):   2184 bytes

The second one appears after 1700 iterations (about 12 minutes):
GC: 625msec 336300 bytes used, 12704748 bytes available

Type 0F (STRING              ):  39828 bytes
Type 11 (CLASS               ): 184272 bytes
Type 12 (VALUETYPE           ):  17124 bytes
Type 13 (SZARRAY             ):  61092 bytes
Type 15 (FREEBLOCK           ): 12704748 bytes
Type 17 (ASSEMBLY            ):  21636 bytes
Type 18 (WEAKCLASS           ):    144 bytes
Type 19 (REFLECTION          ):     24 bytes
Type 1B (DELEGATE_HEAD       ):    612 bytes
Type 1C (DELEGATELIST_HEAD   ):     96 bytes
Type 1D (OBJECT_TO_EVENT     ):   1032 bytes
Type 1E (BINARY_BLOB_HEAD    ):   2496 bytes
Type 1F (THREAD              ):   1152 bytes
Type 20 (SUBTHREAD           ):    144 bytes
Type 21 (STACK_FRAME         ):   1596 bytes
Type 27 (FINALIZER_HEAD      ):   1248 bytes
Type 31 (IO_PORT             ):   1548 bytes
Type 34 (APPDOMAIN_HEAD      ):     72 bytes
Type 36 (APPDOMAIN_ASSEMBLY  ):   2184 byte

The application consists in reading inputs (2*NMEA, IMU, 5 Analog IN, 6 buttons, 5 Digital In) to fill a Hashtable of float called “DATASET”. Then using this hashtable, conditions defined by the user in a Hashtable of “Instructions” (about a hundred of instructions, each one composed of 2 to 5 conditions) allow to takes decision, such as switching a relay, turning on led, log events on SDCard…

here’s a summary of runtime of my code:
RTC in,2607, micro seconds (This update the DATASET elements that corespond to date and time entry)
BPC in,7943, micro seconds (This store the state of button in the Dataset)
BUZ in,69, micro seconds (This updates the internal buzzer accordinc to the dataset element corresponding to buzin)
BUZEX in,70, micro seconds (This updates the internal buzzer according to the dataset element corresponding to buzex)
LED in,70, micro seconds (This updates the 13 LED according to the dataset)
ANS in,24844, micro seconds (This updates the Dataset with Analog inputs value (the first analog input being averaged with 64 samples due to an important noise on this input)
DNS in,7071, micro seconds (This updates the Dataset with digital input value)
REL in,122, micro seconds (This update the relay according to the Dataset)
NMEA in,59573, micro seconds (This updates the dataset with 2*NMEA inputs running at 4800Bps)
IMU in,5901, micro seconds (This updates the dataset with value of angles and rate of turn as float)
powoo in,976, micro seconds (this manage the power on off class)
rellevel in,983, micro seconds (this manage the power rellevel class)
buzoo in,815, micro seconds (this manage the buzzer on off class)
Computed data in: 89689, micro seconds (This computes data from the dataset according to some rules of computation (addition, substraction… of dataset element) and store results in the dataset
Update conditions,127982, micro seconds (This computes the results of all conditions (conditions have been stored into a Hashtable)
Evaluate instruction in: 24109, micro seconds (This evaluates if all conditions of a given instructions equals to true or not, if true an action is taken, if one condition of the instruction is false the instruction is not performed)
Loop performs in397140 micro seconds

ianlee74 · May 22, 2012, 1:33pm

@ leforban - I took a few days off for vacation. It looks like you aren’t having GC issues (which is great!). Have you narrowed down a particular part of your code that you think is having the biggest performance impact? Perhaps if you can post that section we can make more suggestions.

leforban · May 23, 2012, 3:52am

Actually I think that the function that updates conditions should be the one on wich we can have the best impact (more than 100ms spent in this method). Let me time to draw a topo or a summary of the context of this method. I’ll be back…