Byte array parsing utility

andrejk · November 11, 2010, 11:33pm

I’m doing a lot of work processing data streams from GPSs, IMUs, and other sensor devices. I’m having to resort to pretty inefficient code to get much of this done: turning the data into strings and then using the .NET parse routines to get the values. I’m using a GPS that’s capable of sending updates at 10 Hz, but even after extensive tuning I can’t get my parsing fast enough to keep up with that, much less leave time for important things like running PID filters to control the plane.

GHIElectronics.NETMF.System.ExtractValueFromArray is great, but I’d like a more extensive library. The .NET Framework does have a way to extract a string from a byte array (the UTF8 decoder), but I’m getting killed by the GC because I create hundreds (thousands?) of immutable strings per second just to turn those strings into ints and doubles. Right now it’s:

UTF8Decoder -> String.Split -> Double.Parse

I would love a beefed-up utility class that can work at the byte level and extract UTF-8-encoded floats and doubles given a byte array, an offset, and a length.

I could write this in RLP, but that’s not available on the Panda, so I’m asking here.

Also, the Array.IndexOf method is SLOW on byte[] arrays. I looked at the MF source for it and I see ways to write a byte-array-specific version that would be much faster.
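
For illustration, a byte-specific search can be as simple as the sketch below. It does not reproduce the MF source or claim to be the optimization meant above; the name `index_of_byte` is hypothetical, and the only idea shown is a tight loop over raw bytes with no generic element comparison.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the index of the first occurrence of `value`
// in buf[offset .. offset+count), or -1 if it is not found. The returned
// index is relative to the start of buf, not to offset.
std::ptrdiff_t index_of_byte(const std::uint8_t* buf, std::size_t offset,
                             std::size_t count, std::uint8_t value)
{
    const std::size_t end = offset + count;
    for (std::size_t i = offset; i < end; ++i) {
        if (buf[i] == value) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Scanning for `','` or `'*'` this way finds field boundaries in place, so the sentence never has to be split into separate strings.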

And a Pony… I also want a Pony!

Architect · November 12, 2010, 12:46am

Can you please show us some code? It is easier to brainstorm with something to look at.

Blue_Hair_Bob · November 12, 2010, 1:27am

Hey, yeah, groovy. Fast parsers are kind of a specialty with me. I write custom language compilers at work, and what you are doing is really just translating one input stream into an equivalent stream in a different language. Post your code and let’s see what you are doing. Also post the type of GPS you are using and maybe we can build a really fast class for that chip.

Gus_Issa · November 12, 2010, 8:39am

I have always needed such a thing, but I’m not sure how it would be implemented. You do not have RLP, but you have the whole GHI development team available! That is better than RLP.
So, if you guys can come up with a generic class that would help in accomplishing your work, I will surely bug the hell out of our developers till they add it.

You guys come up with the class and I will take care of the rest. Again, it has to be generic so all users can benefit from it, not targeted to your needs only. Basically, you provide the C# interface, explain how it will be useful, and explain what should be implemented natively in C++; we will take care of the implementation.

How is that for support? lol

Jeff_Birt · November 12, 2010, 9:06am

Since string operations create new strings you can wind up with a memory and GC mess. I would be more inclined to just copy out the bytes that represent values and then use bit operators (masking, shifting, etc as needed) to extract the values.
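
As a rough sketch of the masking-and-shifting approach for binary sensor fields: the helper names below are hypothetical, and little-endian byte order is an assumption that depends entirely on the device’s protocol.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: assemble fixed-width integers from raw bytes with
// shifts and ORs. Little-endian layout (least significant byte first) is
// assumed here; a big-endian device would need the shifts reversed.
std::uint16_t read_u16_le(const std::uint8_t* buf, std::size_t offset)
{
    return static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(buf[offset]) |
        (static_cast<std::uint16_t>(buf[offset + 1]) << 8));
}

std::int32_t read_i32_le(const std::uint8_t* buf, std::size_t offset)
{
    const std::uint32_t u =
        static_cast<std::uint32_t>(buf[offset]) |
        (static_cast<std::uint32_t>(buf[offset + 1]) << 8) |
        (static_cast<std::uint32_t>(buf[offset + 2]) << 16) |
        (static_cast<std::uint32_t>(buf[offset + 3]) << 24);
    // Reinterpret the assembled bit pattern as a signed two's-complement value.
    return static_cast<std::int32_t>(u);
}
```

Because the values are read directly from the receive buffer, no temporary objects are created for the GC to collect.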

andrejk · November 12, 2010, 9:09am

@ gus – I love the way you talk!

Maybe we should prototype a managed version of it, take it through a few cycles of feedback and hand it to the GHI guys to make it native (fast). That way we can go through a few iterations before any hard work is done. I’d be happy to set up a private codeplex project if anyone is interested in participating.

I figured someone would ask me to ‘show the code’. It’s scattered across several classes, so gimme a little time to pull it into a test method. My GPS parsing is based on the mfgps library (http://mfgps.codeplex.com/), but I have made many attempts at optimization. It’s faster and more flexible now (though not clean enough to commit back to the project), but it’s still not fast enough to keep up with 10 Hz.

One optimization I made resulted in my SerialBuffer class (http://www.fezzer.com/project/130/serialbuffer-simple-fast-way-to-read-serial-data/). I think a lot of what this class does could be in scope for this kind of work. The model, to parallel the desktop framework somewhat, could be a StreamReader of some sort. Right now the SerialPort class in the MF doesn’t support readers, though, so there would probably have to be more than one way to do it.

Amen, brother, I’m living that right now. I don’t have a good way to measure it, but I think the GC is killing me more than the actual CPU cycles needed to run my code.

Gus_Issa · November 12, 2010, 10:25am

Yeah, plus advanced users have consulting hours that they can use towards this work.

andrejk · November 13, 2010, 7:19pm

What use cases should this be designed to cover? I have an NMEA GPS data stream, but please suggest other test cases for something like this.

andrejk · November 18, 2010, 7:26pm

My CodePlex project for the prototypes and tests is now created (and I have worked through the setup process): http://netmfbinary.codeplex.com/

It isn’t published yet, but just request access and I’ll let you in. (Mail if you need it: andrej@ kyselica.com)