
UTF8Encoding.UTF8.GetBytes BUG


#1

Today I encountered a bug in this GetBytes overload:


        public virtual int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex);

If the string contains special (non-ASCII) characters, the method drops one character from the end of the output for each such character. For example:


string value = "my string č my stringgg";
byte[] data = new byte[value.Length*2];
int count = System.Text.UTF8Encoding.UTF8.GetBytes(value, 0, value.Length, data, 0); // 23 chars, but "č" needs 2 bytes, so 24 bytes are expected

This drops the last character (“g”), because “č” is encoded as two bytes in the resulting byte array. The method handles the charCount parameter incorrectly: it treats it as the number of bytes to write into the byte array instead of the number of chars to read from the string.
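For comparison, here is a minimal sketch of what the overload is documented to do, written as a desktop .NET console program. This is an assumption about the reference behavior; it was not run on NETMF, where the bug shows up. The string is the one from the example, with “č” written as the escape \u010D so the result does not depend on the source file encoding. The buffer is sized at three bytes per char, since a single UTF-16 char can need up to three UTF-8 bytes.

```csharp
using System;
using System.Text;

static class Utf8ContractDemo
{
    static void Main()
    {
        // Same text as the example; \u010D is "č".
        string value = "my string \u010D my stringgg";

        // UTF-8 uses at most 3 bytes per UTF-16 char, so Length * 3 is always enough.
        // Length * 2 is not enough in general: "€" (U+20AC) needs 3 bytes.
        byte[] data = new byte[value.Length * 3];

        // charCount (value.Length) counts chars read from the string;
        // the return value counts bytes written into data.
        int count = Encoding.UTF8.GetBytes(value, 0, value.Length, data, 0);

        Console.WriteLine(value.Length); // 23 chars
        Console.WriteLine(count);        // 24 bytes: "č" (U+010D) encodes as 0xC4 0x8D

        // Decode only the bytes that were actually written.
        string roundTrip = new string(Encoding.UTF8.GetChars(data, 0, count));
        Console.WriteLine(roundTrip == value); // True
    }
}
```

If NETMF followed the same contract, the example above would return 24 and keep the final “g”.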


#2

I would report it on the NETMF site so that the team at Microsoft is aware of it and can eventually fix it.


#3

Ok.

I forgot to mention that this is NETMF 4.2. Can anybody test it on 4.3?


#4

Yep, same on 4.3:

string value = "my string č my stringgg";
byte[] data = new byte[value.Length * 2];
int count = System.Text.UTF8Encoding.UTF8.GetBytes(value, 0, value.Length, data, 0);
string value1 = new string(System.Text.UTF8Encoding.UTF8.GetChars(data));

value1 is missing the last character.
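Until a fix is available, one possible workaround is to avoid the affected overload and do the encoding by hand. EncodeUtf8 below is a hypothetical helper, not a NETMF or .NET API: it takes the same parameters as GetBytes(string, int, int, byte[], int), treats charCount as a count of chars, and returns the number of bytes written. It relies only on string indexing and byte arrays, so it should also build on NETMF 4.2/4.3, but that is an untested assumption. The Main method targets desktop .NET and decodes only the first count bytes, rather than the whole zero-padded buffer as GetChars(data) does in the test above.

```csharp
using System;
using System.Text;

static class Utf8Workaround
{
    // Hypothetical replacement for UTF8.GetBytes(string, int, int, byte[], int):
    // encodes s[charIndex .. charIndex + charCount) into bytes starting at byteIndex
    // and returns the number of bytes written. charCount counts chars, not bytes.
    public static int EncodeUtf8(string s, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        int pos = byteIndex;
        int end = charIndex + charCount;
        for (int i = charIndex; i < end; i++)
        {
            int cp = s[i];
            bool high = cp >= 0xD800 && cp <= 0xDBFF;
            if (high && i + 1 < end && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            {
                // Combine a surrogate pair into one code point (4 UTF-8 bytes).
                cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
                i++;
            }
            else if (cp >= 0xD800 && cp <= 0xDFFF)
            {
                cp = 0xFFFD; // lone surrogate: emit U+FFFD, as Encoding.UTF8 does by default
            }

            if (cp < 0x80)
            {
                bytes[pos++] = (byte)cp;                        // 1 byte (ASCII)
            }
            else if (cp < 0x800)
            {
                bytes[pos++] = (byte)(0xC0 | (cp >> 6));        // 2 bytes, e.g. "č"
                bytes[pos++] = (byte)(0x80 | (cp & 0x3F));
            }
            else if (cp < 0x10000)
            {
                bytes[pos++] = (byte)(0xE0 | (cp >> 12));       // 3 bytes, e.g. "€"
                bytes[pos++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                bytes[pos++] = (byte)(0x80 | (cp & 0x3F));
            }
            else
            {
                bytes[pos++] = (byte)(0xF0 | (cp >> 18));       // 4 bytes (surrogate pair)
                bytes[pos++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                bytes[pos++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                bytes[pos++] = (byte)(0x80 | (cp & 0x3F));
            }
        }
        return pos - byteIndex;
    }

    static void Main()
    {
        string value = "my string \u010D my stringgg";
        byte[] data = new byte[value.Length * 3]; // worst case: 3 bytes per char
        int count = EncodeUtf8(value, 0, value.Length, data, 0);

        // Decode only the first count bytes, not the whole buffer.
        string value1 = new string(Encoding.UTF8.GetChars(data, 0, count));
        Console.WriteLine(count);           // 24
        Console.WriteLine(value1 == value); // True
    }
}
```

Comparing the helper's output with Encoding.UTF8.GetBytes on a desktop machine is a quick way to sanity-check it before moving it to the device.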


#5

Thanks for testing. I posted it on the CodePlex site.


#6

Was this reported to Microsoft on CodePlex?


#7

Yes it was:
http://netmf.codeplex.com/workitem/2130


#8

Great, thanks.