Understanding the char data type in .NET MF

Based on a recent question, I did some digging into the .NETMF CLR implementation. I currently do not have the tools and OSHW to actually test this, so I have just been reading the code. I am 99% certain that the char data type is treated as a 2-byte char. I base this on two things. Firstly, c_CLR_RT_DataTypeLookup in TypeSystemLookup.cpp, which defines the basic metadata for each of the core types, specifies char as being 16 bits (2 bytes).


////  m_flags, m_sizeInBits, m_sizeInBytes, m_promoteTo, m_convertToElementType, m_cls, m_relocate, m_name
{ DT_NUM | DT_INT | DT_PRIM | DT_DIR | DT_OPT | DT_MT,   16, DT_U2, DT_T(I4), DT_CNV(CHAR), DT_CLS(m_Char), DT_NOREL(CLR_RT_HeapBlock) DT_OPT_NAME(CHAR) }, // DATATYPE_CHAR

Note that m_sizeInBits and m_sizeInBytes are 16 and DT_U2 respectively, where DT_U2 is sizeof(CLR_UINT16), i.e. 2.
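To make the relationship between the two size fields concrete, here is a minimal sketch. It is not the real CLR code: DataTypeLookupSketch, kSizeOfU2, kCharEntry and checkCharEntry are hypothetical names, std::uint16_t is assumed to match CLR_UINT16, and only the two size fields are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for one row of c_CLR_RT_DataTypeLookup.
// Only the two size fields discussed above are modelled; the real struct has more members.
struct DataTypeLookupSketch {
    std::uint8_t sizeInBits;   // plays the role of m_sizeInBits
    std::uint8_t sizeInBytes;  // plays the role of m_sizeInBytes
};

// Stand-in for DT_U2, i.e. sizeof(CLR_UINT16); std::uint16_t is assumed to match CLR_UINT16.
constexpr std::uint8_t kSizeOfU2 = sizeof(std::uint16_t);

// Sketch of the DATATYPE_CHAR row: 16 bits, DT_U2 bytes.
constexpr DataTypeLookupSketch kCharEntry = { 16, kSizeOfU2 };

// Runtime check that the two fields agree (8 bits per byte is assumed).
inline void checkCharEntry() {
    assert(kCharEntry.sizeInBytes == 2);
    assert(kCharEntry.sizeInBits == kCharEntry.sizeInBytes * 8);
}
```

Calling checkCharEntry() from a test harness should pass on any platform with 8-bit bytes.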

Then I looked at the array allocation, which allocates the raw memory to hold the chars of a char array. It ultimately calls ExtractHeapBlocksForArray in Execution.cpp, which calculates the amount of memory to allocate for the array based on the m_sizeInBytes member of the c_CLR_RT_DataTypeLookup entry for the char data type.

Note: Code comments are mine.


    // Look up the data for the data type of the array
    CLR_DataType                 dt  = (CLR_DataType)inst.m_target->dataType;                     
    const CLR_RT_DataTypeLookup& dtl = c_CLR_RT_DataTypeLookup[ dt ];

    // Calculate the total memory required to manage the array. Note that the length of the array is multiplied by dtl.m_sizeInBytes, which is 2 for char.
    CLR_UINT32 totLength = (CLR_UINT32)(sizeof(CLR_RT_HeapBlock_Array) + length * dtl.m_sizeInBytes); 
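As a worked example of that calculation, here is a small sketch of the same arithmetic. The header size is an assumption, since the actual value of sizeof(CLR_RT_HeapBlock_Array) is not shown here, and the names kAssumedArrayHeaderBytes, totalArrayBytes, extraBytesForChar and checkArraySizes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: placeholder for sizeof(CLR_RT_HeapBlock_Array); the real value is not given in the post.
constexpr std::uint32_t kAssumedArrayHeaderBytes = 12;

// Same shape as the totLength expression: header + length * element size in bytes.
constexpr std::uint32_t totalArrayBytes(std::uint32_t length, std::uint32_t elementSizeInBytes) {
    return kAssumedArrayHeaderBytes + length * elementSizeInBytes;
}

// Extra bytes a 2-byte char array needs compared with a
// 1-byte-per-element array of the same length.
constexpr std::uint32_t extraBytesForChar(std::uint32_t length) {
    return totalArrayBytes(length, 2) - totalArrayBytes(length, 1);
}

// Runtime check of the arithmetic for a 100-element array.
inline void checkArraySizes() {
    assert(totalArrayBytes(100, 2) == kAssumedArrayHeaderBytes + 200);
    assert(extraBytesForChar(100) == 100);
}
```

Whatever the real header size is, it cancels out in the comparison: a char array of N elements needs N more payload bytes than a byte array of the same length.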

So at this point I feel quite confident that, even though .NET MF manages strings internally as UTF-8, chars are 2-byte Unicode characters. With that in hand, I thought I would try to find some documentation to back up my findings. Using Bing I found the ‘Beginners Guide to C# and the .NET Micro Framework’ (I did not even think to look here :))
http://www.ghielectronics.com/downloads/FEZ/Beginners%20guide%20to%20NETMF.pdf
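As an aside on the UTF-8 versus 2-byte char distinction: the sketch below shows how a single 16-bit char value maps to one, two or three bytes under standard UTF-8 encoding. This is not the .NETMF implementation; encodeBmpCharAsUtf8 and checkUtf8Lengths are hypothetical names, and surrogate code units are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Encodes one 16-bit code unit from the Basic Multilingual Plane as UTF-8.
// Returns the number of bytes written (1 to 3).
// Surrogate code units (0xD800-0xDFFF) are not handled in this sketch.
inline int encodeBmpCharAsUtf8(char16_t c, std::uint8_t out[3]) {
    if (c < 0x80) {                       // ASCII range: one byte
        out[0] = static_cast<std::uint8_t>(c);
        return 1;
    }
    if (c < 0x800) {                      // two bytes: 110xxxxx 10xxxxxx
        out[0] = static_cast<std::uint8_t>(0xC0 | (c >> 6));
        out[1] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
        return 2;
    }
    // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = static_cast<std::uint8_t>(0xE0 | (c >> 12));
    out[1] = static_cast<std::uint8_t>(0x80 | ((c >> 6) & 0x3F));
    out[2] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
    return 3;
}

// Runtime check: 'A', U+00E9 and U+20AC need 1, 2 and 3 bytes respectively.
inline void checkUtf8Lengths() {
    std::uint8_t buf[3];
    assert(encodeBmpCharAsUtf8(u'A', buf) == 1);
    assert(encodeBmpCharAsUtf8(0x00E9, buf) == 2);
    assert(encodeBmpCharAsUtf8(0x20AC, buf) == 3);
}
```

So a string stored as UTF-8 can take one byte per character for plain English text, while the same characters held in a char array take two bytes each.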

However, in section 10.2 I found the following statement

Of course this contradicted my research. I went back and double-checked, and I still conclude that a char is in fact 2 bytes.

Have I missed something or is the guide incorrect?

Why is this important to me?
[ul]Well, it is not really, other than that I want to confirm my understanding.[/ul]
[ul]For those who are working on boards like the Cerberus, which is far more memory constrained than the Spider or Hydra, this could be important information, since under the hood a char uses double the amount of memory that the statement in the guide would lead you to believe.[/ul]
[ul]If anyone builds .NETMF applications that target non-English languages: I have seen a fair amount of code recently on this forum that blindly assumes a char is a byte, which will fail for non-English characters.[/ul]
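To illustrate the last point, here is a minimal sketch of what happens when a 16-bit char value is squeezed into a single byte. It models the char as a C++ char16_t rather than using any .NETMF API, and narrowToByte, survivesByteRoundTrip and checkNarrowing are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low 8 bits of a 16-bit char value, which is what
// code that treats a char as a byte effectively does.
constexpr std::uint8_t narrowToByte(char16_t c) {
    return static_cast<std::uint8_t>(c);
}

// True when the char value is unchanged after going through a byte.
constexpr bool survivesByteRoundTrip(char16_t c) {
    return static_cast<char16_t>(narrowToByte(c)) == c;
}

// Runtime check with one ASCII and two non-ASCII characters.
inline void checkNarrowing() {
    assert(survivesByteRoundTrip(u'A'));       // U+0041 fits in a byte
    assert(!survivesByteRoundTrip(0x0416));    // U+0416 (Cyrillic Zhe) does not
    assert(narrowToByte(0x20AC) == 0xAC);      // U+20AC (euro sign) loses its high byte
}
```

Any character above U+00FF loses its high byte this way, so the resulting byte no longer identifies the original character.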


Nice find! Can you find the same in the NETMF 4.1 porting kit? (I would look it up, but I only have my phone here.)

@ WouterH - I just checked 4.1 and there does not seem to be any change in this area.

4.1 - TypeSystemLookup.cpp


{ DT_NUM | DT_INT | DT_PRIM | DT_DIR | DT_OPT | DT_MT,16, DT_U2, DT_T(I4), DT_CNV(CHAR), DT_CLS(m_Char), DT_NOREL(CLR_RT_HeapBlock) DT_OPT_NAME(CHAR) }, // DATATYPE_CHAR

4.1 - ExtractHeapBlocksForArray is doing the same calculation


CLR_UINT32 totLength = (CLR_UINT32)(sizeof(CLR_RT_HeapBlock_Array) + length * dtl.m_sizeInBytes);

In my two projects:

http://www.tinyclr.com/codeshare/entry/302
http://www.tinyclr.com/codeshare/entry/301

the code assumes that a char is 2 bytes and that char[] arrays are 2 bytes * the length of the array.

I never checked the IL, but my code works as expected, so I believe that @ taylorza is correct.

@ jasdev - Thank you for the confirmation.

Does anyone know the process to submit a correction to the ‘Beginners Guide to C# and the .NET Micro Framework’ document?

As a starting point, I would suggest something like the following: