Why does Encoding.UTF8.GetBytes(s) cause an Out of Memory Exception?

rgelb · February 17, 2011, 2:06am

Here is the code:


string s = "..."; // 5k long string
byte[] b = Encoding.UTF8.GetBytes(s);

According to the GC, I have plenty of RAM:
GC: 3msec 39636 bytes used, 24744 bytes available
Type 0F (STRING ): 2172 bytes
Type 11 (CLASS ): 4692 bytes
Type 12 (VALUETYPE ): 48 bytes
Type 13 (SZARRAY ): 1956 bytes
Type 15 (FREEBLOCK ): 24744 bytes
Type 17 (ASSEMBLY ): 18540 bytes
Type 18 (WEAKCLASS ): 48 bytes
Type 19 (REFLECTION ): 192 bytes
Type 1B (DELEGATE_HEAD ): 684 bytes
Type 1D (OBJECT_TO_EVENT ): 336 bytes
Type 1E (BINARY_BLOB_HEAD ): 4320 bytes
Type 1F (THREAD ): 2688 bytes
Type 20 (SUBTHREAD ): 192 bytes
Type 21 (STACK_FRAME ): 1500 bytes
Type 22 (TIMER_HEAD ): 72 bytes
Type 27 (FINALIZER_HEAD ): 336 bytes
Type 31 (IO_PORT ): 252 bytes
Type 34 (APPDOMAIN_HEAD ): 72 bytes
Type 36 (APPDOMAIN_ASSEMBLY ): 1536 bytes

I get an out of memory exception. Why? I seem to have plenty of RAM. Is there an alternate way to convert a string into a byte array?

Thanks.

P.S. This is the FezMini with the Robot Kit.

Gus_Issa · February 17, 2011, 8:13am

5K string? That is a long string for the Mini/Panda.

Try this: before calling GetBytes, call
Debug.GC(true);

User_12 · February 17, 2011, 2:23pm

Another idea is to divide the big string into chunks instead of passing the whole string to GetBytes().

Mike · February 17, 2011, 2:43pm

I think we have insufficient information to answer the question.

When were the GC statistics produced: right before the GetBytes() call, or earlier? To be meaningful, a Debug.GC(true) should be issued just prior to the GetBytes(). This will tell you what memory is actually available.

Is the string variable really being set from a literal, which is stored in flash, or was it built during program execution? If it was built, a 5k string would take 10K of memory.

Are these two lines the only code in your program, or are you doing other things?

As Gus said, the first thing to do is to put a Debug.GC(true) before the GetBytes().

William · February 17, 2011, 3:33pm

Around 24K or less available, things start getting funky (as I have also found), so it is not a lot of RAM in GC terms. Allocations not only need RAM, they need it in contiguous space, so if the heap is fragmented you can still get an OOM exception. As others say, Debug.GC(true) beforehand may help. If not, run it 2-3 times before the call just to see. The big .NET GC is generational, so it may not free everything on the first call. Not sure how the NETMF GC works. I also noticed a potential GC bug logged at CodePlex, so maybe it is the same issue. Maybe it is just not being aggressive enough. It is a balance: aggressiveness and performance run counter to each other in the GC world. On big .NET you can afford the RAM and afford to wait for free time. On NETMF the needs are different, so maybe some more tuning is required.

rgelb · February 18, 2011, 12:38am

Here is what I am trying to do and then I’ll address individual points. I am trying to build up a MemoryStream by adding various string variables to it over time. In order to add a string to a MemoryStream, I have to convert it to a byte array (thus Encoding.UTF8.GetBytes method), then .Write the byte array to the memory stream.

To that end, I have the following code:


private MemoryStream _content = new MemoryStream();

public void WriteLine(string line)
{
    Write(line);
    Write("\r\n");
}

public void Write(string s)
{
    Debug.GC(true);
    byte[] b = Encoding.UTF8.GetBytes(s);
    Write(b);
}

public void Write(byte[] bytes)
{
    Debug.GC(true);
    _content.Write(bytes, 0, bytes.Length);
}

So WriteLine(s) is called where s is a string with 4939 bytes.

@ Gus. I added Debug.GC(true) before the statement and sure enough the problem on the Encoding.UTF8.GetBytes(s) line went away. However, the error occurred literally right after, when I called Write("\r\n"), and it occurred on the _content.Write(bytes, 0, bytes.Length) line. Here is the Debug dump right before the OOM dump.

GC: 2msec 39972 bytes used, 24408 bytes available
Type 0F (STRING ): 1728 bytes
Type 11 (CLASS ): 4464 bytes
Type 12 (VALUETYPE ): 48 bytes
Type 13 (SZARRAY ): 6396 bytes
Type 15 (FREEBLOCK ): 24408 bytes
Type 17 (ASSEMBLY ): 18552 bytes
Type 18 (WEAKCLASS ): 48 bytes
Type 19 (REFLECTION ): 192 bytes
Type 1B (DELEGATE_HEAD ): 684 bytes
Type 1D (OBJECT_TO_EVENT ): 336 bytes
Type 1E (BINARY_BLOB_HEAD ): 888 bytes
Type 1F (THREAD ): 2304 bytes
Type 20 (SUBTHREAD ): 144 bytes
Type 21 (STACK_FRAME ): 1920 bytes
Type 22 (TIMER_HEAD ): 72 bytes
Type 27 (FINALIZER_HEAD ): 336 bytes
Type 31 (IO_PORT ): 252 bytes
Type 34 (APPDOMAIN_HEAD ): 72 bytes
Type 36 (APPDOMAIN_ASSEMBLY ): 1536 bytes
Failed allocation for 825 blocks, 9900 bytes

And the error stack shows that the problem occurs in MemoryStream.EnsureCapacity:

#### Exception System.OutOfMemoryException - CLR_E_OUT_OF_MEMORY (6) ####
#### Message:
#### System.IO.MemoryStream::EnsureCapacity [IP: 0040] ####
#### System.IO.MemoryStream::Write [IP: 0043] ####
#### Robot.HttpResponse::Write [IP: 0012] ####
#### Robot.HttpResponse::Write [IP: 0013] ####
#### Robot.HttpResponse::WriteLine [IP: 000e] ####
#### Robot.RobotHttpHandler::ProcessRequest [IP: 00a3] ####
#### Robot.HttpProcessor::ProcessRequest [IP: 0095] ####
#### Robot.HttpServer::AggregateInputData [IP: 00a1] ####
A first chance exception of type 'System.OutOfMemoryException' occurred in System.IO.dll

@ Joe. Yes, I am trying to split up processing of the string into smaller chunks, but the bottom line is that if I try to do anything with a string that is 5k with around 20k of RAM remaining, it'll OOM. I've tried string.Substring and it bombs as well. So since I am trying to put out a web page, I'll try to split it up into separate .css, .js and .htm files, and I'll probably throw in a couple of iframes for good measure. That should reduce the size of each request.

@ Mike. The string comes from a resource (so Flash, I guess), via the Resources.GetString(Resources.StringResources.Robot) method. So would it be 10k in this case? And yes, I am doing other things, as you saw at the beginning of the post.

rgelb · February 18, 2011, 1:22am

Update. Per suggestion, I split up my string into 2 parts of around 2.5k each. That did not help, as the MemoryStream was still throwing OOM errors.

I then replaced the MemoryStream with a string variable (and changed the underlying code as well) and that seemed to fix the problem.

One last thing that caused me grief is the Debug.Print statement. If you pass it a string of 2.5k, it also gives an OOM in my situation.

I think we need to examine the internal workings of Encoding.UTF8.GetBytes and MemoryStream.EnsureCapacity, because they seem to be eating up more memory than they should.

Geir_Andersen · February 18, 2011, 2:30am

I might be very wrong here, but won't your 5k string allocate memory several times?
As I see it, the string in WriteLine occupies 5k to hold the line string. This is passed on to the Write function, which allocates 5k to hold the 's' copy of the string. Then it allocates 5k worth of byte[] for the 'b' copy. Is this byte array then sent to a new function? That would probably add another 5k, and _content.Write might even have its own copy of the 5k.
So could the problem be that the 5k gets allocated too many times?

Gus_Issa · February 18, 2011, 2:59am

Think small, think embedded. Never seen a 5k long string on a micro before.
You can have a 10k buffer, but then keep it always alive.

rgelb · February 18, 2011, 3:05am

@ Gus, the Mini has a built-in web server, thus I have to serve a page to the browser - that's where the 5k comes from.

William · February 18, 2011, 4:40am

I think in general, building up a whole string before you send it is not workable over time in NETMF (in some cases, not even on a PC). I have not looked at the web APIs yet. Is there a WebClient.OpenWrite() method to get a stream writer? Then you could write smaller chunks to the network, with no upper size limit. It is also the only way to effectively handle multiple users and share resources. If it is not there, they need it.

Mike · February 18, 2011, 7:27am

When you write to the stream, it keeps increasing the size of its internal buffer. While the internal buffer size is being increased, you have two copies. This can result in fragmented memory.

I do not have access to the documentation now. I would see if you can set the capacity of the memory stream to a size big enough to hold the string. This will avoid buffer resizing.
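
A rough sketch of that idea, in case it helps. It does not assume that NETMF's MemoryStream has a capacity constructor; instead it wraps a byte array allocated once up front, using the MemoryStream(byte[]) constructor. FixedResponseBuffer and its members are made-up names for illustration, and on the desktop framework a stream built this way throws instead of growing when it fills up, so the capacity has to cover the largest response.

```csharp
using System.IO;

// Hypothetical helper (not a NETMF class): a response buffer that is
// allocated once and never resized, so EnsureCapacity has nothing to grow.
public class FixedResponseBuffer
{
    private readonly byte[] _storage;
    private readonly MemoryStream _content;

    public FixedResponseBuffer(int capacity)
    {
        // Single allocation, ideally done early while the heap is still unfragmented.
        _storage = new byte[capacity];
        // A stream over an existing array is fixed-size; it cannot expand.
        _content = new MemoryStream(_storage);
    }

    public void Write(byte[] bytes)
    {
        _content.Write(bytes, 0, bytes.Length);
    }

    // Length reports the full array size for a wrapped buffer,
    // so Position is used to track how much has actually been written.
    public int BytesWritten
    {
        get { return (int)_content.Position; }
    }

    public byte[] Storage
    {
        get { return _storage; }
    }
}
```

When sending, only the first BytesWritten bytes of Storage hold real data.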

Gus_Issa · February 18, 2011, 7:32am

Fragmentation is the problem…

// create a relatively very long string (that is 5K out of 60K)
string s = "MY 5k strin…";
// add 2 more bytes
s += "hi";

What happens when you add those 2 bytes is that the core needs to allocate another 5K+2 bytes, then move the data, then remove the old strings. At some point, you used 5K+5K+2 bytes.

Now do this a few times and the heap is very much fragmented. You do have a lot of RAM left, but it is scattered everywhere.
You can force the GC to compact the heap, which will solve the problem, but then things will run slower.

Mike · February 18, 2011, 7:37am

Gus

Unicode makes it worse. 5k + 5k + 2 is really 10k + 10k + 4.

William · February 18, 2011, 3:07pm

“when you write to stream it keeps increasing the size of it’s internal buffer. while the internal buffer size is being increased, you have two copies. this can result in fragmented memory.”

I was talking about a network stream. A network stream would not expand an array, it would just write the bytes out. And if you can help it, just read and send bytes. Don't even convert to a string if you don't have to, as encoding/decoding just adds overhead.

Are we still talking about just reading an *.htm file and sending it? Or do you need to munge the text before sending?

rgelb · February 19, 2011, 4:03pm

@ William. I am just reading an HTM file from Resources, adding HTTP headers to it and shooting it out the WiFly card.

But anyway, I split up the file into .HTM and .JS and now it seems to work fine, since those are separate requests.

William · February 19, 2011, 4:48pm

Send out the header first to the network, then read the file in chunks in a loop and send the chunks. Even if it is working now, it may be just barely working during tests. But if you do small chunks and create a little static helper method, you will not have to worry about file size in the future.
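
Something like the following could serve as that little static helper. It is only a sketch: ChunkedWriter and WriteAscii are made-up names, the output is assumed to be whatever Stream the network layer exposes, and the text is assumed to be plain ASCII (typical for HTML/JS), so each char is copied as a single byte instead of going through Encoding.UTF8.GetBytes. The same small buffer is reused for every chunk, so nothing close to the full page size is allocated.

```csharp
using System.IO;

// Hypothetical helper, not part of NETMF.
public static class ChunkedWriter
{
    // Writes 'text' to 'output' in pieces no larger than buffer.Length.
    // Assumption: 'text' contains only ASCII characters; anything else
    // would be truncated to its low byte and corrupted.
    public static void WriteAscii(Stream output, string text, byte[] buffer)
    {
        int offset = 0;
        while (offset < text.Length)
        {
            int remaining = text.Length - offset;
            int count = remaining < buffer.Length ? remaining : buffer.Length;

            // Copy one chunk of characters into the reusable buffer.
            for (int i = 0; i < count; i++)
            {
                buffer[i] = (byte)text[offset + i];
            }

            output.Write(buffer, 0, count);
            offset += count;
        }
    }
}
```

The caller would allocate the buffer once (say 256 bytes) at startup, write the HTTP headers with it, and then pass the page text through the same call.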