OutOfMemory reading logs from SD

Hello,
I have a “standard” problem that I cannot solve by myself :frowning:

I have a Cerbuino Bee where I log some data to SD using plain text files.
I cannot read and parse these files back, because I always “burn” RAM and get an OutOfMemoryException.

Basically, my code does this:

1 - Access the folder containing the files using System.IO.Directory.EnumerateFiles, get the first file to upload, and go on. This is repeated until the directory is empty.

2 - Open a file. Currently I’m using a FileStream with a small buffer:

 Using fs As New System.IO.FileStream(file, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.None, 60)

3 - Read lines one by one. I’m not using ReadLine; instead I read using a memory-leak workaround I was already using on the Panda with NETMF 4.1. It may no longer be required, but since I’m having a lot of trouble I tried it once more.

4 - Parse each line and try to upload the data via HTTP.

The memory leak is in steps 1-3: even with step 4 disabled, I can see available memory (via Debug.GC…) decreasing until I run out.
Every heavy object (the FileStream) is created inside a Using block, so it is disposed.

I cannot solve it :frowning:
What is the right approach to read some txt files, parse them and upload? I think this task is so common that there must be an established pattern.
My files are small, about 1 KB, and every row is tiny: just 4 semicolon-separated numbers.

thanks

P.S. Line reading is implemented like this:


 public static string ReadLineUsingBuffer(System.IO.FileStream fs)
    {
        int newChar = 0;
        int bufLen = 256;// 512; // NOTE: the smaller buffer size.
        char[] readLineBuff = new char[bufLen];
        int growSize = 256; // 512;
        int curPos = 0;
        while ((newChar = fs.ReadByte()) != -1)
        {
            if (curPos == bufLen)
            {
                if ((bufLen + growSize) > 0xffff)
                {
                    throw new Exception();
                }
                char[] tempBuf = new char[bufLen + growSize];
                Array.Copy(readLineBuff, 0, tempBuf, 0, bufLen);
                readLineBuff = tempBuf;
                bufLen += growSize;
            }
            readLineBuff[curPos] = (char)newChar;
            if (readLineBuff[curPos] == '\n')
            {
                return new string(readLineBuff, 0, curPos);
            }
            if (readLineBuff[curPos] == '\r')
            {
                // what's this? :)
                //if (fs.Peek() == 10)
                //{
                //    fs.ReadByte();
                //}
                return new string(readLineBuff, 0, curPos);
            }
            curPos++;
        }

        if (curPos == 0) return null; // Null fix.
        return new string(readLineBuff, 0, curPos);
    }

I can tell you what the commented-out code section marked ‘what’s this’ does.

If you have \r\n line endings, then this skips the \n:
if it reads an \r and the next char (which it checks via Peek) is \n, then it reads (consumes) that next char.
I can also tell you that this would throw an IndexOutOfRangeException if the \r falls on a multiple of your buffer size (so if the \r is the 256th, 512th, … char in the file).
This is a bug in Peek.
But removing that part, if you have \r\n line endings, is also not a good idea:
now it ends reading the line on \r, and the next char is a \n, which is then also recognized as a newline by the next call to your method. So you get an empty line after each normal line.
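One way to keep \r\n handling without relying on Peek is a one-byte pushback: after an \r, read the next byte yourself and, if it isn’t \n, remember it for the next call. A sketch (not tested on the device; the `LineReader` class and `pushback` field are my own names, not from your code):

```csharp
using System;
using System.IO;

public static class LineReader
{
    // One-byte pushback; -1 means empty. In a real class this should be
    // per-stream state, not static, if you read several files at once.
    private static int pushback = -1;

    public static string ReadLine(Stream fs)
    {
        char[] buf = new char[256];
        int pos = 0;
        int b;
        while ((b = NextByte(fs)) != -1)
        {
            if (b == '\n')
                return new string(buf, 0, pos);
            if (b == '\r')
            {
                int next = fs.ReadByte();      // consume the \n of \r\n...
                if (next != -1 && next != '\n')
                    pushback = next;           // ...or keep the byte for later
                return new string(buf, 0, pos);
            }
            if (pos == buf.Length)             // grow, as in the original
            {
                char[] tmp = new char[buf.Length + 256];
                Array.Copy(buf, 0, tmp, 0, buf.Length);
                buf = tmp;
            }
            buf[pos++] = (char)b;
        }
        return pos == 0 ? null : new string(buf, 0, pos);
    }

    private static int NextByte(Stream fs)
    {
        if (pushback == -1)
            return fs.ReadByte();
        int b = pushback;
        pushback = -1;
        return b;
    }
}
```

For "a\r\nb\n" this returns "a" then "b" with no empty line in between, and it still handles bare \r or bare \n endings.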

And if you inspect the code closely, you will see that it allocates larger buffers if a line is longer than your initial buffer size.
If your lines are long, the buffer is enlarged and the contents copied into the new one.

How many times does this method get called and how are the returned string referenced from the calling code?

If you add a Debug.GC(true) just after calling this method, does it still happen? That’s not very efficient, but it may help identify whether this particular routine is allocating memory that isn’t getting collected.

I notice there is also a DumpHeap method that might be useful: http://msdn.microsoft.com/en-us/library/ee433231.aspx. You could dump the heap after each call and see if the output offers any clues as to which objects are surviving.
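For example, something like this shows what each call leaves behind (a sketch; in NETMF, `Microsoft.SPOT.Debug.GC(true)` forces a collection and returns the free bytes):

```csharp
// NETMF: log how much free memory a single line read consumes.
uint before = Microsoft.SPOT.Debug.GC(true);  // force GC, get free bytes
string line = ReadLineUsingBuffer(fs);
uint after = Microsoft.SPOT.Debug.GC(true);
Microsoft.SPOT.Debug.Print("free " + before + " -> " + after +
                           " (delta " + (before - after) + ")");
```

If the delta keeps growing across calls even after forced collections, something is holding references to the returned strings.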

@ baz -
how about these?

const int bufLen = 256;// 512; // NOTE: the smaller buffer size.
static char[] readLineBuff = new char[bufLen];
const int growSize = 256; // 512;
static char[] tempBuf = new char[bufLen + growSize];
public static string ReadLineUsingBuffer(System.IO.FileStream fs)
    {
        int newChar = 0;
       // int bufLen = 256;// 512; // NOTE: the smaller buffer size.
        //char[] readLineBuff = new char[bufLen];
        //int growSize = 256; // 512;
        int curPos = 0;
        while ((newChar = fs.ReadByte()) != -1)
        {
          Debug.GC(true); 
           Thread.Sleep(10);

            if (curPos == bufLen)
            {
                if ((bufLen + growSize) > 0xffff)
                {
                    throw new Exception();
                }
               // char[] tempBuf = new char[bufLen + growSize];
                Array.Copy(readLineBuff, 0, tempBuf, 0, bufLen);
                readLineBuff = tempBuf;
                bufLen += growSize;
            }
            readLineBuff[curPos] = (char)newChar;
            if (readLineBuff[curPos] == '\n')
            {
               
                return new string(readLineBuff, 0, curPos);
                
            }
            if (readLineBuff[curPos] == '\r')
            {
                // what's this? :)
                //if (fs.Peek() == 10)
                //{
                //    fs.ReadByte();
                //}
                return new string(readLineBuff, 0, curPos);
            }
            curPos++;
        }

        if (curPos == 0) return null; // Null fix.
        return new string(readLineBuff, 0, curPos);
    }

Single-file processing is this routine. I had already added a Debug.GC call before your reply, but the problem didn’t change.

  

Using fs As New System.IO.FileStream(file, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.None, 60)

                Dim currentLine As String

                '-- first row to discard
                currentLine = Extension.ReadLineUsingBuffer(fs)

                Debug.Print(Debug.GC(False).ToString)

                Do

                    currentLine = Extension.ReadLineUsingBuffer(fs)

                    If currentLine IsNot Nothing AndAlso currentLine.Length > 0 Then

                        ' -- avoid allocating a new Observation per line;
                        ' -- ParseFileRow supplies the instance
                        Dim uploadPayload As Contracts.Observation = Nothing
                        Try

                            uploadPayload = LocalStorage.ObservationsCache.ParseFileRow(currentLine)

                        Catch ex As Exception
                        End Try

                        If uploadPayload IsNot Nothing Then
                            Try

                                ' -- uploads data via HTTP
                                fileCanBeDeleted = UploadObservation(uploadPayload)

                            Catch ex As Exception
                                fileCanBeDeleted = False

                                Continue Do

                            End Try
                        End If

                    End If


                Loop Until fs.Position >= fs.Length              

                fs.Close()

            End Using


@ Dat -

 
bufLen += growSize;

bufLen cannot be a const

Yes, this is the behavior, thanks.
In fact this function was a workaround to avoid ReadLine from NETMF (which has a bug in 4.1). I don’t know, however, whether this is the problem here, or whether the workaround is still needed.

OK, so I’ve changed the default buffer to 512 (my lines are shorter than 512 chars).

I’ll see whether this changes anything…
thanks

seems 4.2 doesn’t include this

remove const ;D

You are right, but: using this code I will allocate the buffer once and keep that memory in use permanently. And if the buffer grows, this memory will grow and will not be released later.
Is it a better pattern to preallocate a block of memory and always reuse it, instead of allocating and releasing the buffer’s memory at runtime?

thanks

Define “better”?

The positive side is that there’s no “cost” in memory allocation/reallocation/freeing. You know your buffer stays around, and you know you won’t trigger garbage collection, which can be expensive.

The negative is that you always use that amount of memory, no matter how much you really need.

I am not sure I understand you. But in that code there is no repeated allocating and freeing of memory, which I think is good if there really is a problem with the garbage collector.

And in there I added Thread.Sleep(10) after Debug.GC(true); this is quite important, to give the garbage collector a short time to run. In general, adding a Thread.Sleep in a while loop is highly recommended.

For me it’s hard to say which way is better. But if an array is used often, I will use the “pattern” above.
Or:

static byte[] array; // shared buffer, kept between calls

void Demo()
{
    if (array == null)
    {
        array = new byte[512]; // e.g. 512 bytes; allocated only on first use
    }
    // ... use array here ...
}

That way, if I need the memory somewhere else, I can free the buffer, but inside the Demo function there is no repeated allocating and freeing of memory either.
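To make the “free it when needed elsewhere” part concrete, a sketch (the `array` field is the one from Demo above; `FreeBuffer` is my own name):

```csharp
// Drop the shared buffer so the GC can reclaim it when memory is
// needed elsewhere; Demo() will lazily re-allocate it on next use.
static void FreeBuffer()
{
    array = null;
    Microsoft.SPOT.Debug.GC(true); // NETMF: force a collection now
}
```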

Just sharing what I did on the Cerberus, which has tight memory limits.