how large can a buffer be? 1 MB bfr = crash?
I had to increase a buffer size in my MFC application to 1 MB (at compile time, no dynamic allocation), and the application would not run afterwards. It simply crashes somewhere in MFC code.
I then created the simplest possible application and was able to reproduce the problem. Attached is a basic MFC dialog-based application with just two buffers declared:
BYTE m_Huge1[1000 * 1024];
BYTE m_Huge2[1000 * 1024];
Run this application and it will crash! Comment out one of the buffers above and it will not crash.
What kind of limitations are there on buffer sizes like this? It compiles fine, so why does it crash when run?
I want to make this work; there is a transfer button which simply copies one buffer into the other.
By [zspirit] at [2007-11-20 9:36:43]

# 1 Re: how large can a buffer be? 1 MB bfr = crash?
The problem is simply that you've created this large block of memory on the stack.
There's no particular limit on memory allocations, save those inherent in the operating system and platform. For most Win32 development the limit is 2Gbytes of address space per process, but a smaller practical limit will be reached before then, further constrained by that particular machine's configuration.
The dialog object is being created 'on the stack' in the InitInstance function, in the fashion of a modal dialog for a dialog based application. All members of the dialog object are, as a result, also on the stack.
If you switch to dynamic allocation of this buffer you can bypass this problem. You can increase the stack size, but that's not really a decent solution to the problem.
I see no references to this buffer in your example project, so I can't advise on an alternate storage strategy, other than to say this: if for some reason a vector or a string isn't appropriate, you should allocate the buffer in the constructor of the dialog and arrange for its deletion in the destructor. That way the buffer survives the DoModal call, and you can still access it within InitInstance 'on the way out'.
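A minimal sketch of the vector option, under a few assumptions: TransferDialog is a hypothetical stand-in for the dialog class (the real one would derive from CDialog), BYTE is re-declared so the snippet compiles without the Windows headers, and Transfer() is only a guess at what the transfer button does. The vectors allocate in the constructor and release in the destructor automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char BYTE;  // stand-in for the Windows BYTE typedef

const std::size_t kBufferSize = 1000 * 1024;  // same size as in the question

// Hypothetical stand-in for the dialog class from the sample project.
class TransferDialog {
public:
    // Each vector allocates its storage on the heap, so the object itself
    // stays small no matter where it is created (stack included).
    TransferDialog() : m_Huge1(kBufferSize), m_Huge2(kBufferSize) {}

    // Assumed behaviour of the "transfer" button: copy one buffer to the other.
    void Transfer() { m_Huge2 = m_Huge1; }

    std::vector<BYTE> m_Huge1;
    std::vector<BYTE> m_Huge2;
    // No destructor needed: the vectors free their storage themselves.
};
```

With this layout the dialog object occupies only a few dozen bytes of stack, while the two buffers live on the heap.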
JVene at 2007-11-11 4:09:32 >

# 5 Re: how large can a buffer be? 1 MB bfr = crash?
"The buffer receives data from a serial port; now the data is growing, and so the buffer needs to grow."
And how are you to grow this buffer at runtime? The only way to do it is to dynamically allocate the data. The easiest thing to do is just declare a std::vector<char> and resize it according to the data coming in.
"I don't think it is a design issue, because I was able to demonstrate in the posted example how easily one can get that problem, just by declaring 2 buffers of 1 MB each, which was surprising (that it will crash the application)."
Maybe there was more than 1 MB of data? Maybe you have some other issues in your code that caused the crash? We don't know.
"The common perception is that stack variables/buffers are safe compared to dynamically allocated ones (for memory leaks, obviously), and so far no one had really pointed out this aspect: that static buffers ultimately have size limitations."
There are runtime limitations, namely stack space.
"Can we say that if we need a buffer of, say, 100K (which may potentially be required to grow), we shall use dynamic allocation rather than simply a compile-time static buffer?"
This is a much better approach, and honestly, the first thing that should have been done.
"According to my understanding, we declare a buffer for the largest expected data on the port, which would be defined as char ibfr[100*1024];"
And what if this is not the largest data that you actually are receiving? How do you check for this in your program? If you're not checking, then the program is not designed correctly. You must always anticipate anything that will happen.
You should dynamically allocate for the largest expected size (better yet, use a container class such as std::vector<char> that is easily expandable), and expand the area when more than what is expected comes in (and maybe log that a large item was received).
Using arrays is really the poor-man's approach in solving your problem -- in actual practice, the area should be dynamically allocated, given that the data can vary in size, and may on occasion be more than is expected.
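To make that concrete, here is a rough sketch. The names AppendReceived and kExpectedMax are made up for illustration, and writing a warning to std::cerr is just one possible way to "log that a large item was received".

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical figure for the largest message normally expected on the port.
const std::size_t kExpectedMax = 100 * 1024;

// Appends one chunk of received bytes to the message buffer. The vector
// grows as needed; crossing the expected maximum is logged, not fatal.
void AppendReceived(std::vector<char>& message, const char* data, std::size_t count)
{
    assert(data != 0 || count == 0);
    const bool wasWithinLimit = message.size() <= kExpectedMax;
    message.insert(message.end(), data, data + count);
    if (wasWithinLimit && message.size() > kExpectedMax) {
        std::cerr << "warning: message grew past " << kExpectedMax << " bytes\n";
    }
}
```

Calling message.reserve(kExpectedMax) once up front avoids reallocations in the common case, while still letting the occasional oversized message through.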
Regards,
Paul McKenzie
# 6 Re: how large can a buffer be? 1 MB bfr = crash?
"...I don't think it is a design issue, because I was able to demonstrate in the posted example how easily one can get that problem, just by declaring 2 buffers of 1 MB each, which was surprising (that it will crash the application).
As you and JVene said, you have allocated memory on the heap, and that is different. Given this situation, that certainly looks better, as it allows you to use larger buffers. Did you refer to this as the design flaw? The common perception is that stack variables/buffers are safe compared to dynamically allocated ones (for memory leaks, obviously), and so far no one had really pointed out this aspect: that static buffers ultimately have size limitations."
This obviously IS a design issue.
First, your function doesn't own the stack: it is used by all functions on the current execution path to keep their return addresses, parameters and local data. The default stack reserved by the Microsoft linker is 1 MB, so if you really declared a 1 MB buffer, your app would crash with just ONE of them. (Your buffers are "only" 1,000 KB each.)
Second (as Paul noted), if you are using almost all of the available resource, a slight modification to your app can cause it to crash again.
However, for the sake of completeness, I would like to point out that there are ways to increase the stack size for your app: the /STACK linker option or the STACKSIZE statement in a module-definition (.def) file.
Another way is to declare those buffers globally.
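A small sketch of the global variant, assuming the transfer is a plain byte copy. TransferBuffers is a made-up name, and BYTE is re-declared only so the snippet builds without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char BYTE;  // stand-in for the Windows BYTE typedef

// Namespace-scope arrays have static storage duration: they are part of the
// program image's data, not of any thread's stack.
static BYTE g_Huge1[1000 * 1024];
static BYTE g_Huge2[1000 * 1024];

static_assert(sizeof(g_Huge1) == sizeof(g_Huge2), "buffers must match in size");

// Hypothetical "transfer" handler: copies the first buffer into the second.
void TransferBuffers()
{
    std::memcpy(g_Huge2, g_Huge1, sizeof(g_Huge1));
}
```

For the stack-size route, the linker option takes the form /STACK:reserve[,commit] with sizes in bytes, but the buffers then still compete with everything else on the stack.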
# 7 Re: how large can a buffer be? 1 MB bfr = crash?
Now that I have something to go on with respect to what these buffers are doing, I can offer something pertinent.
I propose that you separate the notions of buffering and storage. A buffer is typically a temporary storage area used to shuttle material from one 'stage' to another (though the definition of a buffer is rather vague).
In a generic design, the amount of incoming data could vary wildly. Some incoming data could be only a few kilobytes, while others might be gigabytes, and still others might be continuous streaming of data without predictable end. For the latter (in graduated shades), it may be that the only reasonable ultimate destination is bulk storage (disk, flash drive, etc).
Between these extremes (tiny, easily handled packets of information to gigabytes of storage) is a bulk of data that may require some finesse to arrange. Many people tend to think that data streamed in must be held in a contiguous block of RAM, but that's not always true. There are usually natural boundaries to groups of data within a stream, and sometimes it's not bothersome to incorporate arbitrary boundaries (or blocks).
The buffer used to receive data from the incoming port can be small enough to be easily managed, but represents a size that balances the recurring need to process 'buckets' of incoming data against the speed of the incoming data and the duty of repeated bucket handling. Imagine the difference between emptying a swimming pool with a drinking cup vs a 5 gallon bucket.
If the data has definite and obvious boundaries, you can probably find separators in the data with which to manage that, a process called de-blocking. If the data isn't easily de-blocked, or the segments are quite large, the process will be more of an assemblage than of a separation, but both actions are usually part of the task.
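As an illustration of de-blocking, here is a sketch that assumes records are separated by a single byte (a newline by default). That separator, and the Deblocker class itself, are assumptions for the example; your stream may use something else entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Accumulates raw bytes and splits them into records at a separator byte.
class Deblocker {
public:
    explicit Deblocker(char separator = '\n') : m_separator(separator) {}

    // Feeds one received chunk; returns every record completed by it.
    // Bytes after the last separator are kept for the next call.
    std::vector<std::string> Feed(const char* data, std::size_t count)
    {
        assert(data != 0);
        m_pending.append(data, count);
        std::vector<std::string> records;
        std::size_t start = 0;
        std::size_t pos;
        while ((pos = m_pending.find(m_separator, start)) != std::string::npos) {
            records.push_back(m_pending.substr(start, pos - start));
            start = pos + 1;
        }
        m_pending.erase(0, start);  // drop what was handed out
        return records;
    }

    // Number of bytes still waiting for their separator.
    std::size_t PendingBytes() const { return m_pending.size(); }

private:
    char m_separator;
    std::string m_pending;
};
```

A record that straddles two reads is simply held back until the chunk containing its separator arrives, so the size of the read buffer no longer has to match the size of a record.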
The duty of de-blocking may be heavier than is reasonable for an 'in process' construction - that is, you may want to hand off the incoming blocks of data to a thread that performs this work (which can include writing the data to disk).
This hints at a general construction that I use in my work. I dynamically create a block of RAM suitable for the incoming bucket of data, a vague definition, I realize, such that perhaps this size is variable and adjustable to fit the circumstance. Such blocks can be quite small for a serial port operating at 56Kbaud, but if the same process applied to a USB port at several MBytes per second it should likely be much larger.
When that 'bucket' of data is full, a new one is allocated for the next bucket. This newly filled bucket is then appended to a container of buckets (perhaps a list, queue or similar container). This container is shared between the thread servicing the incoming data pump and the processing thread. Each thread only needs a short slice of time to manage the container, just enough for one thread to add an entry and the other thread to remove an entry, in order.
This container is a kind of buffer that dynamically grows as required, but does not require re-allocation like a vector, which would naturally take a lot of time copying. It 'looks' like a conveyor belt of buckets of data, winding their way between two components, the incoming pump and the processing component.
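Roughly, such a conveyor might look like the sketch below. It uses the standard C++11 mutex and condition variable rather than MFC/Win32 synchronization objects (an assumption; CCriticalSection and an event would serve the same purpose), and the names Conveyor and Bucket are invented for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

typedef std::vector<unsigned char> Bucket;  // one filled block of incoming data

// A queue of buckets shared by the reading thread and the processing thread.
// The lock is held only long enough to link or unlink one bucket.
class Conveyor {
public:
    // Called by the reading thread when a bucket is full.
    void Push(Bucket bucket)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_buckets.push_back(std::move(bucket));  // moves, no byte copy
        }
        m_ready.notify_one();
    }

    // Called by the processing thread; blocks until a bucket is available.
    Bucket Pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_buckets.empty(); });
        Bucket bucket = std::move(m_buckets.front());
        m_buckets.pop_front();
        return bucket;
    }

    std::size_t Size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_buckets.size();
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_ready;
    std::deque<Bucket> m_buckets;
};
```

Because whole buckets are moved in and out, the conveyor grows one bucket at a time and the bytes already received are never copied.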
If the data can be naturally separated in the processing stage, so be it. If it must be assembled into a larger whole (say the buckets are 1Mbyte, but the final data will be a 250Mbyte file), then obviously the work is an assembly. Perhaps your data stream includes information about the final objective, say some header that, early on, described what was coming. This information would have been 'recognized' by the processing component by whatever standard you've devised, deciding how the data is to be assembled.
If the output is going to a file, then a file should be open and these buckets written in order. If the output is SQL, for example - a stream of XML data - your processing thread will be parsing an XML stream into data - naturally expecting a beginning and ending to the XML file. In this case it seems natural to assume that the assembly objective is to prepare, perhaps even in RAM, an intact XML file, which is then sent to an XML parser, etc.
There are many permutations on this theme, all based on the notion of this 'conveyor belt' of incoming buckets of data.
I'd suggest this implies a 'data bucket' class, which itself handles the dynamic allocation of raw buffers. The buffer inside the 'data bucket' class is used for the incoming reading from the port, and then, too, supplied to the 'other end' of the conveyor. It probably includes a value indicating the size of that 'bucket'.
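One possible shape for such a class is sketched below. The member names (WritePtr, Commit and so on) are invented for illustration; the port read itself is left out, since it depends on the serial API in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Owns one raw buffer. The reader fills it through WritePtr()/Commit(),
// and the processing side reads Data()/Size() at the other end.
class DataBucket {
public:
    explicit DataBucket(std::size_t capacity) : m_storage(capacity), m_used(0) {}

    std::size_t Capacity() const { return m_storage.size(); }
    std::size_t Size() const { return m_used; }  // bytes actually received
    std::size_t Free() const { return m_storage.size() - m_used; }
    bool Full() const { return m_used == m_storage.size(); }

    // Where the next port read should place its bytes.
    unsigned char* WritePtr() { return m_storage.data() + m_used; }

    // Records that 'count' bytes were written at WritePtr().
    void Commit(std::size_t count)
    {
        assert(count <= Free());
        m_used += count;
    }

    const unsigned char* Data() const { return m_storage.data(); }

private:
    std::vector<unsigned char> m_storage;  // heap allocation, freed automatically
    std::size_t m_used;
};
```

The reader would pass WritePtr() and Free() to the port read call, then Commit() however many bytes actually arrived; once Full() is true the bucket goes onto the conveyor and a fresh one is allocated.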
While it's unlikely to be much of an issue with slower ports, it is possible that the 'conveyor', which dynamically expands as required, could overflow. Some means of observing its size, relative to the limits of the machine, should be incorporated so as to give the incoming stream manager the ability to pause the incoming stream while the conveyor is emptied.
Conversely, the processing thread needs to be notified when the data stream has stopped, and some means for the processing thread to report that it has reached the expected end of the data would also help coordinate the two.
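Both concerns can be sketched in one small class, with the caveat that the design here is only one option: a hypothetical BoundedConveyor whose TryPush refuses new buckets once a fixed limit is reached (the signal to pause the stream), and whose Close tells the processing thread that nothing more is coming. A fixed bucket count stands in for "the limits of the machine".

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

typedef std::vector<unsigned char> Bucket;

class BoundedConveyor {
public:
    explicit BoundedConveyor(std::size_t maxBuckets)
        : m_max(maxBuckets), m_closed(false)
    {
        assert(maxBuckets > 0);
    }

    // Non-blocking. Returns false when the conveyor is full or closed, in
    // which case 'bucket' is left untouched and the reader should pause.
    bool TryPush(Bucket& bucket)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_closed || m_buckets.size() >= m_max) return false;
            m_buckets.push_back(std::move(bucket));
        }
        m_changed.notify_all();
        return true;
    }

    // The reader calls this once the incoming stream has stopped.
    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_changed.notify_all();
    }

    // Blocks until a bucket arrives or the conveyor is closed. Returns
    // false only when it is closed and fully drained.
    bool Pop(Bucket& out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_changed.wait(lock, [this] { return m_closed || !m_buckets.empty(); });
        if (m_buckets.empty()) return false;
        out = std::move(m_buckets.front());
        m_buckets.pop_front();
        return true;
    }

private:
    std::size_t m_max;
    bool m_closed;
    std::mutex m_mutex;
    std::condition_variable m_changed;
    std::deque<Bucket> m_buckets;
};
```

The processing loop then becomes `while (belt.Pop(bucket)) { ... }`, which ends by itself after Close() once the remaining buckets have been handled.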
As to your interest in the 'largest buffer' used...
It's generally unwise to run toward the limits of the machine. In certain 32bit server versions of Windows it's possible to extend the OS's ability to handle RAM beyond 4Gbytes, using PAE. This still does not extend the 32bit task's own limitation of 2Gbytes per process. There is another switch, the /3GB boot option, that tells the operating system to give a process more than 2Gbytes of address space, but the executable must also be built to support that (linked with /LARGEADDRESSAWARE), and it's a bit of a hassle. Generally, addresses with the high bit set are reserved for the operating system (thus the 2Gbyte limit).
Unless you're using a garbage collected environment, where RAM can be relocated by the OS, the tendency is that large blocks of RAM cause obstacles to finding contiguous blocks for new allocations, and that must be managed.
I had a questioner here with problems using a vector that reached 2Gbytes. The notion that the data had to be in a contiguous block was the basic problem. I supplied a 'vector like' container that operated like a unified array (referencing data by numeric index), but was in fact an array of smaller arrays. This meant that the OS wasn't actually allocating (or reallocating to create) 1Gbyte or larger vectors, but was, instead, appending 50 Mbyte vectors to a list of vectors. The result was a much better organization of RAM, allowing larger studies of the objective without crashing due to memory exhaustion. It would also have been possible, though slower, to use memory mapped files to extend this vector beyond 2Gbytes, to the limit of disk storage.
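The container I supplied back then isn't reproduced here; what follows is only a sketch of the idea, with invented names (ChunkedArray, chunk_count) and a std::deque of vectors as the outer list so that adding a chunk never disturbs the existing ones.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// A "vector-like" container stored as a sequence of fixed-size chunks.
// Growth allocates one more chunk; no huge contiguous block is needed.
template <typename T>
class ChunkedArray {
public:
    explicit ChunkedArray(std::size_t chunkElements)
        : m_chunkElements(chunkElements), m_size(0)
    {
        assert(chunkElements > 0);
    }

    void push_back(const T& value)
    {
        if (m_size == m_chunks.size() * m_chunkElements) {
            // Every existing chunk is full: start a new one.
            m_chunks.push_back(std::vector<T>());
            m_chunks.back().reserve(m_chunkElements);
        }
        m_chunks.back().push_back(value);
        ++m_size;
    }

    // Index as if the data were one array: pick the chunk, then the slot.
    T& operator[](std::size_t index)
    {
        assert(index < m_size);
        return m_chunks[index / m_chunkElements][index % m_chunkElements];
    }

    const T& operator[](std::size_t index) const
    {
        assert(index < m_size);
        return m_chunks[index / m_chunkElements][index % m_chunkElements];
    }

    std::size_t size() const { return m_size; }
    std::size_t chunk_count() const { return m_chunks.size(); }

private:
    std::size_t m_chunkElements;
    std::size_t m_size;
    std::deque<std::vector<T> > m_chunks;
};
```

With a chunk size in the tens of megabytes, the allocator only ever has to find room for one more chunk, at the cost of a divide and a modulo on every indexed access.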
JVene at 2007-11-11 4:15:43 >
