Re: Buffer or Realloc?



bwaichu@xxxxxxxxx wrote:
Is it generally better to set-up a buffer (fixed sized array) and read
and write to that
buffer even if it is larger than what is being written to it? Or is it
better to allocate memory and realloc it for the size of the what is
being written each time? In other words, what is the decision factor
between deciding to use a fixed size buffer or allocating memory
space and reallocing the size?

Fixed size arrays are good for things that whose contents *CANNOT*
exceed the size of that array. A string usually is never that kind of
thing.

So usually the decision is quite simple: if you are dealing with
mutable strings *NEVER* use a fixed size array (this is a definitive
mark of an amateur and is usually accompanied by either buffer overflow
errors or ruthless truncation), but instead use the reallocing buffer
strategy. There is a common tendency in C programmers to conflate char
arrays and strings -- the two are different things conceptually and
this has implications for what you are actually doing.

If you just need a block buffer or something like that, or you are
transforming your algorithm to run through chunks of a stream at a
time, then obviously you can just use a fixed sized buffer.

I don't think the code below is optimal since the size of the "buffer"
becomes extremely small when I realloc it. I also spend time
re-allocating
it. And taking this to the next step, if I wrote to a linked list or a
red black
tree, I would have to allocate memory. In that case, how do I
determine
the best size?

Note: The code below contains some posix items. Please ignore those
items. I have also left out some error coding below as well.
The
below was written to just learn the regex API.

int
main(void) {

regex_t preg;
char *string = strdup("hello1hello4hello2hello3 are you hello");
char *match;
regmatch_t pmatch[1];
size_t nmatch = 1;
size_t len = 0;
size_t offset = 0;
int test;

match = calloc(BUFSIZ, sizeof(char));

if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
/* is this optimal or should I use a fixed sized buffer? */
match = realloc(match, len+1);

This is wrong is what it is. realloc may return with NULL. That means
that the line above will leak the original contents of match, any time
realloc happens to fail. This call is also unnecessary if it decreases
the size of the allocation in match.

strlcpy(match, &string[offset],
len+1);

Obviously this will fail, any time match is NULL. Also you should use
memcpy(), since its generally faster.

(void)printf("matched string: %s\n", match);
offset += pmatch[0].rm_eo;

}
(void)printf("original string: %s\n", string);
free(match);
regfree(&preg);
exit(0);
}

malloc, calloc and realloc are fairly slow function calls. So one
thing you should generally do is try to minimize the number of times
you call such functions. When reusing a buffer like you are doing
above, you only want to realloc more space if you need it.

There is a simple numeric size strategy for keeping the number of
reallocations down to a minimum while wasting a constant factor in
space (less than 100% extra) that you can use that is usually an
optimal trade off between speed and space: always realloc to the least
power of 2 greater than the size that you need,and never decrease the
size of a buffer with realloc. On a 32 bit machine this will means
that you never call realloc on the same buffer more than 32 times, and
you will never use more than an extra 100% over largest size that you
would have ever needed.

So I would recode your snippet something like as follows:

size_t mlen = 0;

match = NULL;

if ( regcomp(&preg, "hello[41]", REG_BASIC) != 0)
perror("regcomp failed");

while (string[offset] != '\0') {
test = regexec((const regex_t *)&preg, &string[offset],
nmatch, pmatch,0);
if (test == REG_NOMATCH || test !=0) {
break;
}
len = pmatch[0].rm_eo-pmatch[0].rm_so;
if (len > mlen) {
char *tmpmatch;
while (mlen <= len) mlen = (mlen>0) ? 2*mlen : 1;
tmpmatch = realloc (match, mlen);
if (NULL == tmpmatch) break; /* Out of memory? */
match = tmpmatch;
}
memcpy (match, &string[offset], len);
match[len] = '\0';
[etc ...]

Once you get used to this idea, you will find that it gets kind of
repetitve and tiresome to implement this same idea over and over again.
You will probably want to throw the concept into a library somewhere
and wonder why someone hasn't done this before you.

Well someone has: http://bstring.sf.net/

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

.



Relevant Pages

  • Re: Buffer or Realloc?
    ... better to allocate memory and realloc it for the size of the what is ... between deciding to use a fixed size buffer or allocating memory ... becomes extremely small when I realloc it. ... I would have to allocate memory. ...
    (comp.lang.c)
  • Re: Buffer or Realloc?
    ... better to allocate memory and realloc it for the size of the what is ... between deciding to use a fixed size buffer or allocating memory ... so for the string I've got to prepare as part of a message to the UK Government gateway where the specification says the string has a maximum length of 10 characters I should not use a fixed size buffer but a reallocating buffer? ... with the realloc() approach -- obviously ...
    (comp.lang.c)
  • Re: Buffer growing strategy?
    ... I'm looking into a buffer ... A very simple solution is to just call realloc for the new size ... The problem is not with the cost of an individual realloc ... If the allocator is a buddy system allocator there only be log n ...
    (comp.lang.c)
  • Re: Bug analysis
    ... char *ReadTextFile ... while (fgets(buffer, sizeof buffer, fp) { ... and in my view it /is/ a bug. ... Well, no, it is better to change the realloc. ...
    (comp.lang.c)
  • Re: Bug analysis
    ... char *ReadTextFile ... while (fgets(buffer, sizeof buffer, fp) { ... and in my view it /is/ a bug. ... Well, no, it is better to change the realloc. ...
    (comp.lang.c)